|
jonnyawsom3
|
2025-05-09 02:07:40
|
https://discord.com/channels/794206087879852103/824000991891554375/1370391774434168913 I don't *think* that's how it works, but that is an interesting way to do it. Recursively iterate on the MA tree until all residuals are below a certain value, instead of only testing a tree on a percentage of all pixels
|
|
|
_wb_
|
2025-05-09 02:28:48
|
The `-I` percentage is the percentage of samples used to do learning. The samples use quantized property values to save memory (property values are int32_t but they get quantized to at most 256 buckets so they fit into uint8), and to decide how to quantize the property values, first 10% of the `-I` percentage is sampled just to figure out how to quantize, before the actual sample is taken.
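A rough sketch of that two-stage scheme (all names are hypothetical; this is not libjxl code, just the idea of pre-sampling a fraction of the sample to pick bucket boundaries, then mapping int32 property values into uint8 buckets):

```python
import random

def quantize_thresholds(values, max_buckets=256):
    """Pick at most max_buckets - 1 cut points from a small pre-sample
    (here 10% of the values) so int32 properties fit into uint8 buckets."""
    pre = sorted(random.sample(values, max(1, len(values) // 10)))
    step = -(-len(pre) // (max_buckets - 1))  # ceiling division
    return sorted(set(pre[::step]))

def to_bucket(value, thresholds):
    """Map an int32 property value to its bucket index (fits in a uint8)."""
    lo, hi = 0, len(thresholds)
    while lo < hi:  # binary search for the first threshold > value
        mid = (lo + hi) // 2
        if thresholds[mid] <= value:
            lo = mid + 1
        else:
            hi = mid
    return lo

values = [random.randint(-1000, 1000) for _ in range(10_000)]
cuts = quantize_thresholds(values)
buckets = [to_bucket(v, cuts) for v in values]
assert all(0 <= b < 256 for b in buckets)
```

The point of the pre-sample is only to choose the cut points; the actual learning sample is then stored in the compact bucketed form.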
|
|
|
Olav
|
2025-05-12 05:21:09
|
Will jxl-rs also do JPEG decoding, like libjxl djxl?
|
|
|
jonnyawsom3
|
2025-05-12 05:34:08
|
Oh, you mean JPEG reconstruction?
|
|
|
AccessViolation_
|
2025-05-12 05:51:32
|
I assume they mean decoding from any type of JXL to JPEG, like `djxl`
|
|
|
_wb_
|
2025-05-12 06:34:44
|
If it's not JPEG reconstruction, then it's just decoding to pixels and encoding to JPEG. Not really in the scope of a jxl decoder library...
|
|
|
Olav
|
2025-05-12 06:44:59
|
I meant JPEG to pixels. Just thought it'd be a good argument for jxl-rs adoption in browsers if they could replace their current JPEG decoder with it.
|
|
|
TheBigBadBoy
|
2025-05-12 09:08:57
|
well jxl-rs will be adopted by Firefox even without having a "JPEG decoder" <:FeelsAmazingMan:808826295768449054>
|
|
|
CrushedAsian255
|
|
Olav
I meant JPEG to pixels. Just thought it'd be a good argument for jxl-rs adoption in browsers if they could replace their current JPEG decoder with it.
|
|
2025-05-12 11:13:21
|
as in it can read both JXL and JPEG files?
|
|
2025-05-12 11:13:25
|
not jpeg reconstruction?
|
|
|
Olav
|
|
CrushedAsian255
not jpeg reconstruction?
|
|
2025-05-13 01:17:37
|
Yes, as in read both JPEG and JPEG XL.
|
|
|
jonnyawsom3
|
2025-05-13 09:44:24
|
<@794205442175402004> you forgot to add this to the merge queue after you approved it. No rush though
https://github.com/libjxl/libjxl/pull/4244
|
|
|
monad
|
2025-05-13 10:32:43
|
surely it was intentional
|
|
|
_wb_
|
2025-05-13 10:50:23
|
Right. On the queue it goes.
|
|
|
jonnyawsom3
|
2025-05-13 10:50:31
|
Thanks
|
|
2025-05-15 05:55:54
|
Well, that doesn't look good. The release builds are all failing https://github.com/libjxl/libjxl/actions/workflows/release.yaml
|
|
2025-05-15 06:00:32
|
Think it was caused by this https://github.com/libjxl/libjxl/pull/4220
|
|
|
_wb_
|
2025-05-15 06:52:44
|
Yeah I think I am referencing something that doesn't exist or something like that, see also https://github.com/libjxl/libjxl/pull/4220#issuecomment-2854376395
|
|
2025-05-15 06:53:39
|
<@811568887577444363> if you get a chance please fix my mess-up
|
|
|
jonnyawsom3
|
2025-05-16 03:02:51
|
<@795684063032901642> and if you get a chance, could you take another glance at this? Been ready for quite a while, but we can only iterate on it once people start using it
https://github.com/google/jpegli/pull/130
|
|
2025-05-16 03:03:23
|
Then the repos also need to be synced; there are some commits on the Google one that aren't in libjxl
|
|
|
Demiurge
|
2025-05-18 09:14:51
|
It would definitely be easier if it was broken up into separate PRs like one for the app14 fix, one for the chroma subsampling changes and fixes, and one for the other tweaks.
|
|
2025-05-18 09:16:32
|
Speaking as someone who isn't really involved in the project, but knows it's a bad idea to have everything lumped together as an all-in-one like that...
|
|
|
Mine18
|
2025-05-20 06:35:16
|
<@238552565619359744> https://github.com/libjxl/libjxl/pull/4258
is it only preset 7 that's slower? or preset 7 and higher are slower?
|
|
|
jonnyawsom3
|
|
Mine18
<@238552565619359744> https://github.com/libjxl/libjxl/pull/4258
is it only preset 7 that's slower? or preset 7 and higher are slower?
|
|
2025-05-20 01:53:15
|
Now that you mention it... Yeah, for some reason it's only effort 7 that's massively slower. The others are around 5% slower, or 25% for efforts 8 and 9, instead of 70% for effort 7 on the new image I just tried. Maybe something to do with patches and chunked encoding?
|
|
|
Mine18
|
2025-05-20 01:58:23
|
what about the fastest presets, <=4? These would be very impactful for casual users or realtime use cases for JXL
|
|
|
jonnyawsom3
|
2025-05-20 02:00:53
|
Yeah, it's patches triggering because progressive_dc disables buffering
|
|
2025-05-20 02:01:10
|
And patches are only enabled at effort 7 or higher, causing the sudden speed hit
|
|
2025-05-20 02:01:49
|
Effort 8 and 9 already disable buffering when distance is above 0.5, so they remain a lot closer to before
|
|
|
Mine18
what about the fastest presets, <=4? These would be very impactful for casual users or realtime use cases for JXL
|
|
2025-05-20 02:04:41
|
Effort 1 and 2 are actually around 8% faster, 3 remains the same, 4 is 10% slower, 5 and 6 are 5% slower
|
|
2025-05-20 02:06:50
|
Also bear in mind this is only when progressive encoding is used, so probably not for the casual user or realtime
|
|
|
Mine18
|
2025-05-20 02:08:47
|
is progressive encoding like chunked encoding? what does it have to do with progressive decoding?
|
|
|
jonnyawsom3
|
|
Mine18
is progressive encoding like chunked encoding? what does it have to do with progressive decoding?
|
|
2025-05-20 02:34:47
|
JPEG XL is inherently progressive by loading per-group, or with the 1:8 DC for lossy, but you can split the AC in half to get a 'half-quality' result at around 50% loaded. You can also turn the DC into a 1:64 image instead, with its own 1:8 DC, which allows decoding only a few pixels and every doubling of resolution in-between. Old behaviour only did progressive AC, so at around 50% you'd have a full image without black gaps. The PR adds progressive DC, so you have a full image much sooner, at a lower resolution.
For example, the first progressive pass of a 4K image:
```
Regular Lossy: 3840 x 2160 --> 480 x 270
Progressive DC: 3840 x 2160 --> 60 x 33.75 --> 7.5 x 4.2
```
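Those resolutions are just repeated division by 8 along the DC chain (the 1:64 DC's own 1:8 DC is 1:512 of the original; the 7.5 x 4.2 above is the same value rounded). A quick sanity check:

```python
def pass_resolutions(width, height):
    """Resolutions of the DC chain: full image, 1:8 DC, 1:64 DC,
    and the 1:8 DC of the 1:64 image (1:512 of the original)."""
    scales = [1, 8, 64, 512]
    return [(width / s, height / s) for s in scales]

for w, h in pass_resolutions(3840, 2160):
    print(f"{w:g} x {h:g}")
# 3840 x 2160, 480 x 270, 60 x 33.75, 7.5 x 4.21875
```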
|
|
|
Yeah, it's patches triggering because progressive_dc disables buffering
|
|
2025-05-20 02:41:24
|
Hmm.. Maybe not. Disabling patches brings it down from 72% slower to 54%, but that's still double what effort 8 and 9 lost
|
|
2025-05-20 02:50:34
|
I've been testing on the wrong version... Lovely, running tests again
|
|
|
Mine18
|
|
Also bear in mind this is only when progressive encoding is used, so probably not for the casual user or realtime
|
|
2025-05-20 03:03:16
|
so does this mean that when only using the new progressive decoding, encoding performance isn't as bad?
|
|
|
jonnyawsom3
|
2025-05-20 03:04:55
|
Around 25% slower for all effort levels using a fresh build, but you can still override the DC with `--progressive_dc 0` to avoid the slowdown and return to old behaviour (but centre-out)
|
|
|
Mine18
|
2025-05-20 03:07:42
|
good to know, i think these changes are overall worth it, as center-out prog decoding is a huge strength for web images and should make jxl more appealing to everyone
|
|
|
jonnyawsom3
|
2025-05-20 03:21:15
|
```Current -p
Compressed to 518.4 kB (0.500 bpp).
3840 x 2160, 2.285 MP/s [2.28, 2.28], , 1 reps, 16 threads.
Current PR -p
Compressed to 529.6 kB (0.511 bpp).
3840 x 2160, 1.661 MP/s [1.66, 1.66], , 1 reps, 16 threads.
PR -p --patches 0
Compressed to 529.6 kB (0.511 bpp).
3840 x 2160, 3.184 MP/s [3.18, 3.18], , 1 reps, 16 threads.```
Hmm, so disabling patches makes it even faster than current, but could have some hefty density impacts for certain images
|
|
|
Mine18
|
2025-05-20 03:44:09
|
density? what do you mean?
|
|
|
jonnyawsom3
|
2025-05-20 03:45:54
|
Larger filesize if an image has a lot of patch candidates and I disable them for progressive
|
|
|
Mine18
|
2025-05-20 03:46:43
|
oh, "for certain images", i was too focused on the results you showed
|
|
|
jonnyawsom3
|
|
_wb_
Yeah I think I am referencing something that doesn't exist or something like that, see also https://github.com/libjxl/libjxl/pull/4220#issuecomment-2854376395
|
|
2025-05-20 04:20:39
|
Also means the nightly builds are 2 weeks old https://artifacts.lucaversari.it/libjxl/libjxl/latest/
|
|
2025-05-30 05:29:09
|
Hit 3K commits the other day
|
|
|
CrushedAsian255
|
|
Hit 3K commits the other day
|
|
2025-05-30 09:25:12
|
This community is very commit-ted
|
|
|
Kleis Auke
|
2025-06-03 12:40:57
|
Just wondering, is `JPEGXL_SO_MINOR_VERSION` only incremented when the existing ABI/API changes? If not, could this be reconsidered? libvips runs into issues with this on "stable" distros, where the convention is to _not_ update libraries when the ABI/API has changed. See PR <https://github.com/libvips/libvips/pull/4550> for context.
|
|
|
_wb_
|
2025-06-03 01:32:48
|
we follow https://semver.org/ as far as I know
|
|
2025-06-03 01:40:18
|
I guess there have always been small API changes at every new released version in the past; I think if we do a 0.12 now then probably we can keep JPEGXL_SO_MINOR_VERSION at 11 since I don't think there has been any ABI/API change.
|
|
|
Kleis Auke
|
|
_wb_
I guess there have always been small API changes at every new released version in the past; I think if we do a 0.12 now then probably we can keep JPEGXL_SO_MINOR_VERSION at 11 since I don't think there has been any ABI/API change.
|
|
2025-06-03 02:12:45
|
That would be great!
|
|
2025-06-03 02:13:06
|
FWIW, the SONAME bump wasn't really necessary between v0.9.0 and v0.11.1 either, according to `abi-compliance-checker`.
<https://kleisauke.nl/compat_reports/libjxl/0.9.0_to_0.11.1/compat_report.html>
<https://kleisauke.nl/compat_reports/libjxl_cms/0.9.0_to_0.11.1/compat_report.html>
<https://kleisauke.nl/compat_reports/libjxl_threads/0.9.0_to_0.11.1/compat_report.html>
|
|
2025-06-03 02:13:24
|
(IIUC, you can safely ignore those "parameter _XX_ became passed in _YY_ register instead of _ZZ_" errors in those reports)
|
|
|
Kleis Auke
FWIW, the SONAME bump wasn't really necessary between v0.9.0 and v0.11.1 too, according to `abi-compliance-checker`.
<https://kleisauke.nl/compat_reports/libjxl/0.9.0_to_0.11.1/compat_report.html>
<https://kleisauke.nl/compat_reports/libjxl_cms/0.9.0_to_0.11.1/compat_report.html>
<https://kleisauke.nl/compat_reports/libjxl_threads/0.9.0_to_0.11.1/compat_report.html>
|
|
2025-06-03 03:02:38
|
Ah, never mind, I forgot to check `libjxl_extras_codec.so`, which had some changes between v0.9.x and v0.10.x ([according to Debian](<https://salsa.debian.org/debian-phototools-team/libjxl/-/commit/6cdb58efec2ecfea9e9b4f5e38fe5d1b9a4760c3>)) and was converted to a static library in v0.11.0 (see e.g. [this commit](<https://salsa.debian.org/debian-phototools-team/libjxl/-/commit/a160a95b5f408b041ce7912f7d1d4e1fc7786328>)).
|
|
|
jonnyawsom3
|
2025-06-03 03:20:26
|
Finally got round to splitting this into separate PRs. [APP14 Marker](<https://github.com/google/jpegli/pull/135>), [444 Defaults](<https://github.com/google/jpegli/pull/136>) and [Quality based settings](<https://github.com/google/jpegli/pull/137>)
<https://github.com/google/jpegli/pull/130#issuecomment-2935880782>
|
|
2025-06-03 03:23:46
|
The first two should be ready to merge, the quality based settings may require some more work due to test failures
|
|
|
Demiurge
|
2025-06-03 11:58:06
|
Yes, hooray!
|
|
2025-06-03 11:59:24
|
Now someone needs to tweak the cjxl tool to give a nonzero exit status when encountering unrecognized PNG chunks.
|
|
2025-06-04 12:00:32
|
Since that seems to be the latest new controversy these days with cjxl not giving a warning when ignoring/throwing out metadata
|
|
|
jonnyawsom3
|
2025-06-07 01:43:53
|
I thought I'd ask, could I get similar permissions for the jpegli repo as libjxl?
Then we can run the tests without requiring a core dev to manually approve it every commit. It would help a lot with finding bugs in PRs and being able to fix them.
|
|
|
|
veluca
|
2025-06-07 01:48:38
|
fine with me but I don't have those permissions myself xD
|
|
|
|
afed
|
2025-06-07 02:24:20
|
yeah, jpegli needs some active maintainers
there are a couple of useful PRs, including static compilation fixes and using it as a lib for Apple and Windows
|
|
2025-06-07 02:31:54
|
mabs also removed jpegli <:FeelsSadMan:808221433243107338>
yeah, it's just about libjxl, but jpegli wasn't re-added as a separate distro either
https://github.com/m-ab-s/media-autobuild_suite/pull/2917
|
|
|
Melirius
|
2025-06-10 11:22:09
|
There is an annoying problem in skcms, not sure what will be the right way to go https://github.com/libjxl/libjxl/issues/4280
|
|
2025-06-10 11:25:57
|
SKCMS maintainers remove old commits, so almost none of the old libjxl versions can be built "from scratch", as they point to now-nonexistent commits in that repo
|
|
|
spider-mario
|
2025-06-10 11:57:21
|
as far as I can tell, they don't?
|
|
2025-06-10 11:57:37
|
I'm able to check out libjxl v0.1 and the skcms commit it references from a fresh clone
|
|
|
CrushedAsian255
|
2025-06-10 01:08:26
|
can't you just pull from GitHub, or does that not include old history?
|
|
2025-06-10 01:08:33
|
oh wait skcms
|
|
2025-06-10 01:09:06
|
oh wait git submodules
|
|
|
Melirius
|
2025-06-10 02:14:50
|
Presumably that was the source of the problem; I cannot reproduce it on my freshly pulled repo. Sorry to bother you
|
|
|
jonnyawsom3
|
2025-06-13 02:49:00
|
Seems like there were a few bugs with effort 1 not being lossless https://github.com/libjxl/libjxl/issues/4287
|
|
2025-06-13 05:08:47
|
And this one is only on main https://github.com/libjxl/libjxl/issues/4026
|
|
|
Demiurge
|
2025-06-14 03:50:06
|
Stuff like that should probably be backported
|
|
|
CrushedAsian255
|
|
Demiurge
Stuff like that should probably be backported
|
|
2025-06-14 07:30:22
|
are there any real reasons to support older versions? If an app is able to update from, let's say, 0.9 to 0.9.1 or something, they should be able to update to 0.12.0 or the latest version
|
|
|
Demiurge
|
2025-06-14 07:31:50
|
No, not really. The only "real" reason is retards that will upgrade minor patches only, like Debian.
|
|
|
CrushedAsian255
|
2025-06-14 08:16:04
|
aren't all libjxl versions technically minor?
|
|
2025-06-14 08:16:08
|
as they're 0.x.y
|
|
|
A homosapien
|
2025-06-14 10:18:55
|
libjxl follows semantic versioning https://semver.org/
|
|
|
_wb_
|
|
Seems like there were a few bugs with effort 1 not being lossless https://github.com/libjxl/libjxl/issues/4287
|
|
2025-06-16 01:18:23
|
that one was pretty subtle, this was the bugfix: https://github.com/libjxl/libjxl/pull/4291/files
|
|
|
nol
|
2025-06-16 02:03:48
|
Are there any plans to make a v0.12.0 release soon-ish? If not, is there a particular version/commit that is considered stable? I see that for some commits the CI doesn't pass, but there's also a different number of checks run on each commit
|
|
|
_wb_
|
2025-06-16 02:13:00
|
I hope we will have a 0.12 soonish
|
|
|
HCrikki
|
2025-06-16 02:19:24
|
please consider shipping a precompiled **.DLL** (non-EXE) version of jpegli somewhere that can be dropped in place of mozjpeg (handy for both devs bundling unmodified dlls in their apps and users in need of swapping in jpegli for themselves).
|
|
|
|
afed
|
2025-06-16 02:28:04
|
first, some PRs need to be merged
like
<https://github.com/google/jpegli/pull/116>
<https://github.com/google/jpegli/pull/112>
|
|
|
jonnyawsom3
|
|
_wb_
I hope we will have a 0.12 soonish
|
|
2025-06-16 02:36:25
|
Ideally, we can get some jpegli PRs merged and then pushed to libjxl too. They fix a lot of odd behaviour and blockers to adoption (Apple XYB support, Empty DHT marker crashes, XYB JXL transcoding failures)
|
|
2025-06-16 02:40:20
|
Oh, also the changelog should really be updated as part of PRs, instead of a single update before a release. Makes it much easier to manage and know what's important to note
|
|
|
_wb_
|
2025-06-18 03:02:49
|
lots of open issues at the libjxl repo; if someone can point me to the issues that are actual bugs that need fixing, that would be useful
|
|
2025-06-18 03:03:06
|
(we need to figure out a better way to triage issues and get them closed)
|
|
|
jonnyawsom3
|
2025-06-18 03:57:13
|
I closed between a few dozen and a hundred a few months ago, but many are pending developer input or I can't test myself to see if they can be closed
|
|
2025-06-18 04:14:02
|
Lots of them are "This image compresses worse than PNG/WebP", which I could probably close as duplicate, but that would take some time
|
|
2025-06-18 04:14:56
|
Though, it might be useful to keep them open for future benchmarking/testing
|
|
|
_wb_
|
2025-06-18 04:46:01
|
You can reference the ones you close from the one that is left open to keep the test cases reachable
|
|
|
jonnyawsom3
|
2025-06-18 04:47:14
|
Actually, I forgot Github should automatically link them when you close as a duplicate, so it'll just take time to go through all the issues again
|
|
2025-06-18 11:56:38
|
So, it would seem jpegli *isn't* a drop-in replacement, due to missing 12bit JPEG `undefined symbol: jpeg12_write_raw_data` <https://github.com/RawTherapee/RawTherapee/issues/7125#issuecomment-2746670684>
> To create a JPEG file with 9 to 12 bits per sample, use `jpeg12_write_scanlines()` or `jpeg12_write_raw_data()` instead of `jpeg_write_scanlines()` or `jpeg_write_raw_data()`.
<https://github.com/libjpeg-turbo/libjpeg-turbo/blob/81feffa632bcd928d4cd1c35e5bb6c1eb02ac199/doc/libjpeg.txt#L1092>
|
|
2025-06-18 11:57:59
|
Also... Maybe we should add a `jpegli` channel underneath this one, to help keep things separate
|
|
|
Demiurge
|
2025-06-19 11:12:23
|
I mentioned this before
|
|
2025-06-19 11:12:31
|
The high bit depth api is missing
|
|
2025-06-19 11:15:26
|
Please, please merge libjxl and jpegli back into one repo, and just improve the build system to make it trivially simple for packagers to build/install jpegli and libjxl as separate build/install targets.
|
|
2025-06-19 11:17:45
|
This will allow the most efficient consolidation and re-use of code and build objects, without the massive confusion of maintaining duplicate shared code in a separate repo...
|
|
2025-06-19 11:18:47
|
Plus, it will attract more awareness and attention to the existence of jxl, from people curious about jpegli
|
|
|
jonnyawsom3
|
2025-06-23 06:49:06
|
<https://github.com/libjxl/libjxl/pull/4298>
> Wait, how are there valid bitstreams where `!use_prefix_code_ && num_to_copy_ != 0` ?
>
> I would expect an LZ77 run to terminate at the end of an entropy-coded stream, and never continue into the next entropy-coded stream, especially not if that one has LZ77 disabled.
>
> Maybe the spec needs to be clarified on this point. I'd prefer this to be invalid, and have the invariant that num_to_copy == 0 at the end of an entropy-coded stream. At the very least, this should be true at the end of every bitstream section because otherwise parallel decoding is impossible.
> aren't use_prefix_code_ and lz77.enabled independent?
> Oh right, my bad. I must have seen too much Deflate, causing me to conflate lz77 and prefix coding :)
<@794205442175402004> Does your last comment invalidate your previous concern about multithreaded decoding? I don't know enough about LZ77 xP
|
|
|
_wb_
|
2025-06-23 06:50:50
|
No, but I'm assuming bitstream sections are always ending with num_to_copy == 0. At least the libjxl encoder should never produce runs that cross sections.
|
|
|
jonnyawsom3
|
2025-06-23 06:56:29
|
So it *is* valid, but generally shouldn't be done. Then when it *is* done, now libjxl will handle it gracefully on fewer threads instead of failing, with other files having no change in behaviour?
|
|
|
_wb_
|
2025-06-23 07:01:47
|
no, I think it is not valid to have lz77 runs that cross sections, but maybe the spec has to be more clear about it
|
|
2025-06-23 07:02:46
|
what that pull request fixes is lz77 runs that cross entropy-coded streams within the same section
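A toy model of the distinction (a hypothetical decoder, not the real JXL entropy coder): whether an unfinished copy run carries into the next entropy-coded stream changes the decoded output, which is why leftover runs at section boundaries matter for parallel decoding.

```python
def decode_streams(streams, carry_runs):
    """streams: list of (tokens, n_out). A token is ('lit', v) or
    ('copy', distance, count). If carry_runs, an unfinished copy run
    continues into the next stream; otherwise it is discarded."""
    out = []
    num_to_copy = 0
    dist = 0
    for tokens, n_out in streams:
        if not carry_runs:
            num_to_copy = 0  # fresh lz77 state per stream
        produced = 0
        it = iter(tokens)
        while produced < n_out:
            if num_to_copy > 0:
                out.append(out[-dist])  # continue the copy run
                num_to_copy -= 1
            else:
                tok = next(it)
                if tok[0] == 'lit':
                    out.append(tok[1])
                else:
                    _, dist, num_to_copy = tok  # start a run, emits nothing yet
                    continue
            produced += 1
    return out

# one copy run deliberately longer than the first stream's output quota,
# so two symbols of the run are left over at the stream boundary
streams = [([('lit', 7), ('copy', 1, 4)], 3),
           ([('lit', 9), ('lit', 8), ('lit', 6)], 3)]
print(decode_streams(streams, carry_runs=True))   # [7, 7, 7, 7, 7, 9]
print(decode_streams(streams, carry_runs=False))  # [7, 7, 7, 9, 8, 6]
```

The two decoders agree only when every stream ends with `num_to_copy == 0`, which is the invariant being debated.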
|
|
|
jonnyawsom3
|
2025-06-23 07:24:25
|
Ahh right, I see now. I didn't realise you said section instead of stream
|
|
2025-06-23 07:25:44
|
Skimming the spec, I can't see anything explicitly mentioning sections, only this at the start of C.1
> The codestream contains multiple independently entropy-coded streams.
|
|
|
Tirr
|
2025-06-23 07:38:28
|
jxl-oxide discards leftover lz77 runs
|
|
|
_wb_
|
2025-06-23 07:40:53
|
I think it should be considered invalid input if there are leftovers, so it can discard but it can also refuse to decode.
|
|
2025-06-23 07:41:18
|
Unless we have a good reason to allow it.
|
|
|
jonnyawsom3
|
2025-06-23 07:42:48
|
My only thought would be for large images with lots of repeating elements, but then you'd hit distance limits anyway...
|
|
|
_wb_
|
2025-06-23 07:57:59
|
In any case, sections should not start with an ongoing lz77 run, that would break parallel decode and ROI decode. Only question is whether unfinished lz77 runs at the end of a section should be allowed (and ignored) or not. <@179701849576833024> thoughts?
|
|
|
|
veluca
|
2025-06-23 08:01:33
|
Totally agree on the first, I am not sure I have strong opinions on the second tbh
|
|
|
jonnyawsom3
|
2025-06-23 08:22:12
|
C.3.3 says
> NOTE It is not necessarily the case that num_to_copy is zero at the end of the stream.
|
|
|
_wb_
|
2025-06-23 08:39:31
|
In case there are multiple entropy streams in a section, libjxl does create runs that cross the streams so the spec is right that it allows that.
At the end of a section though, num_to_copy will always be zero in libjxl encodes, right <@179701849576833024> ? Or does fast lossless or something else produce runs that are too long?
|
|
|
|
veluca
|
2025-06-23 08:40:14
|
I don't _think_ so
|
|
|
_wb_
|
2025-06-23 08:41:56
|
If it is always zero at the end of a section, then I would propose to make that a requirement so it gives another way to detect corrupted streams. Decoders can still ignore it and decode anyway if they like (you can do whatever you want with invalid input), but they would also be allowed to return an error.
|
|
|
|
veluca
|
2025-06-23 08:46:37
|
tbh I don't see the harm in allowing it to be nonzero
|
|
2025-06-23 08:46:43
|
might make life easier for encoders
|
|
|
Tirr
|
2025-06-23 08:53:01
|
jxl-oxide creates fresh lz77 state per stream, a bit surprising that it didn't have any problem
|
|
|
|
veluca
|
2025-06-23 08:59:35
|
stream or section?
|
|
2025-06-23 08:59:46
|
or modular-sub-bitstream, actually
|
|
2025-06-23 09:00:12
|
(I would be surprised if we made vardct and modular share an ANS decoder)
|
|
|
Tirr
|
2025-06-23 09:07:08
|
I mean entropy-coded stream
|
|
2025-06-23 09:08:24
|
so leftover lz77 runs are just discarded and don't affect subsequent entropy-coded streams
|
|
2025-06-23 09:09:32
|
maybe I'm just confused and this is the intended way to handle those situations, not sure
|
|
|
jonnyawsom3
|
|
Tirr
jxl-oxide creates fresh lz77 state per stream, a bit surprising that it didn't have any problem
|
|
2025-06-23 09:18:37
|
From the sounds of it, it's only become a problem recently, likely due to my PR re-enabling LZ77 for lossless by default. Would be handy if we could find an example file though
|
|
|
_wb_
|
2025-06-23 09:56:49
|
Well https://github.com/libjxl/libjxl/pull/4298 did fix failing roundtrips so that should give us an example file where libjxl encodes something that jxl-oxide (or libjxl 0.11) does not decode.
|
|
2025-06-23 10:01:50
|
btw `// TODO(veluca): No optimization for Huffman mode yet.` - isn't this trivial for Huffman mode?
|
|
|
|
veluca
|
2025-06-23 10:03:45
|
ah very likely
|
|
|
_wb_
|
2025-06-23 10:06:56
|
ah wait that function is called at the beginning of every modular channel, so probably the lz77 run does not actually cross entropy coded streams, but it does cross channels
|
|
2025-06-23 10:07:30
|
like when you have a grayscale image represented in RGBA, which after YCoCg becomes a Y channel followed by 3 channels that are all-zeroes
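For illustration, a YCoCg-R-style reversible transform (the modular RCT in JXL is of this family; the exact variant in the spec may differ) really does map grayscale to two all-zero chroma channels:

```python
def ycocg_r_forward(r, g, b):
    """Reversible YCoCg-R: for grayscale (r == g == b) both
    chroma channels come out exactly zero."""
    co = r - b
    tmp = b + (co >> 1)
    cg = g - tmp
    y = tmp + (cg >> 1)
    return y, co, cg

def ycocg_r_inverse(y, co, cg):
    tmp = y - (cg >> 1)
    g = cg + tmp
    b = tmp - (co >> 1)
    r = b + co
    return r, g, b

for v in (0, 17, 128, 255):
    assert ycocg_r_forward(v, v, v) == (v, 0, 0)  # chroma is all zeroes
# and it round-trips exactly for arbitrary pixels
assert ycocg_r_inverse(*ycocg_r_forward(12, 200, 33)) == (12, 200, 33)
```

Those long all-zero channels are exactly the kind of data an encoder might cover with a single very long lz77 run.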
|
|
2025-06-23 10:08:22
|
then those all-zeroes could end up getting encoded as one large lz77 run
|
|
2025-06-23 10:09:40
|
(instead of the jxl-art way of encoding them as a singleton-histogram, which is what that fast path was made for)
|
|
2025-06-23 10:10:50
|
This is the only place where that function is being used: https://github.com/libjxl/libjxl/blob/main/lib/jxl/modular/encoding/encoding.cc#L203
|
|
|
|
veluca
|
|
_wb_
ah wait that function is called at the beginning of every modular channel, so probably the lz77 run does not actually cross entropy coded streams, but it does cross channels
|
|
2025-06-23 10:12:01
|
yes, I would believe that
|
|
|
_wb_
|
2025-07-04 12:20:21
|
https://linuxsecurity.com/advisories/debian/debian-dsa-5958-1-jpeg-xl-bjcnnxypjiqt
|
|
2025-07-04 12:20:55
|
> For the stable distribution (bookworm), these problems have been fixed in version 0.7.0-10+deb12u1.
> We recommend that you upgrade your jpeg-xl packages.
|
|
|
jonnyawsom3
|
2025-07-04 03:08:15
|
Huh, apparently builds are getting double zipped in the Github actions
|
|
|
Tirr
|
2025-07-05 09:58:40
|
I'm writing a Rust-based libjxl frontend, and it seems that using a Rayon thread pool is faster than the default libjxl thread pool? Though it's just one-shot encoding
```console
$ time cjxl -d 0 -e 3 input.png output.jxl
JPEG XL encoder v0.12.0 [_NEON_]
Encoding [Modular, lossless, effort: 3]
Compressed to 22998.5 kB (7.768 bpp).
5787 x 4093, 43.982 MP/s, 8 threads.
cjxl -d 0 -e 3 input.png output.jxl 3.62s user 0.18s system 438% cpu 0.868 total
$ time ./target/release/jexcel
Encoder setup took 2.59 ms
Decoding input took 150.41 ms
Encoding and output took 526.74 ms
./target/release/jexcel 3.48s user 0.21s system 527% cpu 0.699 total
```
(both use libjxl `81247d5c`, `-d 0 -e 3`)
|
|
2025-07-05 10:00:34
|
and 526.74 ms corresponds to 44.968 MP/s
|
|
2025-07-05 11:47:56
|
on d0 e1 rayon is much faster (1000 MP/s vs. 800 MP/s) but it doesn't seem to scale well
|
|
|
_wb_
|
2025-07-05 12:16:16
|
I wonder what they do differently
|
|
|
Tirr
|
2025-07-05 12:17:37
|
does libjxl thread pool do work stealing too?
|
|
|
spider-mario
|
2025-07-05 12:17:56
|
I don't believe it does
|
|
|
Tirr
|
2025-07-05 12:18:38
|
I guess the difference is from some property of the work-stealing queue
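As a toy illustration of why a dynamic, work-stealing-style queue can beat a static split on uneven work (this is not rayon's or libjxl's actual scheduler, just the scheduling idea):

```python
import heapq

def static_split_makespan(costs, workers):
    """Each worker gets a contiguous chunk decided up front."""
    chunk = -(-len(costs) // workers)  # ceiling division
    return max(sum(costs[i:i + chunk]) for i in range(0, len(costs), chunk))

def dynamic_makespan(costs, workers):
    """Workers pull the next task as they finish (greedy list scheduling,
    a stand-in for a work-stealing queue)."""
    loads = [0] * workers
    heapq.heapify(loads)
    for c in costs:
        heapq.heappush(loads, heapq.heappop(loads) + c)
    return max(loads)

# one expensive group among many cheap ones (e.g. a detailed image region)
costs = [10] + [1] * 15
print(static_split_makespan(costs, 4))  # 13: one worker got the heavy chunk
print(dynamic_makespan(costs, 4))       # 10: idle workers soaked up the rest
```

With uniform task costs the two schedules tie, which is consistent with the gap showing up only on some images and settings.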
|
|
|
spider-mario
|
2025-07-05 12:22:52
|
https://github.com/libjxl/libjxl/blob/81247d5cf29e472700d65ebef72d91edcde6a396/lib/threads/thread_parallel_runner_internal.cc
|
|
|
jonnyawsom3
|
|
Tirr
on d0 e1 rayon is much faster (1000 MP/s vs. 800 MP/s) but it doesn't seem to scale well
|
|
2025-07-05 02:53:26
|
For d0 e1, Clang is also substantially faster (150%+) than gcc. I figured out it's to do with the multithreading, so I wouldn't be surprised if something similar is happening
https://github.com/libjxl/libjxl/issues/2268#issuecomment-2999629502
|
|
|
Tirr
|
2025-07-05 02:53:52
|
both are built using clang though
|
|
|
jonnyawsom3
|
2025-07-05 02:54:59
|
Yeah, I mean that if Clang is already doing something better than gcc, then maybe Rayon is doing even more. Though you already figured that out
|
|
|
Tirr
|
|
Tirr
on d0 e1 rayon is much faster (1000 MP/s vs. 800 MP/s) but it doesn't seem to scale well
|
|
2025-07-06 06:51:45
|
never mind, I was setting `use_original_profile = 1` when doing lossy
maybe it was color management overhead
|
|
2025-07-06 06:52:43
|
now rayon threadpool perf is comparable to libjxl threadpool
|
|
2025-07-06 07:24:27
|
anyway the code is here: <https://github.com/tirr-c/jexcel>, no prebuilt binaries though
|
|
2025-07-06 07:24:50
|
and it's encoding only currently
|
|
2025-07-06 08:35:56
|
in my testing, rayon threadpool is comparable to or faster than libjxl one, depending on image / encoder params
|
|
|
Lilli
|
2025-07-24 02:41:36
|
Hello hello, it's been a while. I was wondering if it's at all possible to set the `JxlPixelFormat` to FLOAT16, while my input data is in float32, and the data is far from the float16 limit
|
|
2025-07-24 02:42:36
|
I'm saying that because so far, it outputs float32 when checking the decompressed jxl
|
|
2025-07-24 02:57:27
|
I managed to get it to compress in uint16, which I suppose I could do but it wasn't the goal
|
|
|
jonnyawsom3
|
|
Lilli
I'm saying that because so far, it outputs float32 when checking the decompressed jxl
|
|
2025-07-24 03:17:46
|
Are you doing lossy or lossless? Lossy is always float32 internally with bit depth just being metadata
|
|
|
Lilli
|
2025-07-24 03:29:55
|
Lossy. Oh okay, then that's fine I suppose
|
|
2025-07-24 03:30:41
|
When compressing with uint16, it's also float32 inside?
|
|
|
spider-mario
|
|
Lilli
|
2025-07-24 03:38:11
|
Okayyy thanks !
|
|
2025-07-29 08:37:01
|
So, with a bit of testing, I noticed that the file size is very different if I give uint16 compared to float. Saying it's float16 while giving float32 works, but the file size stays the same as float32 (it's not equal to the uint16 size, nor in between).
|
|
2025-07-29 08:44:00
|
Is there a way to make it behave like I imagine it should? Is it a matter of actually sending float16 data? Because there's no easy way to do that in C++
|
|
|
_wb_
|
2025-07-29 08:56:40
|
you are doing lossy, right? do the decoded images look the same?
|
|
2025-07-29 08:57:14
|
when passing data as uint (8 or 16), iirc it will implicitly assume it uses the sRGB transfer function
|
|
2025-07-29 08:57:33
|
while when passing it as float, the implicit assumption is that it's linear
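The gap between those two implicit interpretations is the standard sRGB transfer function; as a sketch (this is the usual sRGB EOTF, not libjxl code):

```python
def srgb_to_linear(c):
    """Map an sRGB-encoded sample in [0, 1] to linear light."""
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4

# a mid-grey uint8 value of 128 is ~0.5 as an encoded fraction...
encoded = 128 / 255
linear = srgb_to_linear(encoded)
# ...but only about 0.22 in linear light, so treating uint input as
# linear (or float input as sRGB) badly skews perceptual decisions
print(round(encoded, 3), round(linear, 3))
```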
|
|
2025-07-29 09:01:31
|
if the input is interpreted incorrectly, then the results of lossy compression will be wrong: the jxl image will not render correctly, and if you fix that by decoding to ppm/pfm and interpreting it correctly, the results will still be poor, since the lossy compression will have been done in a perceptually wrong way
|
|
2025-07-29 09:04:27
|
to avoid implicit assumptions, it's best to explicitly pass the colorspace info to the encoder
|
|
|
Lilli
|
2025-07-29 09:06:00
|
Yes lossy, and I always explicitly pass the colorspace as JxlColorEncodingSetToLinearSRGB
|
|
2025-07-29 09:12:30
|
What I have is a float32 array that I'd like to compress with a precision of float16 but the output filesize of the float16 is the same as float32
So maybe I'm doing it wrong :/
|
|
2025-07-29 09:59:15
|
Also, must the values in the float image be [0,1] ?
|
|
|
_wb_
|
2025-07-29 10:08:40
|
the nominal range is [0,1] but you can go outside it
|
|
2025-07-29 10:11:17
|
for lossy, the internal precision for processing is always float32 and the actual precision depends on the DCT quantization, which depends on the quality/distance setting and cannot be expressed in terms of precision of RGB samples since it's in the frequency domain and for XYB components
|
|
|
Lilli
|
2025-07-29 01:06:12
|
Okay, that's interesting, I suppose that's why it gives different results. I'm giving raw data ([0, 65000]) and it compresses completely differently as float compared to u16, which sounds intentional according to what you say, but it also means the distance target I give has a totally different impact. When I normalize the input, I obtain exactly the same result as u16 compression (if I stretch my u16 input to 0-65535).
|
|
|
_wb_
|
2025-07-29 01:45:53
|
Float data in [0,65000] would be super bright since nearly everything is brighter than the nominal white. When decoding that to u16, nearly everything will be clamped to white.
|
|
2025-07-29 01:48:31
|
So you should normalize it to [0,1], and probably also do white balancing. Lossy is perceptual, so the input needs to correspond with how it gets rendered.
|
|
|
Lilli
|
2025-07-29 02:00:53
|
Yes, jxl really has this perceptual element embedded in it, but my images are raw linear data coming from a camera, so I'm forced to use distance = 0.025 for the output to look like something. The image is basically mostly 0-2000, with some smaller high-dynamic structures (stars...). I played around with that coefficient for target luminosity (I don't recall the name), but it produced inconsistent results across my testing dataset
|
|
2025-07-29 02:01:43
|
They'll definitely not be rendered (not before having gamma and co applied), they're just for storage (because there's actually nothing beating jxl right now on my dataset)
|
|
|
jonnyawsom3
|
2025-07-29 02:06:44
|
Yeah, right now there's no way to tell the encoder "Hey, this is for editing, don't crush the blacks". There is intensity target, but that's a workaround with its own problems
|
|
|
Lilli
|
2025-07-29 03:06:19
|
What does this intensity target really do? I've tried it with a few different parameters, I guess it somehow shifts the relative importance of the regions
Also, another question, maybe less related: as far as I understand, a distance of 1 is supposed to be "perceptually equivalent", but if I, say, zoom x2, should the distance then be 0.5, 0.25, 0.75, 0.1...?
|
|
|
_wb_
|
2025-07-29 03:33:03
|
zooming in or increasing display brightness makes artifacts easier to see but by how much exactly is hard to tell and may depend a bit on the image content itself
|
|
2025-07-29 03:36:19
|
intensity target is how many nits the signal value 1.0 (the nominal max value) is rendered at
|
|
|
Lilli
|
2025-07-29 03:56:03
|
Oh I see! So I'd need quite a big value for my content, I assume. I'm not sure I'll get around to testing that, but it's good to know in case I need more control over the compression ratio
|
|
2025-07-29 03:56:42
|
because at some point, reducing the distance doesn't increase quality anymore, it sort of "caps out" even though artifacts are still present
|
|
|
jonnyawsom3
|
2025-07-29 04:14:34
|
Yeah, generally for floats increasing intensity target to 10000 or 20000 is the way to go, since then it still 'sees' the perceptual image, but doesn't see the brightness difference
|
|
|
_wb_
|
2025-07-29 06:39:31
|
As a rule of thumb: if you make your input a PNG file then the way it looks in Chrome is the way cjxl sees it, i.e. what the perceptual compression is trying to work with.
In your case you cannot do that since your input is not something that is rendered directly but it's a raw that will still be heavily postprocessed. Libjxl isn't really designed for that. I would recommend either doing lossless raw, or cook the raw enough to make it a visual image that makes sense as-is before applying lossy compression to it.
|
|
|
π°πππ
|
|
_wb_
As a rule of thumb: if you make your input a PNG file then the way it looks in Chrome is the way cjxl sees it, i.e. what the perceptual compression is trying to work with.
In your case you cannot do that since your input is not something that is rendered directly but it's a raw that will still be heavily postprocessed. Libjxl isn't really designed for that. I would recommend either doing lossless raw, or cook the raw enough to make it a visual image that makes sense as-is before applying lossy compression to it.
|
|
2025-07-29 08:10:40
|
> if you make your input a PNG
Is there a similar reason for why we can't achieve perfect metric scores (butteraugli, ssimulacra2, cvvdp) with lossless JPEG reconstruction?
I always see scores such as ~85 ssimulacra2.
If I create an intermediate PNG file, then the true/modular lossless JXL works as expected
|
|
|
spider-mario
|
2025-07-29 08:20:22
|
is that 85 with the same decoder for source and reconstructed JPEG? if you pass the original jpeg + the recompressed jxl, AFAIK butteraugli/ssimulacra2 will use libjpeg-turbo to decode the former but libjxl for the latter, which gives different (better) results (but still compliant)
|
|
2025-07-29 08:22:25
|
since the original JPEG can be reconstructed bit-exactly from the jxl, there is obviously no inherent loss, but the JPEG standard gives decoders some leeway regarding the exact decoded pixels
|
|
|
π°πππ
|
2025-07-29 08:22:43
|
|
|
2025-07-29 08:23:08
|
|
|
2025-07-29 08:23:50
|
`djxl` then doing the tests, still the same
|
|
|
spider-mario
|
2025-07-29 08:23:57
|
yeah, to get a better score, you should either decode `ref.jpg` the same way that libjxl does to a PNG and pass that (I believe djpegli should get you there or close), or decode `enc.jxl` to a JPEG and pass the result
|
|
|
π°πππ
|
|
spider-mario
|
2025-07-29 08:24:42
|
decoding `enc.jxl` to a PNG instead of a JPEG will have the exact same "problem" as passing `enc.jxl` directly
|
|
|
π°πππ
|
2025-07-29 08:24:46
|
can `djxl` decode jpg?
|
|
|
spider-mario
|
2025-07-29 08:24:56
|
iirc no
|
|
|
π°πππ
|
2025-07-29 08:25:11
|
then how can we decode both of them the same way?
|
|
|
spider-mario
|
2025-07-29 08:25:26
|
djpegli should give the same or at least similar results
|
|
|
π°πππ
|
2025-07-29 08:25:41
|
let me check
|
|
|
spider-mario
|
2025-07-29 08:26:03
|
but really, to test lossless JPEG reconstruction, arguably the best way is to check that you can reconstruct the original file
|
|
|
π°πππ
|
|
spider-mario
|
2025-07-29 08:26:14
|
if you can then you know that you can, in principle, arrive at the same pixels
|
|
|
π°πππ
|
2025-07-29 08:27:17
|
I also tried building libjxl dynamically (the current build is static) and using the provided `libjpeg.so` system-wide
|
|
2025-07-29 08:27:22
|
the result is also the same
|
|
|
_wb_
|
2025-07-29 08:34:29
|
ssimulacra2 is very sensitive to even small errors. If you use djpegli to decode to a 16-bit PNG, the score should be higher, but still not 100 since there will be tiny differences compared to the float32 decode
|
|
2025-07-29 08:35:00
|
Can djpegli decode to PFM? Maybe if you do that, you can get a perfect match
|
|
2025-07-29 08:40:22
|
JPEG does not have a fully specified decode (there are large tolerances for conforming decoders) so you cannot really check losslessness by looking at decoded pixels. The DCT coefficients are defined exactly, but not the RGB reconstruction: it is defined mathematically but the precision required by the spec is quite low, and many popular JPEG decoders like libjpeg-turbo use a fast and low-memory but quite low-precision implementation (quantizing every intermediate step to 8-bit, etc).
|
|
|
π°πππ
|
|
_wb_
Can djpegli decode to PFM? Maybe if you do that, you can get a perfect match
|
|
2025-07-29 08:44:36
|
Yes, it can.
and the score increases
|
|
2025-07-29 08:44:52
|
but never seen over 90
|
|
2025-07-29 08:45:12
|
if it can't be decoded exactly as-is,
how would viewers all decode it the same?
|
|
|
_wb_
|
2025-07-29 08:46:01
|
They don't, different viewers show slightly different pixels for the same jpeg file.
|
|
|
π°πππ
|
2025-07-29 08:46:55
|
So, it's a kind of a limitation of JPEG?
|
|
|
_wb_
|
2025-07-29 08:47:23
|
I would expect djpegli to be closer to the djxl decode of jxl-recompressed-jpegs, but maybe there's some difference.
|
|
|
π°πππ
|
|
π°πππ
So, it's a kind of a limitation of JPEG?
|
|
2025-07-29 08:48:10
|
I first assumed that, since JPG is already lossy, JXL would handle artifacts in a different way
|
|
2025-07-29 08:48:12
|
so we get different scores
|
|
2025-07-29 08:48:16
|
but it's probably not the case here
|
|
|
_wb_
|
2025-07-29 08:48:38
|
You could try turning off Chroma from Luma when recompressing jpegs, that introduces some difference that is too small to matter for reconstructing the original jpeg file but is still there
|
|
2025-07-29 08:50:02
|
Even with libjpeg-turbo alone, you can decode a jpeg in different ways depending on the options you give it, and you will get ppm files that have some differences
|
|
2025-07-29 08:53:27
|
ssimulacra2 is very picky about small differences, even converting an 8-bit image to 16-bit will cause it to see some error even though mathematically those two images would be identical (8-bit numbers correspond exactly to 16-bit numbers since 255 is a divisor of 65535). It still sees a difference because they don't convert to the exact same float32 values (due to precision limits in the conversion arithmetic), so you get a score below 100.
|
|
|
π°πππ
|
2025-07-29 08:54:00
|
Got it
|
|
2025-07-29 08:54:08
|
Then modular lossless is extremely precise with it
|
|
2025-07-29 08:54:20
|
so I always get 100
|
|
2025-07-29 08:55:36
|
Here, the jpeg decoding is not a limitation
|
|
|
spider-mario
|
2025-07-29 09:06:03
|
that's because, like `ssimulacra2`, `cjxl` uses libjpeg-turbo to get the pixel values from the input jpeg
|
|
2025-07-29 09:06:28
|
if an update to `ssimulacra2` made it use jpegli instead, that jxl file made from libjpeg-turbo pixels would stop getting a score of 100
|
|
2025-07-29 09:07:17
|
it's not inherently any more correct than the jxl you get with lossless jpeg recompression
|
|
2025-07-29 09:07:21
|
just more libjpeg-turbo-esque
|
|
|
π°πππ
|
|
spider-mario
just more libjpeg-turbo-esque
|
|
2025-07-29 09:11:30
|
I actually tried to replace `libjpeg-turbo` with jpegli while statically compiling `libjxl`
|
|
2025-07-29 09:11:32
|
but I got many errors
|
|
2025-07-29 09:11:50
|
I am assuming jpegli jpeg library misses some functions
|
|
|
jonnyawsom3
|
2025-07-31 08:42:29
|
<@794205442175402004> I assume these quant tables are 'backwards' compared to classic JPEG? With LF at the start and HF at the end?
<https://github.com/libjxl/libjxl/blob/c50010fb13160e4d2745ee27c6458aa63c51ebce/lib/jxl/enc_modular.cc#L95>
|
|
|
_wb_
|
2025-07-31 08:50:29
|
yes, it's HF to LF there: the array index corresponds to the hshift+vshift where 0 is the highest frequency; the LF of JPEG/VarDCT would correspond to index 6 (hshift==vshift==3, since it corresponds to 8x8 downscaling)
|
|
|
jonnyawsom3
|
2025-07-31 08:56:54
|
Wait, HF to LF? Don't you want less quantization for the high frequency? B starts all the way at 2048
|
|
|
_wb_
|
2025-07-31 08:26:59
|
higher quant factor means more aggressive quantization, lower quality
|
|
2025-07-31 08:28:26
|
in the context of squeeze, if the first two quant factors are +infinity for X and B, that basically corresponds to 4:2:0 chroma subsampling.
|
|
2025-07-31 08:31:53
|
note that these quant factors are scaled according to the distance setting
|
|
|
jonnyawsom3
|
2025-07-31 09:19:15
|
Huh, interesting. That would explain the chroma bleeding I noticed
|
|
2025-07-31 09:42:54
|
It took me a while to get DCT out of my head, but I get it now. Each quant entry is for a squeeze level, I must've only skimmed your first message
|
|
|
A homosapien
|
2025-08-03 06:24:26
|
There might be a few bugs when using photon noise combined with lossy progressive. Photon noise seems to be applied to both the LF and the HF, causing a blurry LF noise texture to be present. This is present in all major versions of libjxl I've tested so far.
Another issue is that photon noise is much, much stronger when used with progressive, likely meaning it's being applied multiple times over.
|
|
|
CrushedAsian255
|
|
A homosapien
There might be a few bugs when using photon noise combined with lossy progressive. Photon noise seems be applied to the LF and the HF. This is causing a LF blurry noise texture to be present. This is present in all major versions of libjxl I've tested so far.
Another issue is that photon noise is much much stronger when used with progressive, likely meaning it's being applied multiple times over.
|
|
2025-08-03 07:43:54
|
maybe its being applied once to each progressive scan
|
|
|
A homosapien
|
2025-08-03 09:12:03
|
That's what I think as well
|
|
2025-08-03 09:39:23
|
I made an issue
https://github.com/libjxl/libjxl/issues/4368
|
|
|
Traneptora
|
|
π°πππ
Then modular lossless is extremely precise with it
|
|
2025-08-03 10:43:09
|
it's less that modular lossless is "extremely precise" and more that modular, fundamentally, is a sophisticated way to store an array of integers, not of floats
|
|
2025-08-03 10:43:53
|
so there's no precision issue because the pixel arrays are identical integer arrays
|
|
|
jonnyawsom3
|
|
A homosapien
There might be a few bugs when using photon noise combined with lossy progressive. Photon noise seems be applied to the LF and the HF. This is causing a LF blurry noise texture to be present. This is present in all major versions of libjxl I've tested so far.
Another issue is that photon noise is much much stronger when used with progressive, likely meaning it's being applied multiple times over.
|
|
2025-08-03 10:52:15
|
Photon noise is flagged in the frame header, so it's likely being set for every frame instead of just the final one
> A major condition to trigger this bug is that the image has to be larger than 2048x2048, implying chunked encoding must be at play here.
That's more interesting though, since the frame header can't have a value other than 0 or 1, so either the ISO setting is somehow getting added internally per MA tree, or it's a decoder bug
|
|
|
spider-mario
|
|
Traneptora
so there's no precision issue because the pixel arrays are identical integer arrays
|
|
2025-08-03 11:28:15
|
in this instance, it arguably has less to do with VarDCT being conceptually floats and more to do with it being, well, DCT coefficients and not pixels
|
|
2025-08-03 11:28:38
|
VarDCT stores exactly the original JPEG's DCT coefficients; Modular stores exactly the libjpeg-turbo-decoded pixels
|
|
2025-08-03 11:29:18
|
the proper comparison to detect whether there is any loss compared to the original JPEG file is arguably the former
|
|
|
jonnyawsom3
|
2025-08-03 11:29:43
|
There's an extremely long GitHub issue complaining that storing the DCT is less accurate than the pixels, but testing showed that the DCT is (unsurprisingly) more accurate to the original
|
|
|
π°πππ
|
|
Traneptora
so there's no precision issue because the pixel arrays are identical integer arrays
|
|
2025-08-03 11:30:22
|
So it's:
JPEG path: Original pixels -> JPEG DCT coefficients -> Decoded pixels (with decoder variations)
Modular path: Original integer pixels -> Stored integer pixels -> Identical integer pixels
|
|
2025-08-03 11:30:47
|
There's no mathematical conversion or loss because you're storing the exact same integers you started with.
|
|
|
JaitinPrakash
|
|
π°πππ
So it's:
JPEG path: Original pixels -> JPEG DCT coefficients -> Decoded pixels (with decoder variations)
Modular path: Original integer pixels -> Stored integer pixels -> Identical integer pixels
|
|
2025-08-03 11:51:09
|
The reconstruction route is JPEG DCT → VarDCT → jxl decoder, while the modular route is JPEG DCT → pixels → Modular → jxl decoder. If ssimulacra2 also does JPEG DCT → pixels with the same libjpeg as modular, then even though reconstruction is technically the more exact form of lossless, it introduces extra variance from libjxl and libjpeg decoding the DCT differently, causing that discrepancy. Or at least, that's what I think is going on here.
So reconstruction is more lossless than modular lossless from a strictly exact perspective, but because ssimulacra2 decodes the DCT differently between the source and the repacked jpeg, the modular lossless, which used the same DCT → pixels step as ssimulacra2, ends up more similar to the jpeg.
|
|
|
π°πππ
|
2025-08-04 12:59:57
|
Not just ssimulacra2, it's similar for butteraugli (pretty much expected since it's libjxl) but also external metrics such as CVVDP or others
The decoding difference applies to all cases here, then
|
|
|
_wb_
|
2025-08-04 07:36:51
|
ssimulacra2 does take jpeg as input but it's not really well-defined what that means since ssimulacra2 compares pixel values and those are not fully specified for a jpeg image. The current jpeg decoder it uses is libjpeg-turbo but if we change that (or use different decoder options) the pixels will be different and the ssimulacra2 scores will change too.
|
|
2025-08-04 07:38:47
|
if you want to compare two jpegs with a pixel-based metric, you should use the same decoder implementation for both, otherwise the decoder difference is included in the metric score and you'll end up with nonsense conclusions like an image being different from itself.
|
|
2025-08-04 07:44:19
|
It would help if we'd make it so libjxl can also decode jpegs directly; the main ingredients to do that are already in the codebase anyway, you just have to load the jpeg as if you're going to recompress it, but then skip the entropy coding and decoding and just feed it directly to the render pipeline. Though it may be a bit more complicated in practice since probably the encode-side structs and decode-side structs don't match up that nicely.
|
|
|
jonnyawsom3
|
|
_wb_
It would help if we'd make it so libjxl can also decode jpegs directly; the main ingredients to do that is already in the codebase anyway, you just have to load the jpeg as if you're going to recompress it, but then skip the entropy coding and decoding and just feed it directly to the render pipeline. Though it may be a bit more complicated in practice since probably the encode-side structs and decode-side structs don't match up that nicely.
|
|
2025-08-04 08:15:15
|
Removing dependencies on external JPEG libraries and just using jpegli would also be nice. Requiring other JPEG libraries to build a JPEG library is... odd
|
|
|
_wb_
|
2025-08-04 10:04:41
|
libjxl itself doesn't depend on an external jpeg library, but yes, the tools on the libjxl repo do use it. It's not a strict dependency iirc; you could build a cjxl/djxl/ssimulacra2/etc that can only deal with jxl and ppm/pam.
|
|
|
jonnyawsom3
|
2025-08-04 10:08:33
|
I meant for compiling jpegli too, since it's intertwined in the repo (even with the Google repo split, it has duplicate libjxl files so it can't be a submodule)
|
|
|
π°πππ
|
2025-08-04 10:09:04
|
we also can't build with the jpegli static library instead of libjpeg-turbo
I tried to modify the build process and create a phantom pkgconfig, also renaming the compiled jpegli library to `libjpeg.a`
The compilation wasn't successful.
|
|
2025-08-04 10:09:20
|
AFAIK, jpegli provided library should cover everything, no?
|
|
2025-08-04 10:10:00
|
> When building the project, two binaries, tools/cjpegli and tools/djpegli will be built, as well as a lib/jpegli/libjpeg.so.62.3.0 shared library that can be used as a drop-in replacement
|
|
|
jonnyawsom3
|
2025-08-04 10:11:35
|
It's also missing the 12bit API, so it's not actually a drop in replacement for some applications
|
|
|
CrushedAsian255
|
|
_wb_
libjxl itself doesn't depend on an external jpeg library, but yes, the tools on the libjxl repo do use it. It's not a strict dependency iirc β you could build a cjxl/djxl/ssimulacra2/etc that can only deal with jxl and ppm/pam.
|
|
2025-08-05 01:31:57
|
to decode a JPEG with only libjxl, can't you use the libjxl JPEG->JXL transcoding, and then decode the resultant JXL?
|
|
|
π°πππ
|
|
CrushedAsian255
to decode a JPEG with only libjxl, can't you use the libjxl JPEG->JXL transcoding, and then decode the resultant JXL?
|
|
2025-08-05 01:59:19
|
you can just encode it in this case? <:galaxybrain:821831336372338729>
|
|
2025-08-05 01:59:50
|
but technically you decode a JXL then; not the JPEG
|
|
|
_wb_
|
|
CrushedAsian255
to decode a JPEG with only libjxl, can't you use the libjxl JPEG->JXL transcoding, and then decode the resultant JXL?
|
|
2025-08-05 06:54:19
|
Yes you can. There would be room for optimization though, since doing it that way does unnecessary entropy coding + decoding.
|
|
|
CrushedAsian255
|
2025-08-05 06:55:15
|
I feel it would be beneficial to have libjxl also handle JPEGs, and have a libjpeg-compatible API, so it can be a drop-in replacement for libjpeg
|
|
|
jonnyawsom3
|
2025-08-05 07:16:18
|
An idea that's come up a few times, but even jpegli isn't fully drop-in, so trying to make libjxl so would be rough
|
|
|
CrushedAsian255
|
|
_wb_
Yes you can. There would be room for optimization though, since doing it that way does unnecessary entropy coding + decoding.
|
|
2025-08-05 10:43:38
|
Proof of Work
|
|
|
jonnyawsom3
|
2025-08-06 01:06:57
|
https://github.com/libjxl/libjxl/issues/4376
|
|
2025-08-06 01:07:16
|
I left a comment, as I think there's still a lot left to do
|
|
2025-08-06 01:10:10
|
A lot of pending jpegli PRs, some bugs recently found and some deeper issues that have bodged fixes in place... But it's certainly about time we got a new version out
|
|
|
|
afed
|
2025-08-06 01:11:43
|
yeah, that would be nice
|
|
|
HCrikki
|
2025-08-06 02:28:14
|
please a **.DLL** of jpegli a user or lazy dev can use to swap their existing jpeg decoder/encoder library without needing to bother with source code
|
|
|
Quackdoc
|
2025-08-08 03:55:31
|
compiling libjxl with `-DCMAKE_INTERPROCEDURAL_OPTIMIZATION=TRUE` seems to really balloon the size for some reason
|
|
2025-08-08 03:58:43
|
~~meson when?~~
|
|
2025-08-08 04:09:55
|
actually this dude's build script is just fubar, no idea how he is building
|
|
2025-08-08 09:36:28
|
so highway is the cause of both major size regressions in libjxl; at least it's localized to one dependency. Would it be helpful if workflows had one to specifically check file sizes of various compile envs?
|
|
|
jonnyawsom3
|
2025-08-08 10:33:08
|
<@811568887577444363> the above message is in regards to this issue https://github.com/libjxl/libjxl/issues/3887 (I think)
|
|
|
Quackdoc
|
2025-08-08 11:12:50
|
https://cdn.discordapp.com/emojis/721359241113370664?size=64
|
|
|
Lilli
|
2025-08-08 03:02:44
|
Hi there it's me again!
I was testing my implementation regarding memory usage. I implemented the chunking approach, flushing the compressed output to a file in 1 MiB chunks,
i.e. something like this:
```cpp
// START_ALLOC is 1<<20, 1MiB
std::vector<uint8_t> lCompressed(START_ALLOC);
uint8_t* lNext_out = lCompressed.data();
size_t lAvail_out = lCompressed.size();
while ( true )
{
JxlEncoderStatus lResult =
JxlEncoderProcessOutput(lEnc.mEnc.get(), &lNext_out, &lAvail_out);
size_t lWritten = lNext_out - lCompressed.data();
if ( lWritten > 0 )
{// write data to file on stream
if ( fwrite(lCompressed.data(), 1, lWritten, lEnc.mFp) != lWritten )
{
lError = "Failed to write to file: "+ pFilename + "\n";
break;
}
// Always reset buffer for next chunk
lNext_out = lCompressed.data();
lAvail_out = lCompressed.size();
}
if ( lResult == JXL_ENC_NEED_MORE_OUTPUT )
{
// we must not resize, just continue
continue;
}
...
```
Of course I used the `JxlEncoderAddChunkedFrame`.
So this is how I write into my file without needing the entire output buffer in memory, and this works.
But it still takes a big amount of memory!
I encode in Lossy and tried different settings of distance and effort to no avail.
The peak memory usage for a file of about 300MB is 832MiB! (including the initial 300MB of input data)
It seems a copy of the initial data is performed at some point:
`286.0 MiB: (anonymous namespace)::JxlEncoderAddImageFrameInternal(JxlEncoderFrameSettingsStruct const*, unsigned long, unsigned long, bool, jxl::JxlEncoderChunkedFrameAdapter&&) (encode.cc:2366)`
This line seem to allocate 286MiB
Which is the size of the input data.
I used Massif for the profiling:
|
|
2025-08-08 03:09:34
|
I can give more code of course, but I don't want to unnecessarily pollute this channel
I of course only pass pointers around... I'm not sure how to specify that chunked frame should only use the data as is, and not copy it
|
|
|
Demiurge
|
2025-08-10 09:08:58
|
Hey <@238552565619359744> I saw the work you recently did and I'm pretty impressed. I hope your patches get merged.
|
|
2025-08-10 09:09:23
|
But I'm sure they will.
|
|
|
jonnyawsom3
|
2025-08-10 09:26:38
|
I've not done anything recently, just made notes of ideas to try
|
|
|
Quackdoc
|
2025-08-10 09:30:42
|
imma bisect highway and see if I can't figure out what commit of that made it balloon so much
|
|
2025-08-10 10:00:59
|
yeah, bisected it down to the commit and commented on the issue thread; the codegen/dynamic dispatch commit is what done it
|
|
|
Lilli
|
2025-08-11 06:47:29
|
Would someone know who to ask about the memory consumption I noted above ?
|
|
|
jonnyawsom3
|
|
Lilli
Would someone know who to ask about the memory consumption I noted above ?
|
|
2025-08-11 06:59:15
|
There's two ideas I have. The first is adding `JxlEncoderFrameSettingsSetOption(frame_settings, JXL_ENC_FRAME_SETTING_PATCHES, 0);` to disable Patches
The other is trying effort 1 lossless, as that has minimal memory usage for encoding. Then you can see if it's an encoder problem or a buffer handling issue
|
|
|
Lilli
|
2025-08-11 06:59:36
|
I didn't mention, I'm using lossy !
|
|
|
jonnyawsom3
|
2025-08-11 07:02:50
|
I know, patches works on lossy and the lossless is just to test memory overhead
|
|
|
Lilli
|
|
jonnyawsom3
There's two ideas I have. The first is adding `JxlEncoderFrameSettingsSetOption(frame_settings, JXL_ENC_FRAME_SETTING_PATCHES, 0);` to disable Patches
The other is trying effort 1 lossless, as that has minimal memory usage for encoding. Then you can see if it's an encoder problem or a buffer handling issue
|
|
2025-08-11 07:11:55
|
Ok I tried it,
The command to disable patches doesn't seem to change anything in terms of memory
The effort 1 also did not change anything regarding memory either, the peak memory usage is the same 832MiB
|
|
2025-08-11 07:15:22
|
It seems the graph showing the distribution of memory is a little different, but the total is the same. Notably the "unknown inlined fun" now is much lower, but the total is still the same
|
|
2025-08-11 07:16:47
|
I can give the massif files, so that one could check what's going on
|
|
2025-08-11 07:52:29
|
I did it again making sure I was lossless vs lossy, and lossless effort 1 uses even more memory at 852MiB, lossy effort 1 is 832 with or without patches
|
|
2025-08-11 07:55:59
|
the peak memory usage is highlighted and the detail is on the other image. Data for Lossless Effort 1
The orange part corresponds to the 286MiB allocated by the `JxlEncoderAddChunkedFrame`
The cyan part is the `jxl::AlignedMemory` 181MiB
The blue part is apparently a `std::vector<jxl::Token>` 96MiB
|
|
|
_wb_
|
2025-08-11 08:08:49
|
that looks like most memory is allocated not by libjxl but by your tiff reader, no?
|
|
|
Lilli
|
2025-08-11 08:17:07
|
You are correct, the Tiff reader allocated the 286MiB that is then the input to the Jxl encoder. Then the Jxl encoder's "AddChunkedFrame" allocates the same amount again (which is my problem).
I give the pointer to my data inside the "chunking state" opaque structure inside the `JxlChunkedFrameInputSource`
I want to understand how to avoid an entire copy of my data
|
|
|
_wb_
|
2025-08-11 08:31:56
|
how large is the image and how large are the chunks?
|
|
|
Lilli
|
2025-08-11 08:33:56
|
The image is 4291x11649
What do you mean how large are the chunks?
I use JXL_ENC_FRAME_SETTING_BUFFERING = 2
|
|
|
jonnyawsom3
|
2025-08-11 09:07:58
|
<https://github.com/libjxl/libjxl/blob/095b8b2a088483b2f95e33638778db935ea9d43f/lib/jxl/enc_frame.cc#L1772>
```cpp
// TODO(veluca): handle different values of `buffering`.
if (frame_data.xsize <= 2048 && frame_data.ysize <= 2048) {
  return false;
```
Currently `JXL_ENC_FRAME_SETTING_BUFFERING=2` is using the same behaviour as `JXL_ENC_FRAME_SETTING_BUFFERING=1`, though I think the results you're getting are still correct. Lossy effort 7 uses around 400 MB, with the image itself being 143 MB and your TIFF loader using 286 MB which adds up to over 800 MB.
The strange part to me is that effort 1 lossless used more... That should use around 170 MB instead of 400 MB like effort 7 lossy
|
|
2025-08-11 09:09:21
|
Though, my results are based on an 8K image, so only 30 MP instead of 50 MP
|
|
|
Lilli
|
2025-08-11 09:14:36
|
Oh wow, okay! Thank you for finding that out!
For lossless I just turned the distance to 0, but if there's another flag to raise, I may have failed to do it properly
In any case, that makes it very hard for me to use then, I have 2GB of RAM on embedded device ...
|
|
|
jonnyawsom3
|
2025-08-11 09:16:44
|
This may be of interest, if lower effort levels of libjxl aren't enough https://github.com/Traneptora/hydrium
|
|
|
Lilli
|
2025-08-11 09:17:47
|
Okay that may be ! Thanks a lot for all this information !
|
|
2025-08-11 09:37:16
|
Still, I don't really get the point of having a streaming API if the whole data is copied into memory first... I might've missed something. But this hydrium thing looks quite early-stage; hopefully it gives similar compression results
|
|
|
jonnyawsom3
|
|
Lilli
Still, I don't really get the point of making a streaming API if the whole data is copied first in memory... I might've missed something. But this hydrium thing looks quite early stages, hopefully it gives similar compression results
|
|
2025-08-11 09:57:50
|
```
wintime -- cjxl --streaming_input --streaming_output Test.ppm nul
JPEG XL encoder v0.12.0 7deb57d7 [_AVX2_,SSE4,SSE2]
Encoding [VarDCT, d1.000, effort: 7]
Compressed to 4857.7 kB (1.171 bpp).
7680 x 4320, 7.532 MP/s [7.53, 7.53], , 1 reps, 16 threads.
PageFaultCount: 909106
PeakWorkingSetSize: 367.8 MiB
QuotaPeakPagedPoolUsage: 234.1 KiB
QuotaPeakNonPagedPoolUsage: 21.38 KiB
PeakPagefileUsage: 398.5 MiB
wintime -- cjxl Test.ppm nul
JPEG XL encoder v0.12.0 7deb57d7 [_AVX2_,SSE4,SSE2]
Encoding [VarDCT, d1.000, effort: 7]
Compressed to 4857.7 kB (1.171 bpp).
7680 x 4320, 7.462 MP/s [7.46, 7.46], , 1 reps, 16 threads.
PageFaultCount: 960855
PeakWorkingSetSize: 567.9 MiB
QuotaPeakPagedPoolUsage: 44.44 KiB
QuotaPeakNonPagedPoolUsage: 21.77 KiB
PeakPagefileUsage: 688 MiB
```
Streaming certainly does make a difference, so maybe <@794205442175402004> has an idea what's wrong with your implementation. Hydrium is equivalent to libjxl effort 3
|
|
|
Lilli
|
2025-08-11 10:22:31
|
Oh I see !
I suppose the relevant code is this part. I think I'm just filling the structures according to the documentation
```cpp
struct ChunkingState
{
const uint8_t* mImage; // Pointer to the image data
uint32_t mWidth;
uint32_t mOffset;
uint32_t mChannels;
uint32_t mBytesPerPixel;
JxlDataType mInputDataType;
};
// in my export jxl function
ChunkingState state = {reinterpret_cast<const uint8_t*>(pBuffer),
pSizeX, 0, pChNb,
pChNb * lInputBitsPerSample / 8, lType};
JxlChunkedFrameInputSource lChunked = {};
lChunked.opaque = &state;  // must match the variable declared above
lChunked.get_color_channels_pixel_format = get_color_channels_pixel_format;
lChunked.get_color_channel_data_at = get_color_channel_data_at;
lChunked.release_buffer = release_buffer;
JxlEncoderFrameSettings* lFrameSettings =
JxlEncoderFrameSettingsCreate(lEnc.get(), nullptr);
//... some more options, effort, distance
JxlEncoderFrameSettingsSetOption(lFrameSettings,
JXL_ENC_FRAME_SETTING_BUFFERING, 2);
JxlEncoderStatus lIsChunkedSet =
JxlEncoderAddChunkedFrame(lFrameSettings, true, lChunked);
```
|
|
|
_wb_
|
2025-08-11 10:28:35
|
for fast-lossless e1 the pixel format has to be uint8 or uint16 iirc, not a float one
|
|
|
Lilli
|
2025-08-11 10:30:12
|
I see, in my case I want e4 with a distance of 0.025
|
|
2025-08-11 10:30:40
|
So, not fast and not lossless, just float
|
|
|
_wb_
|
2025-08-11 10:30:49
|
ah ok
|
|
2025-08-11 10:32:15
|
that's a very low distance, are you sure you're feeding the input in a way that makes perceptual sense, i.e. not something that renders as a near-black image?
|
|
|
Lilli
|
2025-08-11 10:34:11
|
yes it does render as a near black image, and we've tried to transform it into a "perceptually meaningful image", which requires then another transform to undo it when decompressing. But the whole point of my process is to compress the raw sensor image so that the processing can be done elsewhere
|
|
2025-08-11 10:35:31
|
But I imagine that won't be the issue regarding memory usage would it?
|
|
|
_wb_
|
2025-08-11 11:19:47
|
No it shouldn't.
|
|
2025-08-11 11:26:29
|
But I would avoid encoding near-black images. Even if you cannot do real channel balancing, maybe you could just scale things by some constant so it's not super dark. If you can get it so d1 looks reasonable and you can use d0.2 or so, you will probably get better results than using d0.025 on an image where d1 looks very bad...
|
|
|
Lilli
|
2025-08-11 11:31:19
|
I hear you, this is an issue I've discussed on this channel before, also playing with the intensity target. I have to make do with the constraints we currently face. Currently, the compression is perfectly within our requirements, and JXL performs much better than the other encoders we tried, even with our near-black image.
It's just that if libjxl copies my buffer internally or whatever it does, I just can't use it.
My goal is to keep the input image, so there's 300MB of data that I don't want to free, and then a footprint of 150MB for the compression would be very acceptable, just not 450MB...
|
|
2025-08-11 11:32:47
|
And hydrium sounds great, but I'd rather make small changes to my codebase instead of importing a new library that I'd need to wire from scratch.
|
|
2025-08-11 11:35:37
|
You mentioned earlier "what size are your chunks" but I don't see how to specify chunk size, or even give separate chunks, what did you mean?
|
|
|
jonnyawsom3
|
2025-08-11 11:45:22
|
I didn't realise you were using such a low distance, IIRC hydrium is fixed at distance 1
|
|
|
_wb_
|
2025-08-11 11:55:33
|
I don't remember how the chunked encode api works but the idea is that you give large images in tiles to the encoder. <@179701849576833024> wrote most of that iirc so maybe he remembers
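Roughly, the chunked API pulls tiles on demand through callbacks, and the callback hands back a pointer into the application's own buffer plus a row stride, so nothing gets copied. A self-contained sketch of that tile-fetch logic, using a hypothetical local struct rather than the real `JxlChunkedFrameInputSource` types:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the caller-side state behind a chunked input
 * source. With the real API, the same arithmetic would live inside the
 * tile-fetching callback of a JxlChunkedFrameInputSource. */
typedef struct {
    const uint16_t *pixels;   /* the full image, owned by the application */
    size_t width;             /* image width in pixels */
    size_t channels;          /* interleaved samples per pixel */
} TileSource;

/* Return a pointer to the top-left sample of the requested tile and the
 * byte offset between consecutive tile rows. The encoder would read the
 * tile directly from the application's buffer: no copy is made. */
static const void *get_tile(const TileSource *src, size_t xpos, size_t ypos,
                            size_t *row_offset_bytes) {
    *row_offset_bytes = src->width * src->channels * sizeof(uint16_t);
    return src->pixels + (ypos * src->width + xpos) * src->channels;
}
```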
|
|
|
Lilli
|
2025-08-11 12:05:53
|
I made a few modifications which seem to make a difference. It seems that in the default configuration, streaming ends up disabled, even if it's activated by BUFFERING=2
|
|
|
|
veluca
|
|
_wb_
I don't remember how the chunked encode api works but the idea is that you give large images in tiles to the encoder. <@179701849576833024> wrote most of that iirc so maybe he remembers
|
|
2025-08-11 12:17:01
|
I barely remember my name these days π
|
|
|
jonnyawsom3
|
2025-08-11 12:25:06
|
There's two checks that need to pass for full chunked/streamed encoding
<https://github.com/libjxl/libjxl/blob/095b8b2a088483b2f95e33638778db935ea9d43f/lib/jxl/enc_frame.cc#L1772> and <https://github.com/libjxl/libjxl/blob/095b8b2a088483b2f95e33638778db935ea9d43f/lib/jxl/enc_frame.cc#L1644> (I think)
|
|
|
Lilli
|
2025-08-11 02:20:32
|
I have explicitly set most FrameSettings Options so that it passes each test in the `CanDoStreamingEncoding()`... but I'm not really sure it passes them all.
I see there is a sort of a box concept, is that what the streaming is supposed to use?
I'm really struggling with the documentation π
|
|
|
TheBigBadBoy - πΈπ
|
2025-08-11 02:36:22
|
what is the problem with 800MB of used RAM?
do you want to encode JXL files on an embedded system or something ?
|
|
|
Quackdoc
|
2025-08-11 02:36:57
|
800mb would still be a lot on something like a low end android phone
|
|
|
jonnyawsom3
|
|
Lilli
Oh wow, okay! Thank you for finding that out!
For lossless I just turned the distance to 0, but if there's another flag to raise, I may have failed to do it properly
In any case, that makes it very hard for me to use then, I have 2GB of RAM on embedded device ...
|
|
2025-08-11 02:37:42
|
<@693503208726986763>
> In any case, that makes it very hard for me to use then, I have 2GB of RAM on embedded device ...
|
|
|
TheBigBadBoy - πΈπ
|
2025-08-11 02:39:20
|
yeah see ?
they can even encode 2 images in parallel <:KekDog:805390049033191445> ||/s||
|
|
|
Lilli
|
2025-08-12 07:33:11
|
I feel that one issue was that I didn't use the `JxlEncoderOutputProcessor`, which I now do, along with `JxlEncoderFlushInput`
And now the memory usage ballooned to 3GB! fantastic
|
|
2025-08-12 07:36:56
|
This is cjxl with my input data as a png file (which is indeed 286MiB on my machine too)
with `--streaming_output`
I find it curious that it has twice the 286MiB in memory... I would go as far as to say, suspicious π§
|
|
2025-08-12 07:37:17
|
|
|
2025-08-12 07:38:19
|
One is in the decode Image APNG, and one is in the JxlEncoderAddImageFrame
|
|
2025-08-12 07:45:26
|
it doesn't call addChunkedFrame... hmmm
|
|
|
jonnyawsom3
|
|
Lilli
This is cjxl with my input data as a png file (which is indeed 286MiB on my machine too)
with `--streaming_output`
I find it curious that it has twice the 286MiB in memory... I would go as far as to say, suspicious π§
|
|
2025-08-12 08:42:43
|
Try converting to PFM temporarily with --streaming_input
|
|
|
Lilli
|
2025-08-12 08:50:03
|
Oh wow okay
|
|
2025-08-12 08:50:25
|
|
|
2025-08-12 08:50:44
|
it did take only 300MiB which I think is a little too much still, but would be manageable
The most surprising thing is that there's no full internal copy of the data anymore
|
|
2025-08-12 08:52:07
|
so I'd need to figure out how to set up the encoder to do exactly whatever this is doing
|
|
|
Traneptora
|
2025-08-14 12:13:21
|
Is there a way for a libjxl decoder client to tell libjxl that there's no more data remaining?
|
|
2025-08-14 12:14:01
|
If you subscribe to the `BOX` and `BOX_COMPLETE` events, it always expects more boxes, so it returns `NEED_MORE_INPUT` even when there's no more boxes remaining
|
|
2025-08-14 12:14:15
|
and never returns `DEC_SUCCESS`
|
|
2025-08-14 12:17:10
|
Normally, a client that has no more data to offer to libjxl that receives `JXL_DEC_NEED_MORE_INPUT` will assume the file is truncated
|
|
2025-08-14 12:17:30
|
but that isn't the case if you subscribe to those events
|
|
|
Lilli
|
2025-08-14 01:58:31
|
BTW I managed in the end, I'm not exactly sure why it works now, but it does, memory usage went from 800+ to 550, we'll work with that
Thank you for your time and efforts, it's greatly appreciated π
|
|
|
jonnyawsom3
|
2025-08-14 02:24:14
|
Sorry we couldn't give more straightforward answers, I mostly know the CLI and parameters, while the devs who know the API are busy with jxl-rs π
|
|
|
Traneptora
|
2025-08-14 03:35:05
|
currently a bit busy but I'm trying some more memory savings with hydrium
|
|
2025-08-14 03:35:10
|
using dist clustering
|
|
|
jonnyawsom3
|
2025-08-14 04:48:15
|
Don't suppose you could add a basic quality option? I imagine it wouldn't have a large impact to memory or performance scaling the quant tables and such
|
|
|
Traneptora
|
|
jonnyawsom3
Don't suppose you could add a basic quality option? I imagine it wouldn't have a large impact to memory or performance scaling the quant tables and such
|
|
2025-08-14 07:22:45
|
Wouldn't be too hard but I consider it lower priority
|
|
2025-08-14 07:23:26
|
considering adding tetrahedral 3dlut too. I don't think it is faster on modern x86 but should be without fma
|
|
|
jonnyawsom3
|
|
Traneptora
If you subscribe to the `BOX` and `BOX_COMPLETE` events, it always expects more boxes, so it returns `NEED_MORE_INPUT` even when there's no more boxes remaining
|
|
2025-08-15 02:15:42
|
Probably needs to track the listed size in the Image Header and how much has already been read
|
|
|
Traneptora
|
|
jonnyawsom3
Probably needs to track the listed size in the Image Header and how much has already been read
|
|
2025-08-16 03:10:52
|
Problem is boxes (e.g. Exif) can occur after the last jxlp box
|
|
2025-08-16 03:14:17
|
so codestream size doesn't matter
|
|
|
jonnyawsom3
|
2025-08-16 03:17:45
|
Right, I forgot metadata is handled in `18181-2`, I was looking at `18181-1` thinking the image header covered it
|
|
|
Traneptora
|
2025-08-25 05:14:26
|
so I'm getting an interesting issue
|
|
2025-08-25 05:15:14
|
I have a PNG which is RGB (as they tend to be) with an XYB cjpegli profile (the file was a jpegli --xyb jpeg, decoded to PNG, with the iccp preserved)
|
|
2025-08-25 05:15:36
|
I encoded this PNG to a JXL with VarDCT lossy, and it's XYB encoded
|
|
2025-08-25 05:15:58
|
so I now have a JXL file which is xyb_encoded, and also has an attached ICC profile. a strange ICC profile, specifically, which is the cjpegli xyb profile
|
|
2025-08-25 05:16:13
|
Here's the JXL file
|
|
2025-08-25 05:17:02
|
using libjxl to request the pixel data in sRGB works as expected
|
|
2025-08-25 05:17:27
|
but using libjxl to request the pixel data in the space of the attached profile, using libjxl's cms engine, aborts
|
|
2025-08-25 05:32:36
|
I do the following
|
|
2025-08-25 05:36:26
|
```c
// JXL_DEC_COLOR_ENCODING event handler
const JxlCmsInterface *cms = JxlGetDefaultCms();
JxlDecoderSetCms(decoder, *cms);
size_t icc_len;
JxlDecoderGetICCProfileSize(decoder, JXL_COLOR_PROFILE_TARGET_ORIGINAL, &icc_len);
uint8_t *icc_data = malloc(icc_len);
JxlDecoderGetColorAsICCProfile(decoder, JXL_COLOR_PROFILE_TARGET_ORIGINAL, icc_data, icc_len);
JxlDecoderSetOutputColorProfile(decoder, NULL, icc_data, icc_len);
// rest of decoder loop
```
There's two issues
|
|
2025-08-25 05:38:18
|
1. It aborts at `stage_from_linear.cc` on line 178. This is a bug.
```cpp
} else {
// This is a programming error.
JXL_DEBUG_ABORT("Invalid target encoding");
return nullptr;
}
```
I tracked it down to `dec_cache.cc` line 324, which can be patched into the following.
```diff
diff --git a/lib/jxl/dec_cache.cc b/lib/jxl/dec_cache.cc
index 7b6d54d2..cce6cede 100644
--- a/lib/jxl/dec_cache.cc
+++ b/lib/jxl/dec_cache.cc
@@ -321,8 +321,10 @@ Status PassesDecoderState::PreparePipeline(const FrameHeader& frame_header,
const size_t channels_dst =
output_encoding_info.color_encoding.Channels();
bool mixing_color_and_grey = (channels_dst != channels_src);
- if ((output_encoding_info.color_encoding_is_original) ||
- (!output_encoding_info.cms_set) || mixing_color_and_grey) {
+ if ((output_encoding_info.color_encoding_is_original &&
+ !(output_encoding_info.color_encoding.WantICC() &&
+ output_encoding_info.xyb_encoded)) || (!output_encoding_info.cms_set)
+ || mixing_color_and_grey) {
// in those cases we only need a linear stage in other cases we attempt
// to obtain a cms stage: the cases are
// - output_encoding_info.color_encoding_is_original: no cms stage
```
|
|
2025-08-25 05:38:39
|
Once I fix this bug, there's another issue:
|
|
2025-08-25 05:58:16
|
```
./lib/jxl/cms/jxl_cms.cc:793: LCMS error 13: Couldn't link the profiles
./lib/jxl/cms/jxl_cms.cc:1306: JXL_ERROR: Failed to create transform
./lib/jxl/color_encoding_internal.h:337: JXL_RETURN_IF_ERROR code=1: cms_data_ != nullptr
./lib/jxl/render_pipeline/stage_cms.cc:133: JXL_RETURN_IF_ERROR code=1: color_space_transform->Init( c_src_, output_encoding_info_.color_encoding, output_encoding_info_.desired_intensity_target, xsize_, num_threads)
./lib/jxl/render_pipeline/render_pipeline.cc:131: JXL_RETURN_IF_ERROR code=1: stage->PrepareForThreads(num)
./lib/jxl/dec_frame.h:275: JXL_RETURN_IF_ERROR code=1: dec_state_->render_pipeline->PrepareForThreads( storage_size, use_group_ids)
./lib/jxl/dec_frame.cc:700: JXL_RETURN_IF_ERROR code=1: PrepareStorage(num_threads, decoded_passes_per_ac_group_.size())
./lib/jxl/base/data_parallel.h:76: JXL_FAILURE: [DecodeGroup] failed
./lib/jxl/dec_frame.cc:725: JXL_RETURN_IF_ERROR code=1: RunOnPool(pool_, 0, ac_group_sec.size(), prepare_storage, process_group, "DecodeGroup")
./lib/jxl/decode.cc:1141: frame processing failed
```
|
|
2025-08-25 05:58:25
|
The big one is lcms couldn't link the profiles
|
|
2025-08-25 05:58:42
|
however, setting the profiles succeeded
|
|
2025-08-25 06:06:19
|
I am wondering if it is possible for libjxl to try to link the profiles when you set the output profile
|
|
2025-08-25 06:06:37
|
so if it failed you'd know immediately
|
|
|
jonnyawsom3
|
2025-08-25 06:07:25
|
Might be related, maybe not. The jpegli ICC only has the decoding information to go to RGB, the color management could be tripping up because it has no conversion to use
|
|
|
Traneptora
|
2025-08-25 06:07:49
|
well it's RGB -> RGB right? in theory
|
|
2025-08-25 06:08:05
|
I'm just looking for a profile that won't be scraped into an enum
|
|
2025-08-25 06:08:07
|
and that was one
|
|
|
spider-mario
|
2025-08-25 07:21:25
|
the jpegli ICC has XYB (βRGBβ) to PCS (profile connection spaceΒ β I donβt remember if we made that XYZD50 or CIELAB), but not PCS->XYB
|
|
2025-08-25 07:21:30
|
so it canβt be used as a destination profile
|
|
2025-08-25 07:22:15
|
an βRGB->RGBβ transform is really RGB->PCS->RGB
|
|
2025-08-25 07:24:07
|
you can try this instead
|
|
|
|
nol
|
2025-08-26 07:55:53
|
In some PRs (https://github.com/libjxl/libjxl/pull/2657 or https://github.com/libjxl/libjxl/pull/4236), the "jyrki31" collection of images is used for benchmarking. Is this publicly accessible and if so, can someone help me locate it?
|
|
|
CrushedAsian255
|
|
nol
In some PRs (https://github.com/libjxl/libjxl/pull/2657 or https://github.com/libjxl/libjxl/pull/4236), the "jyrki31" collection of images is used for benchmarking. Is this publicly accessible and if so, can someone help me locate it?
|
|
2025-08-27 04:53:03
|
It's probably referring to <@532010383041363969>
|
|
|
Traneptora
|
|
spider-mario
you can try this instead
|
|
2025-08-27 12:25:36
|
the problem I encountered was cjxl helpfully parsing the ICC and turning it into an enum space, which I was trying to avoid for my testing, so I just added a bunch of code to hydrium. now it has a public api function `hyd_set_suggested_icc_profile(HYDEncoder *encoder, const uint8_t *icc_data, size_t icc_size)` which is nice
|
|
2025-08-27 12:26:03
|
so I spun it up and tested it out and now the file won't decode with djxl
|
|
2025-08-27 12:26:18
|
I believe the file is valid because the other two decoders (jxlatte, jxl-oxide) both decode the file
|
|
2025-08-27 12:26:32
|
it only refuses to decode if hydrium isn't set to output one frame
|
|
2025-08-27 12:27:17
|
and now we have
|
|
2025-08-27 12:27:17
|
https://github.com/libjxl/libjxl/issues/4419
|
|
|
spider-mario
|
|
Traneptora
the problem I encountered was cjxl helpfully parsed the ICC and turned it into an enum space which I was trying to avoid for my testing, so I just added a bunch of code to hydrium. now it has a public api function `hyd_set_suggested_icc_profile(HYDEncoder *encoder, const uint8_t *icc_data, size_t icc_size)` which is nice
|
|
2025-08-27 02:06:05
|
for what itβs worth, it shouldnβt enum-ize the profile I attachedΒ β and that one should work as destination profile
|
|
|
Traneptora
|
|
spider-mario
for what itβs worth, it shouldnβt enum-ize the profile I attachedΒ β and that one should work as destination profile
|
|
2025-08-27 03:22:47
|
I can try that one then, since it appears just using a PQ profile on an XYB image has issues
|
|
|
spider-mario
so it canβt be used as a destination profile
|
|
2025-08-27 03:27:21
|
problem here is `JxlDecoderSetOutputColorProfile` doesn't return an error in this case.
|
|
2025-08-27 03:27:54
|
The documentation says it should fail
|
|
2025-08-27 03:27:55
|
> If a color management system (CMS) has been set with JxlDecoderSetCms, and the CMS supports output to the desired color encoding or ICC profile, then it will provide the output in that color encoding or ICC profile. If the desired color encoding or the ICC is not supported, then an error will be returned.
|
|
2025-08-27 03:28:06
|
but what happens is it returns successfully and the failure only surfaces later, during the conversion step
|
|
|
spider-mario
|
2025-08-27 03:46:35
|
out of curiosity, is that also the case with libjxl built with skcms instead of lcms2?
|
|
2025-08-27 03:46:43
|
I think I remember an explicit check for that there
|
|
|
Traneptora
|
|
spider-mario
out of curiosity, is that also the case with libjxl built with skcms instead of lcms2?
|
|
2025-08-29 05:22:49
|
I don't know, I can check
|
|
|
A homosapien
|
|
nol
In some PRs (https://github.com/libjxl/libjxl/pull/2657 or https://github.com/libjxl/libjxl/pull/4236), the "jyrki31" collection of images is used for benchmarking. Is this publicly accessible and if so, can someone help me locate it?
|
|
2025-09-02 07:37:16
|
Found it https://storage.googleapis.com/artifacts.jpegxl.appspot.com/corpora/jyrki-full.tar
|
|
|
|
nol
|
2025-09-03 05:25:28
|
Awesome, thank you very much!
|
|
|
jonnyawsom3
|
2025-09-04 12:49:32
|
Continuing from https://discord.com/channels/794206087879852103/803574970180829194/1413134912567119962
|
|
2025-09-04 12:49:41
|
|
|
|
|
veluca
|
|
|
|
2025-09-04 12:55:00
|
I looked a bit at the libjxl code and it looks like memory usage might be higher if using noise
|
|
2025-09-04 12:55:10
|
(I mean, a _lot_ higher)
|
|
2025-09-04 12:55:18
|
not sure if that's what's going on here
|
|
|
jonnyawsom3
|
|
veluca
I looked a bit at the libjxl code and it looks like memory usage might be higher if using noise
|
|
2025-09-04 12:55:56
|
I think you were onto something looking at the render pipeline
```wintime -- djxl --num_threads 1 Alpha.jxl nul --output_format ppm
JPEG XL decoder v0.12.0 6efa0f5a [_AVX2_] {Clang 20.1.8}
Decoded to pixels.
7680 x 4320, 19.797 MP/s, 1 threads.
PageFaultCount: 382918
PeakWorkingSetSize: 1.064 GiB
QuotaPeakPagedPoolUsage: 36.4 KiB
QuotaPeakNonPagedPoolUsage: 14.6 KiB
PeakPagefileUsage: 1.21 GiB
Creation time 2025/09/04 13:55:07.422
Exit time 2025/09/04 13:55:09.180
Wall time: 0 days, 00:00:01.757 (1.76 seconds)
User time: 0 days, 00:00:00.546 (0.55 seconds)
Kernel time: 0 days, 00:00:01.203 (1.20 seconds)
wintime -- djxl --num_threads 1 Opaque.jxl nul --output_format ppm
JPEG XL decoder v0.12.0 6efa0f5a [_AVX2_] {Clang 20.1.8}
Decoded to pixels.
7680 x 4320, 35.496 MP/s, 1 threads.
PageFaultCount: 77411
PeakWorkingSetSize: 198.2 MiB
QuotaPeakPagedPoolUsage: 36.4 KiB
QuotaPeakNonPagedPoolUsage: 6.898 KiB
PeakPagefileUsage: 195.6 MiB
Creation time 2025/09/04 13:55:17.141
Exit time 2025/09/04 13:55:18.142
Wall time: 0 days, 00:00:01.001 (1.00 seconds)
User time: 0 days, 00:00:00.109 (0.11 seconds)
Kernel time: 0 days, 00:00:00.890 (0.89 seconds)```
|
|
2025-09-04 12:56:05
|
Alpha adds 5x more memory usage
|
|
|
|
veluca
|
2025-09-04 12:56:39
|
so the good news is that all of this _should_ be fixable in jxl-rs
|
|
2025-09-04 12:56:50
|
the bad news is that I need to write that code π
|
|
|
jonnyawsom3
|
2025-09-04 12:58:24
|
I have to say, I'm surprised memory usage doesn't change with thread count much, if at all. I would've assumed the data is buffered until a thread starts working on it
|
|
|
|
veluca
|
2025-09-04 12:58:48
|
it should without alpha
|
|
|
jonnyawsom3
|
2025-09-04 12:59:59
|
By less than 1MB, with Alpha by 4MB
```wintime -- djxl --num_threads 8 Alpha.jxl nul --output_format ppm
JPEG XL decoder v0.12.0 6efa0f5a [_AVX2_] {Clang 20.1.8}
Decoded to pixels.
7680 x 4320, 65.846 MP/s, 8 threads.
PageFaultCount: 383747
PeakWorkingSetSize: 1.068 GiB
QuotaPeakPagedPoolUsage: 36.4 KiB
QuotaPeakNonPagedPoolUsage: 15.53 KiB
PeakPagefileUsage: 1.213 GiB
Creation time 2025/09/04 13:57:20.063
Exit time 2025/09/04 13:57:20.654
Wall time: 0 days, 00:00:00.590 (0.59 seconds)
User time: 0 days, 00:00:01.203 (1.20 seconds)
Kernel time: 0 days, 00:00:01.859 (1.86 seconds)
wintime -- djxl --num_threads 8 Opaque.jxl nul --output_format ppm
JPEG XL decoder v0.12.0 6efa0f5a [_AVX2_] {Clang 20.1.8}
Decoded to pixels.
7680 x 4320, 176.738 MP/s, 8 threads.
PageFaultCount: 80059
PeakWorkingSetSize: 199.4 MiB
QuotaPeakPagedPoolUsage: 36.4 KiB
QuotaPeakNonPagedPoolUsage: 7.961 KiB
PeakPagefileUsage: 200.4 MiB
Creation time 2025/09/04 13:57:26.400
Exit time 2025/09/04 13:57:26.654
Wall time: 0 days, 00:00:00.253 (0.25 seconds)
User time: 0 days, 00:00:00.281 (0.28 seconds)
Kernel time: 0 days, 00:00:01.156 (1.16 seconds)```
|
|
|
|
veluca
|
2025-09-04 01:02:55
|
weird, but I won't complain
|
|
|
jonnyawsom3
|
2025-09-04 01:02:58
|
This also gives another reason to strip empty alpha on encode, or at least detect it and minimize decode time (g3 LZ77 only should be fast, but testing needed)
|
|
|
veluca
(I mean, a _lot_ higher)
|
|
2025-09-04 01:08:11
|
Also seeing a 30% increase with noise, but nowhere near the 5x from alpha blending
```wintime -- djxl --num_threads 1 Noise.jxl nul --output_format ppm
JPEG XL decoder v0.12.0 6efa0f5a [_AVX2_] {Clang 20.1.8}
Decoded to pixels.
7680 x 4320, 25.457 MP/s, 1 threads.
PageFaultCount: 94708
PeakWorkingSetSize: 256.5 MiB
QuotaPeakPagedPoolUsage: 36.4 KiB
QuotaPeakNonPagedPoolUsage: 7.695 KiB
PeakPagefileUsage: 259.1 MiB
Creation time 2025/09/04 14:06:36.344
Exit time 2025/09/04 14:06:37.711
Wall time: 0 days, 00:00:01.366 (1.37 seconds)
User time: 0 days, 00:00:00.093 (0.09 seconds)
Kernel time: 0 days, 00:00:01.281 (1.28 seconds)```
|
|
|
|
veluca
|
2025-09-04 01:08:28
|
weird
|
|
|
jonnyawsom3
|
2025-09-04 01:09:18
|
Combining both does get a little hairy though
```wintime -- djxl --num_threads 1 AlphaNoise.jxl nul --output_format ppm
JPEG XL decoder v0.12.0 6efa0f5a [_AVX2_] {Clang 20.1.8}
Decoded to pixels.
7680 x 4320, 14.498 MP/s, 1 threads.
PageFaultCount: 538869
PeakWorkingSetSize: 1.658 GiB
QuotaPeakPagedPoolUsage: 36.4 KiB
QuotaPeakNonPagedPoolUsage: 20.18 KiB
PeakPagefileUsage: 1.819 GiB
Creation time 2025/09/04 14:08:57.798
Exit time 2025/09/04 14:09:00.167
Wall time: 0 days, 00:00:02.368 (2.37 seconds)
User time: 0 days, 00:00:00.671 (0.67 seconds)
Kernel time: 0 days, 00:00:01.687 (1.69 seconds)```
|
|
2025-09-04 01:10:17
|
Around 18x more memory than the image itself
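Back-of-envelope (my arithmetic, not libjxl's actual allocation strategy): the 8-bit RGB source is 7680 x 4320 x 3, about 95 MiB, while one full-resolution set of float32 RGBA planes is already about 506 MiB, so a few of those plus the noise pipeline reaches 1.6+ GiB quickly:

```c
#include <stddef.h>

/* Bytes needed for `planes` full-resolution float32 planes of a
 * width x height image. Illustrative arithmetic only; this is not a
 * description of libjxl's real allocations. */
static size_t plane_bytes(size_t width, size_t height, size_t planes) {
    return width * height * planes * sizeof(float);
}
```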
|
|
2025-09-04 01:23:02
|
Oxide has its own quirks, we've had certain files use 4GB of memory for a 4K image IIRC
|
|
|
Quackdoc
|
2025-09-05 01:06:56
|
<@179701849576833024> I fell asleep :D, I printed out `num_buffers` and twice it prints `480`
```
./tools/djxl --num_threads 1 mona-jxl.jxl --disable_output
JPEG XL decoder v0.12.0 9f29783e [_AVX2_,SSE4,SSE2] {GNU 15.2.1}
Number of buffers: 480
Number of buffers: 480
Decoded to pixels.
7432 x 3877, 18.364 MP/s, 1 threads.
```
|
|
|
|
veluca
|
2025-09-05 01:07:19
|
that explains it
|
|
2025-09-05 01:07:21
|
noise?
|
|
|
Quackdoc
|
2025-09-05 01:09:45
|
not entirely sure, it's been a while since I made the image, any way I can easily check?
|
|
2025-09-05 01:10:10
|
```
./tools/jxlinfo -v mona-jxl.jxl
JPEG XL image, 7432x3877, lossy, 8-bit RGB+Alpha
Number of color channels: 3
Number of extra channels: 1
Extra channel 0:
type: Alpha
bits per sample: 8
alpha premultiplied: 0 (Non-premultiplied)
Have preview: 0
Have animation: 0
Intrinsic dimensions: 7432x3877
Orientation: 1 (Normal)
Color space: RGB
White point: D65
Primaries: sRGB
Transfer function: sRGB
Rendering intent: Relative
jxl-oxide info -v mona-jxl.jxl
2025-09-05T01:10:01.624293Z DEBUG jxl_render: Setting default output color encoding default_color_encoding=ColorEncodingWithProfile { encoding: Enum(EnumColourEncoding { colour_space: Rgb, white_point: D65, primaries: Srgb, tf: Srgb, rendering_intent: Relative }), icc_profile: (0 byte(s)), is_cmyk: false }
JPEG XL image (BareCodestream)
Image dimension: 7432x3877
Bit depth: 8 bits
XYB encoded, suggested display color encoding:
Colorspace: RGB
White point: D65
Primaries: sRGB
Transfer function: sRGB
Extra channel info:
#0 Alpha
Frame #0 (keyframe)
VarDCT (lossy)
Frame type: Regular
7432x3877; (0, 0
```
|
|
|
|
veluca
|
2025-09-05 01:12:44
|
ah, wait, it has alpha
|
|
2025-09-05 01:12:58
|
if that's lossy alpha, that should explain it
|
|
2025-09-05 01:13:18
|
and yes jxl-rs will do better π
|
|
|
Quackdoc
|
2025-09-05 02:52:51
|
yay :D
|
|
|
jonnyawsom3
|
2025-09-05 03:06:46
|
Wasn't jxl-rs also using half instead of float internally? That should knock out a lot of overhead for 8bit images
|
|
|
|
veluca
|
2025-09-05 07:07:37
|
Maybe
|
|
|
Quackdoc
|
2025-09-05 07:08:38
|
right now thumbnailing some of my folders that have a good number of these images is rough and will oom my PC lol
|
|
|
AccessViolation_
|
2025-09-05 09:30:24
|
does anyone else have issues encoding this image? I get `Getting pixel data failed`. a different PNG worked. I'm on 0.11.0
|
|
|
RaveSteel
|
|
AccessViolation_
does anyone else have issues encoding this image? I get `Getting pixel data failed`. a different PNG worked. I'm on 0.11.0
|
|
2025-09-05 09:33:42
|
Source for the image?
|
|
|
AccessViolation_
|
|
RaveSteel
Source for the image?
|
|
2025-09-05 09:34:30
|
https://redlib.privadency.com/r/Silksong/comments/1n82yik/one_last_fanart_before_its_real_rework_of_a/
|
|
2025-09-05 09:37:21
|
the full images from discord's and reddit's cdn are bit identical so it shouldn't matter where you get it from
|
|
|
jonnyawsom3
|
|
AccessViolation_
the full images from discord's and reddit's cdn are bit identical so it shouldn't matter where you get it from
|
|
2025-09-05 09:38:43
|
God bless the EXIF fix
|
|
2025-09-05 09:39:23
|
Trailing data I'd assume?
|
|
|
RaveSteel
|
|
AccessViolation_
the full images from discord's and reddit's cdn are bit identical so it shouldn't matter where you get it from
|
|
2025-09-05 09:39:30
|
I was just asking to find the artist xd
|
|
|
AccessViolation_
|
2025-09-05 09:39:49
|
I figured :p
|
|
2025-09-05 09:39:52
|
it's nice art
|
|
|
jonnyawsom3
Trailing data I'd assume?
|
|
2025-09-05 09:40:55
|
hmm you think?
|
|
2025-09-05 09:44:18
|
could this be the trailing data in question
|
|
2025-09-05 09:46:16
|
yep seems like anything beyond that point isn't classified as anything part of the PNG format
|
|
|
RaveSteel
|
2025-09-05 10:28:51
|
I wonder where that trailing data comes from
|
|
|
jonnyawsom3
|
2025-09-06 02:23:07
|
https://github.com/libjxl/jxl-rs/releases/tag/v0.1.0
|
|
|
|
veluca
|
2025-09-06 06:13:23
|
don't read too much into it, I just wanted to grab the crates.io crate name before it was too late π
|
|
|
Quackdoc
|
2025-09-06 08:13:53
|
did they stop allowing name squatting?
|
|
|
|
veluca
|
2025-09-06 08:55:54
|
yup
|
|
2025-09-06 08:56:14
|
I had to send a request to take down the previous namesquat already
|
|
|
jonnyawsom3
|
2025-09-06 09:02:09
|
Hey, everything is progress
|
|
|
HCrikki
|
2025-09-06 01:56:16
|
whats the conformity level for jxl-rs 0.1.0 ?
could someone keep https://libjxl.github.io/bench/ updated ?
|
|
|
Meow
|
2025-09-07 03:36:41
|
Homebrew pushes the newer version!
`jpeg-xl 0.11.1_2 -> 0.11.1_3`
|
|
|
AccessViolation_
|
|
veluca
don't read too much into it, I just wanted to grab the crates.io crate name before it was too late π
|
|
2025-09-08 08:57:03
|
you can't stop me!
HEAR YE JXL RS IS COMINGG
|
|
|
Meow
|
2025-09-08 09:03:07
|
It already came
|
|
2025-09-09 07:40:31
|
https://github.com/libjxl/jxl-rs/releases/tag/v0.1.1
|
|
|
Amiralgaby π
|
|
AccessViolation_
does anyone else have issues encoding this image? I get `Getting pixel data failed`. a different PNG worked. I'm on 0.11.0
|
|
2025-09-12 11:48:27
|
I had the same error.
The cause: a not-implemented case was reported as a pixel data error instead of a "not implemented" error or warning
|
|
|
jonnyawsom3
|
2025-09-15 01:08:04
|
<@263300458888691714> back when you improved patch detection, was that by tweaking values in the existing heuristic, or did you make something entirely new for it? Wondering if we could try and replicate it
|
|
|
_wb_
|
2025-09-15 01:32:54
|
the current heuristic starts from the assumption of a solid background, but that's just covering a specific type of patches (letter-like stuff)
|
|
2025-09-15 01:37:45
|
for something like arbitrary repetitive stuff (say an oldschool game screenshot where most stuff is composed out of repeating sprites), the current heuristic doesn't work at all
|
|
2025-09-15 01:39:32
|
some intra block copy algorithm from a video codec should be added to detect such kinds of repetition.
|
|
|
monad
|
|
jonnyawsom3
<@263300458888691714> back when you improved patch detection, was that by tweaking values in the existing heuristic, or did you make something entirely new for it? Wondering if we could try and replicate it
|
|
2025-09-16 01:29:30
|
The first results were the consequence of relaxing existing constraints (particularly around the background requirement), then of mixing some regular grid scanning based on detected patch size, then of implementing a dedicated algorithm assuming a grid. My strategy was naive and its results only intended to inspire.
Besides efficiently finding patch candidates, one should judge which features would practically benefit density. A singular bright pixel may be the most important feature to extract, while cutting out large sections of a texture or deduplicating photo content may be harmful. Even the existing algorithm is dubious or damaging outside strict text content.
|
|
|
Demiurge
|
2025-09-16 05:17:25
|
I kinda dislike how the jxl tools use so many threads by default. Especially with SMT cores, it uses way more threads than actual physical cores on your CPU, by default. Which is wasteful.
|
|
2025-09-16 05:18:16
|
And even if it were smart enough to count physical cores instead of logical cores, the scaling is still very sub-linear.
|
|
|
jonnyawsom3
|
|
Demiurge
And even if it were smart enough to count physical cores instead of logical cores, the scaling is still very sub-linear.
|
|
2025-09-16 05:42:18
|
1 Thread
```3840 x 2160, geomean: 0.744 MP/s [0.731, 0.751], 5 reps, 0 threads.
Wall time: 0 days, 00:00:55.997 (56.00 seconds)
User time: 0 days, 00:00:00.937 (0.94 seconds)
Kernel time: 0 days, 00:00:54.984 (54.98 seconds)```
2 Threads
```3840 x 2160, geomean: 1.456 MP/s [1.434, 1.457], 5 reps, 2 threads.
Wall time: 0 days, 00:00:28.724 (28.72 seconds)
User time: 0 days, 00:00:01.078 (1.08 seconds)
Kernel time: 0 days, 00:00:55.093 (55.09 seconds)```
4 Threads
```3840 x 2160, geomean: 2.719 MP/s [2.653, 2.774], 5 reps, 4 threads.
Wall time: 0 days, 00:00:15.472 (15.47 seconds)
User time: 0 days, 00:00:01.218 (1.22 seconds)
Kernel time: 0 days, 00:00:56.984 (56.98 seconds)```
Seems pretty linear to me
|
|
|
Demiurge
|
2025-09-16 06:25:24
|
a:
1 thread: 82.276 MP/s
2 threads: 158.567 MP/s
4 threads: 291.051 MP/s
b:
2 threads: 207.995 MP/s
4 threads: 388.176 MP/s
|
|
2025-09-16 06:25:42
|
(These are with an unusually-massive image)
|
|
|
diskorduser
|
2025-09-16 06:26:56
|
Seems okay-ish to me.
|
|
|
Demiurge
|
2025-09-16 06:32:55
|
going from 9 to 10 threads adds about 13 MP/s; going from 8 to 9 adds 40 MP/s.
going from 1 to 2 adds 100 MP/s.
|
|
2025-09-16 06:33:25
|
of a difference
|
|
2025-09-16 06:33:43
|
I think "use as many threads as logical cores" is a bad default
|
|
2025-09-16 06:34:13
|
Like even if that's your intention you are better off counting physical cores instead of logical cores.
|
|
2025-09-16 06:34:24
|
Since beyond that you get zero speedup
|
|
2025-09-16 06:36:28
|
And you're just reducing efficiency after that and spamming the CPU scheduler
|
|
|
jonnyawsom3
|
|
Demiurge
Since beyond that you get zero speedup
|
|
2025-09-16 06:52:58
|
On an 8 core CPU
8 Threads
```3840 x 2160, geomean: 5.715 MP/s [5.512, 5.769], 5 reps, 8 threads.
Wall time: 0 days, 00:00:07.450 (7.45 seconds)
User time: 0 days, 00:00:01.906 (1.91 seconds)
Kernel time: 0 days, 00:00:50.921 (50.92 seconds)```
16 Threads
```3840 x 2160, geomean: 7.473 MP/s [7.122, 7.626], 5 reps, 16 threads.
Wall time: 0 days, 00:00:05.743 (5.74 seconds)
User time: 0 days, 00:00:06.093 (6.09 seconds)
Kernel time: 0 days, 00:01:05.437 (65.44 seconds)```
30% faster is not zero. Considering SMT is usually around 40-50% the speed of real cores, and my system is in use, that's still pretty linear
|
|
|
Demiurge
|
2025-09-16 06:59:00
|
If you disable SMT in the bios, you will probably get even better results...
|
|
2025-09-16 06:59:32
|
On my system, with SMT enabled, and 12 physical cores, I get a small improvement with 13 threads and zero improvement at 14
|
|
2025-09-16 07:04:06
|
I do not think there is any reason to use so many threads as the default setting honestly...
|
|
2025-09-16 07:04:51
|
Matching the number of physical cores would make more logical sense :)
|
|
|
jonnyawsom3
|
|
Demiurge
On my system, with SMT enabled, and 12 physical cores, I get a small improvement with 13 threads and zero improvement at 14
|
|
2025-09-16 07:07:37
|
What resolution and encode settings?
|
|
|
Demiurge
|
2025-09-16 07:43:39
|
I'm decoding, not encoding
|
|
2025-09-16 07:44:06
|
And it's an unusually-large image but I have 128G RAM
|
|
|
jonnyawsom3
|
2025-09-16 07:50:36
|
Huh, you're right, djxl does hit a roadblock with a single hyperthread
|
|
2025-09-16 07:57:37
|
Moving to <#803645746661425173>
|
|
|
Demiurge
|
2025-09-16 08:02:56
|
This is a physical feature, not a software feature. It's the nature and design of SMT
|
|
2025-09-16 08:03:20
|
You are limited by the number of physical cores.
|
|
|
Kupitman
|
|
Demiurge
If you disable SMT in the bios, you will probably get even better results...
|
|
2025-09-16 09:12:03
|
no, the OS always tries to use physical cores first, and gets extra performance from SMT
|
|
|
Demiurge
|
2025-09-16 09:48:54
|
It's well known that disabling SMT improves performance. SMT is not designed to improve performance. It's designed to sacrifice performance but improve core scheduling in return.
|
|
2025-09-16 09:49:43
|
It's a feature that makes sense for servers and not desktops.
|
|
2025-09-16 09:50:52
|
Some servers even have 8 logical cores per physical core.
|
|
2025-09-16 09:51:26
|
But it's not able to do 8 times the work... It's still limited by the 1 physical core.
|
|
2025-09-16 09:51:59
|
It's just able to split the work more efficiently.
|
|
2025-09-16 09:52:16
|
With less overhead of context switching
|
|
|
|
ignaloidas
|
2025-09-16 11:05:00
|
Not really? SMT is meant to improve core resource utilization, with very few changes to the core itself once you have a proper OoO core (just one extra instruction decoder and another set of registers)
|
|
2025-09-16 11:06:34
|
of course some workloads can utilize *all* of the core's resources, and with them you're not going to see the improvement, but most workloads that aren't "a whole bunch of tight math with little branching" do end up being faster
|
|
2025-09-16 11:07:00
|
any advantage from more instruction decoders leading to less context switching is mostly incidental
|
|
|
A homosapien
|
|
Demiurge
It's well known that disabling SMT improves performance. SMT is not designed to improve performance. It's designed to sacrifice performance but improve core scheduling in return.
|
|
2025-09-16 11:14:53
|
That's a very bold claim, do you have a source for that?
|
|
|
Demiurge
|
2025-09-16 11:22:02
|
I take it for granted based on my own personal experience. My experience led me to read more about it to see if other people have had similar results to my own. And from what I can gather, there seems to be a consensus that disabling SMT makes sense for desktop machines and increases per-core performance.
|
|
2025-09-16 11:22:45
|
which does not seem very outrageous or surprising to me.
|
|