JPEG XL

Info

rules 57
github 35276
reddit 647

JPEG XL

tools 4225
website 1655
adoption 20712
image-compression-forum 0

General chat

welcome 3810
introduce-yourself 291
color 1414
photography 3435
other-codecs 23765
on-topic 24923
off-topic 22701

Voice Channels

General 2147

Archived

bot-spam 4380

jxl

Anything JPEG XL related

_wb_
2022-05-11 09:36:18
Gamut mapping is just clamping when requesting uint, and returning out of range values when requesting float, iiuc
2022-05-11 09:37:41
Tone mapping is done according to the tone mapping header iiuc, but <@604964375924834314> probably knows better
spider-mario
2022-05-11 09:46:39
there is code for tone mapping in the codebase but at present, it’s not really well-integrated with the rest
2022-05-11 09:46:45
djxl can use it but djxl_ng not yet
2022-05-11 09:46:52
libjxl doesn’t either
Traneptora
2022-05-12 02:39:44
can libjxl output any enum space including `JXL_PRIMARIES_CUSTOM` or is that an exception?
veluca
2022-05-12 02:54:00
at least in theory it should, yes
2022-05-12 02:54:08
(if XYB)
Traneptora
2022-05-12 03:02:45
I noticed the libjxl docs say this
2022-05-12 03:02:48
> The JPEG XL image has an attached ICC Profile, in that case, the encoded structured data is not available, this function will return an error status. JxlDecoderGetColorAsICCProfile should be called instead.
2022-05-12 03:02:55
Is this only the case for `TARGET_ORIGINAL`?
2022-05-12 03:04:14
I don't see why this would fail if you have an XYB encoded space and you call `TARGET_DATA`. I would think it would just return the color encoding you requested with `JxlDecoderSetPreferredColorEncoding`
veluca at least in theory it should, yes
2022-05-12 03:06:01
Does this mean that if I can parse the ICC Profile for `TARGET_ORIGINAL`, and determine the primaries and white point from it, then in theory I could request the pixel data in the space described by that ICC Profile, using `JXL_PRIMARIES_CUSTOM` and then attach that original ICC Profile to that pixel data and have it render properly?
veluca
Traneptora I don't see why this would fail if you have an XYB encoded space and you call `TARGET_DATA`. I would think it would just return the color encoding you requested with `JxlDecoderSetPreferredColorEncoding`
2022-05-12 03:07:16
I think this might be true, but best if you try ...
Traneptora Does this mean that if I can parse the ICC Profile for `TARGET_ORIGINAL`, and determine the primaries and white point from it, then in theory I could request the pixel data in the space described by that ICC Profile, using `JXL_PRIMARIES_CUSTOM` and then attach that original ICC Profile to that pixel data and have it render properly?
2022-05-12 03:08:11
In principle yes, although of course not all color profiles are representable via enum :) (also you also would need to handle the TF too)
Traneptora
2022-05-12 03:09:07
wouldn't any white point and primaries be representable by the enums?
2022-05-12 03:09:14
what else is in the ICCP other than TRC?
2022-05-12 03:11:18
to be honest, what I'm trying to do here is request pixel data from the decoder in the `TARGET_ORIGINAL` space, and attach the `TARGET_ORIGINAL` iccp, which is not necessarily possible without a CMS
2022-05-12 03:11:36
but if I have a full version of libjxl and not just libjxl_dec then it will have a CMS included
2022-05-12 03:11:56
so I don't see why the decoder of the full version should not have this functionality
2022-05-12 03:12:03
is that a post-1.0 goal or a pre-1.0 goal?
2022-05-12 03:13:46
i.e. the full ersion of libjxl should export `JxlDecoderSetOriginalProfile` which sets the decoded pixel data to the data which will be accurately represented by `TARGET_ORIGINAL`'s ICC profile
2022-05-12 03:13:53
or something
veluca
Traneptora what else is in the ICCP other than TRC?
2022-05-12 03:45:27
there could be a lot of things actually, i.e. LUT-based profiles... many (most?) ICCs are WP + PRI + TRC, but not all
2022-05-12 03:46:06
some ICCs don't even *have* primaries or transfer curves
2022-05-12 03:46:46
I think if we only had wp/pri/trc ICCs we'd just have implemented parsing in libjxl, but the world doesn't look that way so...
_wb_
2022-05-12 03:50:27
CMYK ICC profiles can be a megabyte big and god knows what they contain but it's certainly not just some primaries and transfer curve
veluca
2022-05-12 03:55:27
4D LUTs 😉
_wb_
2022-05-12 03:58:26
I guess. After all they try to model stuff like how paper absorbs ink and what chemically happens when inks are mixed, which is a lot more hairy stuff than modeling idealized light
Traneptora
Traneptora to be honest, what I'm trying to do here is request pixel data from the decoder in the `TARGET_ORIGINAL` space, and attach the `TARGET_ORIGINAL` iccp, which is not necessarily possible without a CMS
2022-05-12 07:20:29
either way, is this something that would happen before or after 1.0?
_wb_
2022-05-12 07:25:25
I hope before
2022-05-12 07:26:14
https://github.com/libjxl/libjxl/pull/1366
2022-05-12 07:28:26
Implementing this new api should be mostly a matter of extending the final render pipeline stage to call cms on the floats just after interleaving and just before converting to the pixelformat. Shouldn't be terribly hard to do I suppose.
2022-05-12 07:30:14
And I assume it makes sense to have a default cms just like in the encoder (except in libjxl_dec where you wouldn't have that, so you'd get error when trying to set output space without setting cms first)
Traneptora
2022-05-12 07:39:50
oh, interesting, there's already a PR for this
2022-05-12 07:40:53
oh, it's just for the API
2022-05-12 07:41:18
If this implementation happens, would you also add SetPreferredOutputICCProfile?
2022-05-12 07:41:36
so you can pass it an ICC profile, and the pixel data received will be accurately described by that ICC Profile?
2022-05-12 07:42:06
either an ICC Profile for a monitor, or the TARGET_ORIGINAL profile if you're just passing it through?
_wb_
2022-05-12 07:53:35
So only preferred, not desired? Only transforming in case of xyb?
2022-05-12 07:54:45
You could do that by inspecting uses_original_profile and only setting desired profile if it's false, I guess.
Traneptora
2022-05-12 08:28:51
how would you set the desired profile, though?
2022-05-12 08:29:24
the API proposal changes don't have a function to set a preferred/desired ICC Profile, only desired output JxlColorEncoding
_wb_
2022-05-12 08:56:50
No it has both
2022-05-12 08:57:27
JxlDecoderSetICCProfile
Traneptora
2022-05-12 09:01:48
I didn't see it <:DunktsukiStare:759697530786938901>
2022-05-12 09:02:47
oh, it's after all the comments
2022-05-12 09:02:49
*nevermind*
2022-05-12 09:03:20
I assume it would fail if `uses_original_profile == true`?
2022-05-12 09:03:41
or will that always work?
_wb_
2022-05-12 09:05:22
Will always work, if cms is set (or default cms is available, I guess)
yurume
2022-05-12 09:06:26
DecodeHybridVarLenUint seems to be one of the things in jxl that I have to be really careful about overflow since it's essentially unbounded
_wb_
2022-05-12 09:08:44
Should fit in uint32 in practice I think. At least that's how we implement it iirc
yurume
2022-05-12 09:09:25
I recall that libjxl did have some sort of bounds in the hybrid int decoder, but I'm not sure if those bounds are for trees or for individual values
2022-05-12 09:11:04
(without looking at the actual code) my guess is the latter, as I've seen some bitstream where the LZ77 token starts at a very large number while only a few small values are actually used for literals
veluca
2022-05-12 09:11:17
libjxl `and`s every token's nbits with 31
2022-05-12 09:11:49
that's the only real bounds check there
2022-05-12 09:12:24
(which is equivalent to saying that everything that doesn't fit in u32 makes things go... weird)
yurume
2022-05-12 09:12:28
a valid bitstream but not properly supported by libjxl, so to say?
veluca
2022-05-12 09:13:02
I'd argue that if it's valid it's because we forgot to put the limit in the profiles and levels section xD
yurume
2022-05-12 09:15:26
jxl bitstream is very dense so I'm not sure that &31 is a good solution
2022-05-12 09:16:01
since it still leaves a possibility for an invalid bitstream to be decoded as valid
_wb_
2022-05-12 09:16:24
We do say that int32 has to be enough for the buffers, maybe we should add that uint32 has to be enough for the PackSigned residuals
veluca
2022-05-12 09:16:38
a very unlikely possibility, but yes
2022-05-12 09:17:00
given the spec doesn't say what happens to invalid bitstreams I wouldn't mind too much xD
yurume
2022-05-12 09:17:08
haha
2022-05-12 09:17:43
but yeah, my journey to the jxl spec (and jxsml) gave a feeling that the format is very unforgiving
veluca
2022-05-12 09:18:09
(also note that at least with ANS you have a builtin checksum, so it's not very *likely* that you'll end up decoding successfully)
yurume
2022-05-12 09:18:18
ah that is a valid point
2022-05-12 09:19:09
I'm yet to reach the point in the spec where ANS is frequently used (I did implement but couldn't test it for that reason)
veluca
2022-05-12 09:20:23
ICC is the first place to test it
2022-05-12 09:20:54
I give a 50% chance at best that all the details of histogram reading etc are actually correct in the spec xD
yurume
2022-05-12 09:21:42
I've already gone through ICC (didn't do the actual decode, but enough code to fully parse them and move on, which means a full entropy code impl)
veluca
2022-05-12 09:21:56
oh, very cool 😄
yurume
2022-05-12 09:21:58
and I think my test image didn't have ANS codes for ICC?
2022-05-12 09:22:11
maybe it is only useful for larger ICC profiles
veluca
2022-05-12 09:22:24
nah, trees are not for *all* ANS streams
2022-05-12 09:22:27
just for modular images
2022-05-12 09:22:40
i.e. frames with kModular, and DC in VarDCT
yurume
2022-05-12 09:22:53
ah, my bad, I meant codes not trees (which are not used in ANS itself)
veluca
2022-05-12 09:22:56
ah, could have been a huffman
yurume
2022-05-12 09:23:00
yeah
veluca
2022-05-12 09:23:14
in fact it probably is most of the time
yurume
2022-05-12 09:23:32
I'm also missing a nested entropy code impl (which should be not too hard at this point, but I lack a test image)
veluca
2022-05-12 09:23:34
not too hard to convince cjxl to use ANS anyway
2022-05-12 09:23:39
nested?
yurume
2022-05-12 09:24:18
using yet another entropy code for decoding cluster maps
2022-05-12 09:24:39
or context maps (I think that was a terminology used by libjxl)
veluca
2022-05-12 09:25:54
ah yeah
2022-05-12 09:26:06
not sure if you can actually trigger that for ICC
2022-05-12 09:27:50
mhhh if you find an image with a huge ICC (or attach such an ICC to it with imagemagick), you should be able to
2022-05-12 09:28:22
another interesting point: https://github.com/libjxl/libjxl/blob/main/lib/jxl/enc_icc_codec.cc#L400
2022-05-12 09:28:39
if you remove that line, and make the ICC big enough, it *probably* will use ANS
yurume
2022-05-12 09:28:51
is there any particular reasoning for that line?
veluca
2022-05-12 09:29:14
yeah, Huffman can handle larger alphabets
2022-05-12 09:29:21
and for ICC that's often useful
yurume
2022-05-12 09:29:37
ah that kind of things
2022-05-12 09:29:59
and I'm also noticing `LZ77Method::kOptimal` for small enough ICC profiles, heh
veluca
2022-05-12 09:30:06
yeah
2022-05-12 09:30:12
that LZ77 encoder is *slow*
yurume
2022-05-12 09:30:31
rightly so for optimal parse
veluca
2022-05-12 09:30:44
I mean, way slower than it needs to be 😛
yurume
2022-05-12 09:31:03
well at least you only have exactly one chunk of compressed data unlike, say, DEFLATE :p
veluca
2022-05-12 09:31:54
ah, the format is fast enough
2022-05-12 09:32:03
just not enough effort in the enc impl
yurume
2022-05-12 09:32:37
for the format, did you mean DEFLATE? (was a bit confused for a moment)
veluca
2022-05-12 09:33:49
I mean JXL's LZ77 bitstream
yurume
2022-05-12 09:34:02
ah
Traneptora
2022-05-12 10:28:19
why dont' we just use zlib?
2022-05-12 10:28:22
for that
veluca
2022-05-12 10:56:52
in jxl?
2022-05-12 10:57:10
well, because (a) no context modeling (b) short lengths (c) huffman
Traneptora
2022-05-12 11:00:28
no I mean why not use zlib for lz77
2022-05-12 11:00:40
since iirc it can do that
veluca
2022-05-12 11:06:01
it can do it *for deflate*
2022-05-12 11:06:51
and even then, it doesn't even come close to `LZ77Method::kOptimal` in terms of compression (which is closer to zopfli, both in speed and in implementation 😛)
Traneptora
2022-05-12 11:07:30
ah
2022-05-13 04:17:27
could someone add the Documentation label to https://github.com/libjxl/libjxl/pull/1421
veluca
2022-05-13 07:14:49
done!
JendaLinda
2022-05-13 07:53:26
If RGBA picture is compressed using VarDCT, is the alpha channel compressed losslessly like in webp, or is VarDCT used for the alpha channel as well? Or are both options possible?
_wb_
2022-05-13 07:56:32
VarDCT can only operate on 3 channels
2022-05-13 07:56:43
So alpha is always encoded with modular
2022-05-13 07:56:53
It can be lossy or lossless
2022-05-13 07:57:43
Slightly lossy alpha is done by default when doing a lossy encode
2022-05-13 07:59:54
The artifacts of lossy modular are different from those of lossy VarDCT: ringing shouldn't happen (which could be weird in an alpha channel), though you can get some blur and jaggies.
JendaLinda
2022-05-13 08:01:03
Is there an option in cjxl to force lossless alpha?
2022-05-13 08:04:12
Also, what would be the benefit of using lossy Modular in general instead of VarDCT? Is it good for non photographic content?
_wb_
2022-05-13 08:04:42
Perhaps sometimes it is.
2022-05-13 08:05:50
I don't think there's currently an option in cjxl to set alpha quality independently from color quality
2022-05-13 08:06:23
Would it be useful?
2022-05-13 08:08:24
We don't have it in the encode api either atm. Wouldn't be very hard to do though, could even have distance settings for each extra channel separately.
JendaLinda
2022-05-13 08:10:38
I think lossy alpha might be undesirable in some cases. It would be useful to add the options. Also it seems that VarDCT and Modular use a different quality scale.
2022-05-13 08:20:36
At least -d option didn't have any effect on Modular. I know I can use -q for both modes, but there I don't know what -q value is equivalent to "near lossless" -d 1 and what is the "near lossless" value for Modular (perhaps used for the lossy alpha too?).
_wb_
2022-05-13 08:22:49
In the current git version, -d and -q are just the same thing, and it influences both modular and vardct
2022-05-13 08:23:02
-d 1 is -q 90
JendaLinda
2022-05-13 08:31:36
Alright, 90 seems to be a good mapping value. Now I can do some comparisons.
veluca
_wb_ I don't think there's currently an option in cjxl to set alpha quality independently from color quality
2022-05-13 08:34:32
doesn't -Q (sort of by mistake) do that? or did we remove it?
_wb_
2022-05-13 08:35:22
We removed -Q
veluca
2022-05-13 08:36:16
ah, then we need an option for alpha quality indeed
_wb_
2022-05-13 08:44:53
Yes, it would also be useful for other extra channels, e.g. kBlack you also probably want to control separately, etc.
lonjil
2022-05-14 11:13:49
How good is modular mode at compressing photographic content lossily?
veluca
2022-05-14 12:20:38
not as good as vardct, but I can't tell you much more than that
Orum
2022-05-14 12:33:25
modular is good for some things, but I would not put photographs in that category
_wb_
2022-05-14 12:53:25
Better than flif but not as good as vardct :)
Traneptora
2022-05-14 02:22:23
well I suppose any lossy codec is better than flif at being lossy <:kek:857018203640561677>
2022-05-14 02:23:10
<@179701849576833024> just to be clear, did you want bulleted lists to have *more* indentation?
2022-05-14 02:25:39
and how much did you want?
2022-05-14 02:28:18
I ask because I modeled it after JxlDecoderGetColorAsEncodedProfile
2022-05-14 02:28:24
the whitespace, that is
2022-05-14 02:34:25
is this ideal?
2022-05-14 02:34:43
(yes it's a screenshot of code but I don't know how discord will wrap lines)
2022-05-14 02:35:02
veluca
2022-05-14 03:40:12
this seems ideal
2022-05-14 03:40:20
thanks 🙂
_wb_
2022-05-14 04:36:27
Flif is not _that_ horrible as a lossy codec: https://flif.info/lossy.html
2022-05-14 04:38:45
I mean, ignoring speed
2022-05-14 04:39:25
Probably not better than jpeg, but not horribly worse either
Traneptora
2022-05-14 04:54:59
Isn't flif lossless
2022-05-14 04:55:11
or is this proprocessed, like lossy png
_wb_
2022-05-14 05:00:37
Yes
2022-05-14 05:00:53
More effective than lossy png
2022-05-14 05:01:03
But same idea
Traneptora
2022-05-14 06:06:41
<@179701849576833024> sent an updated PR
2022-05-14 06:06:48
meaning I force-pushed
veluca
2022-05-14 06:12:27
and I approved 🙂
Traneptora
2022-05-14 06:49:42
I marked the two outdated conversations as resolved
veluca
2022-05-14 06:58:28
sure
2022-05-14 06:58:58
merged
2022-05-14 06:59:03
I thought I had automerge enabled...
Traneptora
veluca I thought I had automerge enabled...
2022-05-15 12:42:46
it disabled when I force-pushed
yurume
2022-05-15 10:06:09
yay yet another spec bug
veluca
2022-05-15 10:06:48
lovely
2022-05-15 10:06:54
well, great that you found it 🙂
yurume
2022-05-15 10:07:23
well I've filed 20 spec issues so far, so finding another one is not a big deal 😉
2022-05-15 10:08:06
but some spec bugs are particularly annoying to track down, this time alias table lookup is completely broken
2022-05-15 10:08:42
(the spec says `offset = offsets[i] + pos;`, which should be a conditional move just like symbols)
2022-05-15 10:09:32
I still have at least one outstanding bug I haven't yet tracked down, hopefully that's the last one before I can actually decode a whole ANS stream
veluca
yurume I still have at least one outstanding bug I haven't yet tracked down, hopefully that's the last one before I can actually decode a whole ANS stream
2022-05-15 10:10:53
the spec says that?
2022-05-15 10:10:57
I mean, I can see it does
yurume
2022-05-15 10:10:58
yes
veluca
2022-05-15 10:11:12
but how did I manage to write it that way? xD
yurume
2022-05-15 10:11:29
sleep and/or caffeine deprivation? lol
2022-05-15 10:11:50
spec doesn't have any automated test, so...
veluca
2022-05-15 10:12:00
more a question of too many changes over time I think 😛
2022-05-15 10:12:27
ah, the C++ code does say `s.offset = offsets1_or_0 + pos`
2022-05-15 10:12:33
so I can see how that would happen
2022-05-15 10:12:38
just... yeah
yurume
2022-05-15 10:13:53
well there are also just typos, like the alias table _initialization_ code where i and u and o are randomly swapped
veluca
2022-05-15 10:14:10
🤦‍♂️
2022-05-15 10:14:29
and here I thought I copied that from the C++ code
2022-05-15 10:15:03
AFAIU it should be `offset = (pos >= cutoffs[i]) ? offsets[i] + pos : pos` or something like that
yurume
2022-05-15 10:15:14
I think the original standard had them in prose (what I'm working with now is not one but the current community draft, as had been shared recently)
veluca
2022-05-15 10:15:20
yep
yurume
2022-05-15 10:15:27
so the conversion may have corrupted that
veluca
2022-05-15 10:15:36
I remember spending a lot of time writing it in prose, and ISO basically saying "nah, code"
2022-05-15 10:15:45
then I copypasted from C++
yurume
2022-05-15 10:15:47
;_;
improver
2022-05-15 10:15:48
epic
veluca
2022-05-15 10:15:51
but apparently wrong
improver
2022-05-15 10:16:19
i mean kinda yeah code can be less weird at some details but lol
_wb_
2022-05-16 06:04:50
Yurume is single-handedly finding more spec bugs than all the national bodies of ISO combined 😅
Petr
2022-05-16 06:08:03
Honestly, are the national bodies really interested in finding bugs? 🙂
yurume
_wb_ Yurume is single-handedly finding more spec bugs than all the national bodies of ISO combined 😅
2022-05-16 06:12:37
I'm kinda surprised too, didn't other ISO members proofread the document and give feedback? maybe they didn't take a look at libjxl though...
_wb_
2022-05-16 06:37:42
They read it but they didn't try to execute it, that kind of makes a big difference
JendaLinda
_wb_ VarDCT can only operate on 3 channels
2022-05-16 08:19:16
Just for clarification, VarDCT always uses 3 channels for color images, but how does it handles grayscale images? Does it use a single channel?
veluca
2022-05-16 09:02:19
it uses 3 channels, two of which are 0
2022-05-16 09:07:12
(it took some pain to ensure they are *actually* 0)
_wb_
2022-05-16 09:07:43
A channel that is all-zeroes compresses to some small constant size regardless of dimensions, in both vardct and in modular
2022-05-16 09:08:16
So we didn't really have to bother with not encoding zero chroma, it would make very little difference
2022-05-16 09:09:33
(it's different in all other codecs I know, which have some nonzero fixed cost per sample so having a special case for luma-only does make sense for them)
2022-05-16 09:10:35
In modular mode, grayscale is encoded with just a single channel, but that's because modular is flexible in the channelcount anyway
veluca
2022-05-16 09:10:39
tbh it would make sense for us too, for memory usage 😉
_wb_
2022-05-16 09:11:23
Well we could special-case a chroma histogram that is a singleton and not actually allocate it in that case
JendaLinda
2022-05-16 09:42:35
Okay. I suppose the same thing happens when transcoding grayscale JPEG to VarDCT, such JPEG has only the luma channel. In modular, I guess, indexed color could be used for grayscale as well. After all, 8 bit grayscale uses small amount of "colors".
_wb_
2022-05-16 09:49:32
yes, probably if you pass it a grayscale image as RGB, the palette transform will make a palette and encode it like that
2022-05-16 09:50:13
and otherwise YCoCg will turn it into one grayscale channel and two all-zero channels
JendaLinda
2022-05-16 10:26:52
The application can pretty easily detect a grayscale image by comparing the RGB values. After all, some image formats don't support grayscale directly and store it as indexed color.
Traneptora
2022-05-16 02:18:24
is it true that all-gray RGB images are always R=X, G=X, B=X though?
2022-05-16 02:18:50
Usually I see grayscale treated as a YUV luma plane, with U and V both zeros
2022-05-16 02:19:06
but depending on the 3x3 matrix, that won't necessarily translate to Cb and Cr, will it?
_wb_
2022-05-16 03:23:47
shouldn't every variant of yuv make u=v=0 iff r=g=b?
Traneptora
2022-05-16 03:37:37
I don't know
2022-05-16 03:37:44
that was my question
_wb_
2022-05-16 04:00:03
I _think_ so, but no idea if there is some exotic yuv variant that doesn't do that. It would be weird though - then yuv400 would not be grayscale in that yuv variant...
Traneptora
2022-05-16 04:06:50
but for all the sane ones it is that way, I see
veluca
2022-05-16 04:23:54
what's gray anyway
BlueSwordM
veluca what's gray anyway
2022-05-16 04:30:52
Gray is the luma channel, contrast and brightness only.
veluca
2022-05-16 04:32:23
I mean what is gray *light*
2022-05-16 04:32:40
or... whatever
JendaLinda
2022-05-16 04:45:42
Gray is an intensity of light. It doesn't need to be visible light, it could be infrared as well. Or it doesn't have to be a picture at all. A grayscale image can represent any arbitrary rectangular array of values.
_wb_
2022-05-16 04:55:54
a grayscale image can be used for just any single-channel data (could be an alpha channel or whatever), but I'd say what makes a visual image 'grayscale' is that all the pixels have the same chromaticity as the WhitePoint
2022-05-16 04:57:26
one way to think about grayscale is that the color gamut is extremely restricted to a triangle in xy that is just a dot, i.e. the R, G and B primaries all coincide and are equal to the WhitePoint itself
2022-05-16 05:00:51
I guess you _could_ have a grayscale image with a WhitePoint that is actually not white at all, e.g. not even close to D65 but somewhere randomly in the xy plane. I suppose if you convert such a grayscale image to sRGB, you should get pixel values that are very much not R=G=B, but colored (but of course monochromatic)
2022-05-16 05:01:58
I wonder what happens if you try something like that in libjxl. Do we do whitepoint conversion?
veluca
2022-05-16 05:41:32
I believe so yes
_wb_
2022-05-16 06:20:35
Could be fun to make a nominally grayscale image that looks purple, could be a good test of cms being implemented properly
veluca
2022-05-16 06:42:28
I'm not sure, but I think with advanced LUT trickery you could also make a 1c image that looks 2 different colors or something
2022-05-16 06:42:41
(basically use the CMS to do indexed color, but continuous)
JendaLinda
2022-05-16 06:56:49
I think it would be a neat trick to add sepia tone to a plain black and white picture.
_wb_
2022-05-16 07:31:36
Yes with full icc you can probably do very funky stuff, I was wondering what can be done within the enum though
2022-05-16 07:32:10
Might be good to add some stuff to jxl_from_tree to set custom enum spaces
2022-05-16 07:32:35
Also for HDR jxl art and stuff :)
2022-05-16 08:12:18
hm, just changing the whitepoint on a grayscale image doesn't seem to make it render any differently
2022-05-16 08:12:29
2022-05-16 08:12:41
looks like just a normal grayscale gradient to me
2022-05-16 08:13:11
``` $ jxlinfo 0.jxl JPEG XL image, 1024x1024, (possibly) lossless, 10-bit Grayscale Color space: Grayscale, Custom, white_point(x=0.100000,y=0.700000), sRGB transfer function, rendering intent: Perceptual ```
2022-05-16 08:14:58
<@604964375924834314> maybe I'm misunderstanding how WhitePoint is supposed to work or what it is supposed to mean. I never quite completely understood what WhitePoint is supposed to mean exactly...
spider-mario
2022-05-16 08:18:11
AFAIU, in grayscale, it would only matter when using the “absolute colorimetric” rendering intent
2022-05-16 08:18:44
perceptual and relative do chromatic adaptation from the source to the target colorspaces
2022-05-16 08:19:02
but in a grayscale image, everything has the chromaticity of the white point, so…
_wb_
2022-05-16 08:19:05
2022-05-16 08:19:29
I tried kAbsolute too, did the same thing
2022-05-16 08:20:37
Maybe when I view the decoded png my viewer just didn't bother
spider-mario
2022-05-16 08:21:11
that’s quite possible
2022-05-16 08:21:20
iirc, Skia doesn’t do absolute, for example
_wb_
2022-05-16 08:22:43
Question is if I set it to kAbsolute with that absurdly purple whitepoint, what should it look like and how will it look in the various jxl viewers?
spider-mario
2022-05-16 08:23:07
isn’t it green?
_wb_
2022-05-16 08:23:22
Or whatever it is, not white in any case
2022-05-16 08:23:49
Yeah very green
2022-05-16 08:24:29
In theory with kPerceptual you should be able to make sepia toned images and stuff like that?
2022-05-16 08:24:44
I mean kAbsolute
spider-mario
2022-05-16 08:24:50
yes, I think so
_wb_
2022-05-16 08:25:31
We should test if that works and if not, how to make it work, because that's a nifty trick
spider-mario
2022-05-16 08:25:48
although, if you were to convert to a colorspace with a warmer white point, the image may actually look more blue
2022-05-16 08:26:02
absolute colorimetric is considered kind of a niche use case
2022-05-16 08:26:10
most useful when simulating printing or this sort of thing
2022-05-16 08:26:19
(“soft proofing”)
_wb_
2022-05-16 08:27:28
So kPerceptual and kRelative map source wp to target wp, right?
2022-05-16 08:27:35
But kAbsolute doesn't
2022-05-16 08:27:47
What about kSaturation?
spider-mario
2022-05-16 08:30:05
iirc, it also does
2022-05-16 08:30:20
and to preserve saturation, it cares less about preserving hue
2022-05-16 08:31:20
it’s also a bit niche; from what I understand, the use-case that is usually mentioned is computer graphics that must keep their “pop” at the expense of accuracy
2022-05-16 08:31:58
https://www.cambridgeincolour.com/tutorials/color-space-conversion.htm is a nice article from what I recall
_wb_
2022-05-16 08:37:11
So if I understand correctly, kRelative and kAbsolute are the only ones that try to be 'correct', i.e. they clip out of gamut colors to the nearest in gamut color, and the difference is whether they adjust for WP too or not
2022-05-16 08:38:06
kPerceptual just scales/stretches one gamut to another, so e.g. when going from wide gamut to small gamut it will desaturate
2022-05-16 08:38:57
And kSaturation also stretches to fit in the target gamut, but prioritizes saturation preservation instead of hue preservation
spider-mario
2022-05-16 08:49:54
your and my understandings coincide
_wb_
2022-05-16 09:24:30
Yay!
veluca
2022-05-16 09:29:19
I like how you didn't say that it's correct xD
_wb_
2022-05-16 09:45:07
Who knows what is correct in all this mess where correctness is ultimately decided by what the popular tools happen to implement 😅
2022-05-17 03:04:00
so if I make this jxl file: $ jxlinfo 0.jxl JPEG XL image, 1024x1024, (possibly) lossless, 10-bit Grayscale Color space: Grayscale, Custom, white_point(x=0.700000,y=0.100000), sRGB transfer function, rendering intent: Absolute
2022-05-17 03:04:14
and djxl it to a png
2022-05-17 03:05:16
then the png has some synthesized icc profile: icc©️ Copyright 2019 Google LLC, CC-BY-SA 3.0 Unported license(https://creativecommons.org/licenses/by-sa/3.0/legalcode) icc:description: Gra_0.7;0.1_Abs_SRG
2022-05-17 03:06:31
but no viewer shows it in any different way than with a normal white point
2022-05-17 03:07:01
if I cjxl the png to make an XYB VarDCT jxl file, it also looks like just regular grayscale
2022-05-17 03:11:53
same if I make a color image with a funky whitepoint and kAbsolute rendering intent — the whitepoint seems to be just ignored
2022-05-17 03:18:38
ah no, interesting
2022-05-17 03:19:26
when I make an RGB image (that only contains R=G=B) with a funky white point and kAbsolute rendering intent, something is happening in some viewers
2022-05-17 03:20:03
left is how photoshop shows the grayscale and rgb images, on the right is how Preview shows the rgb image
2022-05-17 03:21:24
firefox and chrome just show the rgb image as grayscale though
2022-05-17 03:22:15
no idea what photoshop is doing but it looks weird to me
Traneptora
2022-05-17 04:40:37
`mpv` might display it properly once my patch to libavcodec gets merged
2022-05-17 04:40:39
we'll have to see
2022-05-17 04:40:59
currently I'm awaiting a different patch
_wb_ so if I make this jxl file: $ jxlinfo 0.jxl JPEG XL image, 1024x1024, (possibly) lossless, 10-bit Grayscale Color space: Grayscale, Custom, white_point(x=0.700000,y=0.100000), sRGB transfer function, rendering intent: Absolute
2022-05-17 04:43:14
can you upload this file? and is it XYB encoded?
2022-05-17 04:43:19
or is it modular grayscale
2022-05-17 04:43:45
it would be a good test sample anyway
_wb_
2022-05-17 07:26:00
I tried just modular grayscale and modular rgb
2022-05-17 07:26:44
Looks like nothing I could find does white point adjustment for grayscale png images
2022-05-17 07:28:03
And only Preview does white point adjustment for rgb png images (maybe some others too, didn't try a lot)
2022-05-17 07:29:18
Photoshop does something but it looks buggy
2022-05-17 07:32:31
My guess is that most things just ignore whitepoint and only convert the primaries, or perhaps they don't ignore it but just always map it to the target whitepoint (which is ok for all rendering intents that are in common use)
190n
2022-05-18 01:42:00
<@&807636211489177661> ban
2022-05-18 01:42:17
looks like this person's from matrix bridge
_wb_
2022-05-18 05:03:25
Yeah, cannot ban without killing the matrix bridge
190n
2022-05-18 05:09:47
you can delete the message tho right?
2022-05-18 05:10:00
idk who has authority to ban that user on the matrix side (if they haven't already been banned)
_wb_
2022-05-18 05:11:59
Yeah I can delete the message
190n
2022-05-18 05:12:16
ty
_wb_
Traneptora can you upload this file? and is it XYB encoded?
2022-05-18 07:00:32
here's the grayscale image with funky whitepoint
2022-05-18 07:01:03
and here's the rgb one (that is also grayscale in the sense that r=g=b) with the same funky whitepoint
2022-05-18 07:25:47
you can't make this stuff up
2022-05-18 07:26:55
this is how Finder and Preview show these two images (when converted to png)
2022-05-18 07:27:57
Finder is interesting: at first it shows the image as if it doesn't have a funky whitepoint, but when I resize the Finder window that somehow triggers it to do something else
Traneptora
2022-05-19 06:51:28
so this is the crash I'm getting on windows
2022-05-19 06:51:31
```gdb (gdb) where #0 0x00007ff7c21a9d3c in operator() (__closure=0x9c2c7fc8e0, task=<optimized out>) at /opt/ct-ng/lib/gcc/x86_64-w64-mingw32/11.2.0/include/avxintrin.h:881 #1 0x00007ff7c23a7a58 in jpegxl::ThreadParallelRunner::RunRange (self=self@entry=0x29183ce9bc0, command=command@entry=2160, thread=thread@entry=3) at ../lib/threads/thread_parallel_runner_internal.cc:139 #2 0x00007ff7c23a7b3a in jpegxl::ThreadParallelRunner::ThreadFunc (self=0x29183ce9bc0, thread=3) at ../lib/threads/thread_parallel_runner_internal.cc:169 #3 0x00007ff7c304b151 in execute_native_thread_routine () #4 0x00007ff7c11b44d3 in pthread_create_wrapper (args=0x29183d1e1c0) at src/thread.c:1533 #5 0x00007ffa6a0cdfb4 in msvcrt!_beginthreadex () from /c/WINDOWS/System32/msvcrt.dll #6 0x00007ffa6a0ce08c in msvcrt!_endthreadex () from /c/WINDOWS/System32/msvcrt.dll #7 0x00007ffa6ad954e0 in KERNEL32!BaseThreadInitThunk () from /c/WINDOWS/System32/KERNEL32.DLL #8 0x00007ffa6bd8485b in ntdll!RtlUserThreadStart () from /c/WINDOWS/SYSTEM32/ntdll.dll #9 0x0000000000000000 in ?? () Backtrace stopped: previous frame inner to this frame (corrupt stack?) (gdb) frame 0 #0 0x00007ff7c21a9d3c in operator() (__closure=0x9c2c7fc8e0, task=<optimized out>) at /opt/ct-ng/lib/gcc/x86_64-w64-mingw32/11.2.0/include/avxintrin.h:881 881 in /opt/ct-ng/lib/gcc/x86_64-w64-mingw32/11.2.0/include/avxintrin.h (gdb) frame 1 #1 0x00007ff7c23a7a58 in jpegxl::ThreadParallelRunner::RunRange (self=self@entry=0x29183ce9bc0, command=command@entry=2160, thread=thread@entry=3) at ../lib/threads/thread_parallel_runner_internal.cc:139 139 self->data_func_(self->jpegxl_opaque_, task, thread); (gdb) frame 2 #2 0x00007ff7c23a7b3a in jpegxl::ThreadParallelRunner::ThreadFunc (self=0x29183ce9bc0, thread=3) at ../lib/threads/thread_parallel_runner_internal.cc:169 169 RunRange(self, command, thread); (gdb) frame 3 #3 0x00007ff7c304b151 in execute_native_thread_routine () (gdb) ```
2022-05-19 06:52:41
what's weird is my cpu supports AVX so I don't know what the issue is
_wb_
2022-05-19 06:55:23
So one hypothesis is it's a compiler bug that somehow only gets triggered in windows builds
Traneptora
2022-05-19 06:55:44
mingw-w64 btw
_wb_
2022-05-19 06:56:48
Another hypothesis is that we're using an avx2 instruction that we think does not require alignment, but it perhaps actually does require alignment, and for some reason memory is aligned when building/running in linux but not in windows
2022-05-19 06:57:52
Another hypothesis is that it's a bug on our end that just happens to never trigger on any CI build but does trigger in that windows build
2022-05-19 06:58:38
Another hypothesis is that there's something wrong with the build flags or something when building for windows
Traneptora
_wb_ Another hypothesis is that we're using an avx2 instruction that we think does not require alignment, but it perhaps actually does require alignment, and for some reason memory is aligned when building/running in linux but not in windows
2022-05-19 06:58:57
this one can easily be tested though, can it? by somehow requiring aligned memory
2022-05-19 06:59:25
and seeing if that fixes the problem
2022-05-19 06:59:40
I'm not sure how to force alligned allocs though, tbh
_wb_
2022-05-19 07:00:47
We have a macro for that, don't recall what it is but probably JXL_ALIGNED or HWY_ALIGNED or something
2022-05-19 07:01:05
The image buffer allocs we do are already aligned
2022-05-19 07:01:33
But things like local arrays or something may not be
Traneptora
2022-05-19 07:01:52
this is a client using libjxl, if I can create a `malloc` that forces alignment, I can pass it to `libjxl_memory_manager` and then test it that way too, right?
_wb_
2022-05-19 07:02:07
Do you also get a crash at -e 5 but not at -e 5 --gaborish=0?
Traneptora
2022-05-19 07:02:26
this is a libjxl client so I can't set gaborish to 0
2022-05-19 07:02:32
not cjxl
_wb_
2022-05-19 07:02:43
There's a libjxl encode option for it too I think
2022-05-19 07:02:50
Should be in any case
Traneptora
2022-05-19 07:05:08
okay, so I checked, and apparently I set the libjxl memory manager to use `av_malloc` as its malloc function, which apparently aligns *all* blocks of memory suitable for vectorization, according to the documentation
_wb_
2022-05-19 07:05:40
The explicit allocs are not the issue I think, it's more things like the code saying `const float foo[32] = bla` and in one build this will put it somewhere aligned and in another it may be unaligned.
2022-05-19 07:06:23
But doesn't hurt to check if changing the malloc to see if that helps
Traneptora
_wb_ We have a macro for that, don't recall what it is but probably JXL_ALIGNED or HWY_ALIGNED or something
2022-05-19 07:07:35
does this force proper stack alignment as well?
2022-05-19 07:07:41
in addition to heap memory allocations?
_wb_
2022-05-19 07:07:51
Iirc we already do stuff to force alignment for image buffers where we have plenty of stuff that requires alignment. I think we just allocate more than needed and round the pointer up so it is aligned.
Traneptora
2022-05-19 07:08:33
since the issue appears to be stack allocated memory
_wb_
2022-05-19 07:20:46
https://github.com/libjxl/libjxl/blob/main/lib/jxl/convolve.cc#L363 is that the spot that could be the issue, <@179701849576833024> ?
2022-05-19 07:22:10
What would be the way to write that without using LoadDup but with LoadU?
veluca
2022-05-19 07:22:20
It's not trivial
2022-05-19 07:23:04
Easier to require alignment on the weights fields and struct/variable:)
2022-05-19 07:23:27
Ah actually not
2022-05-19 07:23:40
Ugh this is a bit of a pain
2022-05-19 07:23:54
We *can* change the impl of loaddup though
_wb_
2022-05-19 07:25:02
I can't test but I think we need to urgently fix this, too many crashes reported in windows, I just saw Krita is also getting reports about it
veluca
2022-05-19 07:25:27
I'll give it a look later
2022-05-19 07:29:45
<@853026420792360980> can you see which of the 3 impl is selected here? https://github.com/google/highway/blob/master/hwy/ops/x86_256-inl.h#L2226
2022-05-19 07:30:51
> Backtrace stopped: previous frame inner to this frame (corrupt stack?) I have to say that's not very reassuring
Traneptora
veluca <@853026420792360980> can you see which of the 3 impl is selected here? https://github.com/google/highway/blob/master/hwy/ops/x86_256-inl.h#L2226
2022-05-19 07:32:12
how would I do that?
veluca
2022-05-19 07:32:20
mhhh
Traneptora
2022-05-19 07:32:34
should I just `#error` all 3 with different error messages?
veluca
2022-05-19 07:32:37
one easy way is to add a `#error "impl 1"`
2022-05-19 07:32:38
yeeeep
2022-05-19 07:32:44
I mean that's what I would do
_wb_
2022-05-19 07:36:54
What's the purpose/reason for having 3 implementations in the first place?
veluca
2022-05-19 07:39:47
don't ask me 🙂
_wb_
Traneptora mingw-w64 btw
2022-05-19 07:44:22
Is that clang or gcc? Usually gcc, right?
Traneptora
veluca one easy way is to add a `#error "impl 1"`
2022-05-19 07:45:00
third one, the #else
veluca
2022-05-19 07:45:00
at least *some* build Sami got had actually msvc apparently
2022-05-19 07:45:07
or at least something looking very much like it
Traneptora
2022-05-19 07:45:22
```c #if HWY_LOADDUP_ASM #error "impl 1" __m256 out; asm("vbroadcastf128 %1, %[reg]" : [ reg ] "=x"(out) : "m"(p[0])); return Vec256<float>{out}; #elif HWY_COMPILER_MSVC && !HWY_COMPILER_CLANG #error "impl 2" const __m128 v128 = LoadU(Full128<float>(), p).raw; return Vec256<float>{ _mm256_insertf128_ps(_mm256_castps128_ps256(v128), v128, 1)}; #else #error "impl 3" return Vec256<float>{_mm256_broadcast_ps(reinterpret_cast<const __m128*>(p))}; #endif ```
2022-05-19 07:45:25
impl 3 was returned
veluca
2022-05-19 07:45:32
ok, then can you do 2 things... (a) print the pointer 😄 (b) switch to impl 2
Traneptora
2022-05-19 07:45:50
print *which* pointer
veluca
2022-05-19 07:45:55
p
Traneptora
2022-05-19 07:46:00
how?
veluca
2022-05-19 07:46:13
`fprintf(stderr, "%p\n", p);` should do it
_wb_
2022-05-19 07:46:36
%p, I didn't know that was a thing
veluca
2022-05-19 07:46:50
will do some spam possibly but only the last one before the crash matters
_wb_
2022-05-19 07:47:11
I always do %X or something
2022-05-19 07:48:08
Maybe first worth checking if switching to impl 2 actually fixes it
2022-05-19 07:48:29
How do we know LoadDup is even the culprit?
2022-05-19 07:49:56
Is it because the remaining stuff in the enc gaborish convolution is unlikely to be broken because if it were, a lot more stuff would be broken too?
veluca
2022-05-19 07:50:19
well, there were some stack traces failing on vbroadcastf128
2022-05-19 07:50:31
which is pretty much what LoadDup *is*
_wb_
2022-05-19 07:50:39
Ah I see
2022-05-19 07:50:48
Well, impl 1 and 3 are that
2022-05-19 07:51:42
LoadDup is not in the hot loop, right?
2022-05-19 07:51:52
Do we use LoadDup in any hot loop?
veluca
2022-05-19 07:52:11
it loads some xyb constants and other similar things
_wb_
2022-05-19 07:52:38
If not, can we just not bother with speed of that initialization? There is no way it will make any difference
2022-05-19 07:56:45
So if the crashing pointer happens to be unaligned, what we are dealing with then is an avx2 documentation bug, right? Since officially the instruction does not require alignment but in reality it seems like it does.
veluca
2022-05-19 07:58:55
I think so yes
2022-05-19 07:59:29
but it might be *very* unaligned, like not aligned to float width, which is documented as a crashing condition (and probably a compiler bug)
_wb_
2022-05-19 08:03:46
https://c.tenor.com/GFSbcD9PgycAAAAM/drumroll-drum.gif
Traneptora
veluca ok, then can you do 2 things... (a) print the pointer 😄 (b) switch to impl 2
2022-05-19 08:03:49
how would I switch to impl 2 anyway? just remove the #else, replace the #elif foo with #else?
_wb_
2022-05-19 08:04:10
Yes, but first print the pointer pls
Traneptora
2022-05-19 08:04:21
currently rebuilding
_wb_
2022-05-19 08:04:35
I'm curious what its last digits are...
Traneptora
2022-05-19 08:04:40
I also had to #include cstdio but otherwise yes it worked
2022-05-19 08:04:57
i.e. it compiled
_wb_
2022-05-19 08:05:30
Could of course be that adding the print messes things up and makes it work, lol
Traneptora
2022-05-19 08:05:37
no, I mean it compiled
2022-05-19 08:05:39
I haven't run it yet
veluca
2022-05-19 08:05:44
wouldn't be the first time, and won't be the last
Traneptora
2022-05-19 08:06:03
this is all cross-compiling inside a docker image so it's all very slow
_wb_
2022-05-19 08:06:55
Yeah and changing that file probably causes a near full recompile
Traneptora
2022-05-19 08:07:28
and then FFmpeg has to recompile too
2022-05-19 08:07:45
and then to actually test it I gotta reboot to my windows install
2022-05-19 08:07:48
yuck
2022-05-19 08:08:02
I can test it in wine but I'd prefer a native install for true diagnostic info
veluca
2022-05-19 08:12:47
not the fastest build-test cycle ever lol
Traneptora
2022-05-19 08:13:53
testing it in wine, it's spitting out *lots* of pointers
2022-05-19 08:14:12
god this is slow
2022-05-19 08:14:18
time to reboot and test natively
_wb_
2022-05-19 08:17:30
Can you do | tail or something?
Traneptora
2022-05-19 08:17:47
``` 00007ff670510980 00007ff670510920 00007ff670510970 Thread 31 received signal SIGSEGV, Segmentation fault. [Switching to Thread 13140.0x3384] 0x00007ff66ebaa7dc in operator() (__closure=0xca469fcd90, task=<optimized out>) at /opt/ct-ng/lib/gcc/x86_64-w64-mingw32/11.2.0/include/avxintrin.h:881 881 /opt/ct-ng/lib/gcc/x86_64-w64-mingw32/11.2.0/include/avxintrin.h: No such file or directory. (gdb) where #0 0x00007ff66ebaa7dc in operator() (__closure=0xca469fcd90, task=<optimized out>) at /opt/ct-ng/lib/gcc/x86_64-w64-mingw32/11.2.0/include/avxintrin.h:881 #1 0x00007ff66edab8b8 in jpegxl::ThreadParallelRunner::RunRange (self=self@entry=0x2484eacc680, command=command@entry=2160, thread=thread@entry=1) at ../lib/threads/thread_parallel_runner_internal.cc:139 #2 0x00007ff66edab99a in jpegxl::ThreadParallelRunner::ThreadFunc (self=0x2484eacc680, thread=1) at ../lib/threads/thread_parallel_runner_internal.cc:169 #3 0x00007ff66fa53b51 in execute_native_thread_routine () #4 0x00007ff66dbb44d3 in pthread_create_wrapper (args=0x2484eaccd40) at src/thread.c:1533 #5 0x00007ffe7b23dfb4 in msvcrt!_beginthreadex () from /c/WINDOWS/System32/msvcrt.dll #6 0x00007ffe7b23e08c in msvcrt!_endthreadex () from /c/WINDOWS/System32/msvcrt.dll #7 0x00007ffe7ba954e0 in KERNEL32!BaseThreadInitThunk () from /c/WINDOWS/System32/KERNEL32.DLL #8 0x00007ffe7d06485b in ntdll!RtlUserThreadStart () from /c/WINDOWS/SYSTEM32/ntdll.dll #9 0x0000000000000000 in ?? () Backtrace stopped: previous frame inner to this frame (corrupt stack?) (gdb) ```
2022-05-19 08:17:53
ran it inside gdb
2022-05-19 08:18:13
lemme try with threads=1
_wb_
2022-05-19 08:18:52
So
2022-05-19 08:18:56
70
Traneptora
2022-05-19 08:18:58
It's not printing anything with threads=1
_wb_
2022-05-19 08:19:20
Means it is aligned to 16 bytes but not to 32 bytes, right?
Traneptora
2022-05-19 08:19:21
well, threads=0
2022-05-19 08:19:45
now it prints nothing
2022-05-19 08:20:28
disabling threading makes it easier to hit
_wb_
2022-05-19 08:20:49
The previous pointers are aligned to 32 but that one only to 16
Traneptora
2022-05-19 08:20:55
``` (gdb) where #0 0x00007ff66ebaa7dc in operator() (__closure=0x7b0f7fcda0, task=<optimized out>) at /opt/ct-ng/lib/gcc/x86_64-w64-mingw32/11.2.0/include/avxintrin.h:881 #1 0x00007ff66edab6de in jpegxl::ThreadParallelRunner::Runner (runner_opaque=0x203175792c0, jpegxl_opaque=0x7b0f7fcd90, init=<optimized out>, func=0x7ff66ebaa910 <jxl::ThreadPool::RunCallState<jxl::Status(long long unsigned int), jxl::N_AVX2::SRGBToXYB(const Image3F&, float const*, jxl::ThreadPool*, jxl::Image3F*)::<lambda(uint32_t, size_t)> >::CallDataFunc(void *, uint32_t, size_t)>, start_range=0, end_range=2160) at ../lib/threads/thread_parallel_runner_internal.cc:72 #2 0x00007ff66eba87f3 in jxl::ThreadPool::Run<jxl::Status(long long unsigned int), jxl::N_AVX2::SRGBToXYB(const Image3F&, float const*, jxl::ThreadPool*, jxl::Image3F*)::<lambda(uint32_t, size_t)> > (begin=0, init_func= @0x7ff66f7cfe30: {jxl::Status (unsigned long long)} 0x7ff66f7cfe30 <jxl::ThreadPool::NoInit(unsigned long long)>, caller=<synthetic pointer>, data_func=..., end=<optimized out>, this=0x203175bbd00) at ../lib/jxl/base/data_parallel.h:50 #3 jxl::RunOnPool<jxl::Status(long long unsigned int), jxl::N_AVX2::SRGBToXYB(const Image3F&, float const*, jxl::ThreadPool*, jxl::Image3F*)::<lambda(uint32_t, size_t)> > (begin=0, init_func=@0x7ff66f7cfe30: {jxl::Status (unsigned long long)} 0x7ff66f7cfe30 <jxl::ThreadPool::NoInit(unsigned long long)>, caller=<synthetic pointer>, data_func=..., end=<optimized out>, pool=0x203175bbd00) at ../lib/jxl/base/data_parallel.h:108 #4 jxl::N_AVX2::SRGBToXYB (srgb=..., premul_absorb=<optimized out>, premul_absorb@entry=0x7b0f7fce60, pool=pool@entry=0x203175bbd00, xyb=<optimized out>, xyb@entry=0x7b0f7fd700) at ../lib/jxl/enc_xyb.cc:195 ```
veluca
2022-05-19 08:21:26
apparently that's indeed the case
2022-05-19 08:21:39
there's also a way to get the local asm
2022-05-19 08:21:41
one sec
Traneptora
2022-05-19 08:22:06
if I turn off threading it prints nothing
2022-05-19 08:22:20
ever
veluca
2022-05-19 08:22:23
disassemble
2022-05-19 08:22:31
ok that is *weird*
veluca disassemble
2022-05-19 08:22:56
I mean that's a gdb command
Traneptora
2022-05-19 08:22:57
actually wait
2022-05-19 08:23:04
oh, the gdb command prints nothing
2022-05-19 08:23:11
it's printing stuff fine if I disable threading
veluca
2022-05-19 08:23:23
could be gdb + threads is broken
Traneptora
2022-05-19 08:23:23
god, it's a heisenbug
veluca
2022-05-19 08:23:38
well if it's really due to alignment then yep indeed
Traneptora
2022-05-19 08:24:04
it appears it doesn't need 32-bit alignment though
2022-05-19 08:24:09
``` 00007ff6705108f0 00007ff670510940 00007ff6705108e0 00007ff670510980 00007ff670510920 00007ff670510970 00007ff670510910 00007ff670510960 00007ff670510900 00007ff670510950```
2022-05-19 08:24:11
this got printed
veluca
2022-05-19 08:25:14
mhhhhh
_wb_
2022-05-19 08:25:33
Does it ever crash just after a 32 aligned pointer?
veluca
2022-05-19 08:26:02
maybe we hit a stack size limit? that'd be rather odd though
_wb_
2022-05-19 08:26:29
Could it be that alignment is needed but it sometimes works when not aligned?
Traneptora
2022-05-19 08:26:35
https://0x0.st/oaa9.log
2022-05-19 08:27:18
does this mean anything to you?
veluca
2022-05-19 08:28:09
... looks like it's not on failing on a loaddup
2022-05-19 08:28:37
but on a Load()
2022-05-19 08:28:58
well, a store in this case
_wb_
2022-05-19 08:29:33
So in the convolve loop?
veluca
2022-05-19 08:30:14
no it's in SRGBToXYB now
2022-05-19 08:30:27
what.
_wb_
2022-05-19 08:30:42
That is weird
Traneptora
2022-05-19 08:30:53
still AVX2 though
2022-05-19 08:30:55
are there any helpful options I can pass to highway to help debug this?
veluca
2022-05-19 08:30:57
that looks *suspiciously* like argument passing under some calling convention
2022-05-19 08:31:01
give me a sec
_wb_
2022-05-19 08:31:03
That is also used at e < 5
Traneptora
2022-05-19 08:31:34
currently it's using e7
2022-05-19 08:31:46
e3 produces an identical crash
2022-05-19 08:32:05
In the same SRGBToXYZ function
2022-05-19 08:32:12
which is not surprising, if that's where it is
veluca
2022-05-19 08:32:21
I think this might be *another* problem
Traneptora
2022-05-19 08:33:06
you mentioned that `HWY_ALIGN` can force alignment, but how I can enable that option? running libjxl's cmake with `-DHWY_ALIGN=ON` just yields a warning that it is unused
veluca
2022-05-19 08:33:34
no that would be a C++ macro
2022-05-19 08:33:53
can you print %rsp in gdb?
_wb_
2022-05-19 08:34:00
Yeah no that's something we would need to use in the code on whatever variable needs to be aligned
2022-05-19 08:35:01
If this issue is not in gaborish it is something completely different
Traneptora
veluca can you print %rsp in gdb?
2022-05-19 08:36:27
how?
veluca
2022-05-19 08:37:31
print %rsp
2022-05-19 08:37:34
I think
Traneptora
2022-05-19 08:37:57
``` (gdb) print %rsp A syntax error in expression, near `%rsp'. ```
veluca
2022-05-19 08:38:13
whatever
2022-05-19 08:38:15
`info registers`
2022-05-19 08:38:30
let's go as verbose as we can xD
Traneptora
2022-05-19 08:38:40
``` (gdb) info registers rax 0x8a529fc710 594091689744 rbx 0x24e8b310d80 2536365952384 rcx 0x8a529fc750 594091689808 rdx 0x8a529fc730 594091689776 rsi 0x8a529fc730 594091689776 rdi 0x8a529fc750 594091689808 rbp 0x24e972efd00 0x24e972efd00 rsp 0x8a529fc650 0x8a529fc650 r8 0x0 0 r9 0x7ff66ebaa910 140696396409104 r10 0x8a529fc8b0 594091690160 r11 0x24e8d301c80 2536399445120 r12 0x24e952efc80 2536533589120 r13 0x0 0 r14 0x8a529fc8c0 594091690176 r15 0x24e8d301c80 2536399445120 rip 0x7ff66ebaa7dc 0x7ff66ebaa7dc <operator()(uint32_t, size_t) const+188> eflags 0x10246 [ PF ZF IF RF ] cs 0x33 51 ss 0x2b 43 ds 0x2b 43 es 0x2b 43 fs 0x53 83 gs 0x2b 43 (gdb) ```
veluca
2022-05-19 08:38:57
well that's an interesting stack pointer
Traneptora
2022-05-19 08:39:14
I had to run it again btw, so the virtual address space might be a bit different
2022-05-19 08:39:22
should I repost the disassemble log?
veluca
2022-05-19 08:39:40
nah
2022-05-19 08:39:43
rsp is not aligned
2022-05-19 08:39:46
that's not great
_wb_
2022-05-19 08:39:55
Can the stack pointer be messed up causing gdb to think it is in a different function?
Traneptora
2022-05-19 08:40:02
looks 16-bit aligned, yea
2022-05-19 08:40:39
okay, interesting. sometimes, it crashes immediately, and sometimes it starts spitting out the printf %ps
2022-05-19 08:40:50
I'm wondering if it's random based on if the pointer happens to be aligned or not
veluca
2022-05-19 08:40:57
almost certain
_wb_
2022-05-19 08:41:00
Then I go back to the assumption that unaligned LoadDup is just causing random trouble
veluca
2022-05-19 08:41:13
I'm going to bet it depends on whether rsp is 16-bit aligned or 32-bit aligned
Traneptora
2022-05-19 08:41:30
is there a way to *force* it to be 32-bit aligned, perhaps via compile flag?
veluca
2022-05-19 08:41:37
trying to figure it out
Traneptora
2022-05-19 08:41:40
I could try using `-mstack-align` or `-mdata-align` if that is necessary
2022-05-19 08:41:44
I don't know if those work on x86 though
veluca
2022-05-19 08:42:10
what compiler are you using for this build?
Traneptora
2022-05-19 08:43:32
mingw-w64
2022-05-19 08:43:39
which would be gcc
_wb_
2022-05-19 08:43:41
I guess it would be useful to try if switching to impl 2 makes it work, to verify that that is indeed the issue
veluca
2022-05-19 08:43:49
no, this is unrelated
2022-05-19 08:44:18
this is a compiler-generated register spill / parameter passing
2022-05-19 08:44:57
one shouldn't do an unaligned store to 0xe0(%rsp) if rsp is 16-bit aligned 🙂