|
_wb_
|
2022-05-11 09:36:18
|
Gamut mapping is just clamping when requesting uint, and returning out of range values when requesting float, iiuc
|
|
2022-05-11 09:37:41
|
Tone mapping is done according to the tone mapping header iiuc, but <@604964375924834314> probably knows better
|
|
|
spider-mario
|
2022-05-11 09:46:39
|
there is code for tone mapping in the codebase but at present, it’s not really well-integrated with the rest
|
|
2022-05-11 09:46:45
|
djxl can use it but djxl_ng not yet
|
|
2022-05-11 09:46:52
|
libjxl doesn’t either
|
|
|
Traneptora
|
2022-05-12 02:39:44
|
can libjxl output any enum space including `JXL_PRIMARIES_CUSTOM` or is that an exception?
|
|
|
|
veluca
|
2022-05-12 02:54:00
|
at least in theory it should, yes
|
|
2022-05-12 02:54:08
|
(if XYB)
|
|
|
Traneptora
|
2022-05-12 03:02:45
|
I noticed the libjxl docs say this
|
|
2022-05-12 03:02:48
|
> The JPEG XL image has an attached ICC Profile, in that case, the encoded structured data is not available, this function will return an error status. JxlDecoderGetColorAsICCProfile should be called instead.
|
|
2022-05-12 03:02:55
|
Is this only the case for `TARGET_ORIGINAL`?
|
|
2022-05-12 03:04:14
|
I don't see why this would fail if you have an XYB encoded space and you call `TARGET_DATA`. I would think it would just return the color encoding you requested with `JxlDecoderSetPreferredColorEncoding`
|
|
|
veluca
at least in theory it should, yes
|
|
2022-05-12 03:06:01
|
Does this mean that if I can parse the ICC Profile for `TARGET_ORIGINAL`, and determine the primaries and white point from it, then in theory I could request the pixel data in the space described by that ICC Profile, using `JXL_PRIMARIES_CUSTOM` and then attach that original ICC Profile to that pixel data and have it render properly?
|
|
|
|
veluca
|
|
Traneptora
I don't see why this would fail if you have an XYB encoded space and you call `TARGET_DATA`. I would think it would just return the color encoding you requested with `JxlDecoderSetPreferredColorEncoding`
|
|
2022-05-12 03:07:16
|
I think this might be true, but best if you try ...
|
|
|
Traneptora
Does this mean that if I can parse the ICC Profile for `TARGET_ORIGINAL`, and determine the primaries and white point from it, then in theory I could request the pixel data in the space described by that ICC Profile, using `JXL_PRIMARIES_CUSTOM` and then attach that original ICC Profile to that pixel data and have it render properly?
|
|
2022-05-12 03:08:11
|
In principle yes, although of course not all color profiles are representable via enum :) (also you also would need to handle the TF too)
|
|
|
Traneptora
|
2022-05-12 03:09:07
|
wouldn't any white point and primaries be representable by the enums?
|
|
2022-05-12 03:09:14
|
what else is in the ICCP other than TRC?
|
|
2022-05-12 03:11:18
|
to be honest, what I'm trying to do here is request pixel data from the decoder in the `TARGET_ORIGINAL` space, and attach the `TARGET_ORIGINAL` iccp, which is not necessarily possible without a CMS
|
|
2022-05-12 03:11:36
|
but if I have a full version of libjxl and not just libjxl_dec then it will have a CMS included
|
|
2022-05-12 03:11:56
|
so I don't see why the decoder of the full version should not have this functionality
|
|
2022-05-12 03:12:03
|
is that a post-1.0 goal or a pre-1.0 goal?
|
|
2022-05-12 03:13:46
|
i.e. the full ersion of libjxl should export `JxlDecoderSetOriginalProfile` which sets the decoded pixel data to the data which will be accurately represented by `TARGET_ORIGINAL`'s ICC profile
|
|
2022-05-12 03:13:53
|
or something
|
|
|
|
veluca
|
|
Traneptora
what else is in the ICCP other than TRC?
|
|
2022-05-12 03:45:27
|
there could be a lot of things actually, i.e. LUT-based profiles... many (most?) ICCs are WP + PRI + TRC, but not all
|
|
2022-05-12 03:46:06
|
some ICCs don't even *have* primaries or transfer curves
|
|
2022-05-12 03:46:46
|
I think if we only had wp/pri/trc ICCs we'd just have implemented parsing in libjxl, but the world doesn't look that way so...
|
|
|
_wb_
|
2022-05-12 03:50:27
|
CMYK ICC profiles can be a megabyte big and god knows what they contain but it's certainly not just some primaries and transfer curve
|
|
|
|
veluca
|
2022-05-12 03:55:27
|
4D LUTs 😉
|
|
|
_wb_
|
2022-05-12 03:58:26
|
I guess. After all they try to model stuff like how paper absorbs ink and what chemically happens when inks are mixed, which is a lot more hairy stuff than modeling idealized light
|
|
|
Traneptora
|
|
Traneptora
to be honest, what I'm trying to do here is request pixel data from the decoder in the `TARGET_ORIGINAL` space, and attach the `TARGET_ORIGINAL` iccp, which is not necessarily possible without a CMS
|
|
2022-05-12 07:20:29
|
either way, is this something that would happen before or after 1.0?
|
|
|
_wb_
|
2022-05-12 07:25:25
|
I hope before
|
|
2022-05-12 07:26:14
|
https://github.com/libjxl/libjxl/pull/1366
|
|
2022-05-12 07:28:26
|
Implementing this new api should be mostly a matter of extending the final render pipeline stage to call cms on the floats just after interleaving and just before converting to the pixelformat. Shouldn't be terribly hard to do I suppose.
|
|
2022-05-12 07:30:14
|
And I assume it makes sense to have a default cms just like in the encoder (except in libjxl_dec where you wouldn't have that, so you'd get error when trying to set output space without setting cms first)
|
|
|
Traneptora
|
2022-05-12 07:39:50
|
oh, interesting, there's already a PR for this
|
|
2022-05-12 07:40:53
|
oh, it's just for the API
|
|
2022-05-12 07:41:18
|
If this implementation happens, would you also add SetPreferredOutputICCProfile?
|
|
2022-05-12 07:41:36
|
so you can pass it an ICC profile, and the pixel data received will be accurately described by that ICC Profile?
|
|
2022-05-12 07:42:06
|
either an ICC Profile for a monitor, or the TARGET_ORIGINAL profile if you're just passing it through?
|
|
|
_wb_
|
2022-05-12 07:53:35
|
So only preferred, not desired? Only transforming in case of xyb?
|
|
2022-05-12 07:54:45
|
You could do that by inspecting uses_original_profile and only setting desired profile if it's false, I guess.
|
|
|
Traneptora
|
2022-05-12 08:28:51
|
how would you set the desired profile, though?
|
|
2022-05-12 08:29:24
|
the API proposal changes don't have a function to set a preferred/desired ICC Profile, only desired output JxlColorEncoding
|
|
|
_wb_
|
2022-05-12 08:56:50
|
No it has both
|
|
2022-05-12 08:57:27
|
JxlDecoderSetICCProfile
|
|
|
Traneptora
|
2022-05-12 09:01:48
|
I didn't see it <:DunktsukiStare:759697530786938901>
|
|
2022-05-12 09:02:47
|
oh, it's after all the comments
|
|
2022-05-12 09:02:49
|
*nevermind*
|
|
2022-05-12 09:03:20
|
I assume it would fail if `uses_original_profile == true`?
|
|
2022-05-12 09:03:41
|
or will that always work?
|
|
|
_wb_
|
2022-05-12 09:05:22
|
Will always work, if cms is set (or default cms is available, I guess)
|
|
|
yurume
|
2022-05-12 09:06:26
|
DecodeHybridVarLenUint seems to be one of the things in jxl that I have to be really careful about overflow since it's essentially unbounded
|
|
|
_wb_
|
2022-05-12 09:08:44
|
Should fit in uint32 in practice I think. At least that's how we implement it iirc
|
|
|
yurume
|
2022-05-12 09:09:25
|
I recall that libjxl did have some sort of bounds in the hybrid int decoder, but I'm not sure if those bounds are for trees or for individual values
|
|
2022-05-12 09:11:04
|
(without looking at the actual code) my guess is the latter, as I've seen some bitstream where the LZ77 token starts at a very large number while only a few small values are actually used for literals
|
|
|
|
veluca
|
2022-05-12 09:11:17
|
libjxl `and`s every token's nbits with 31
|
|
2022-05-12 09:11:49
|
that's the only real bounds check there
|
|
2022-05-12 09:12:24
|
(which is equivalent to saying that everything that doesn't fit in u32 makes things go... weird)
|
|
|
yurume
|
2022-05-12 09:12:28
|
a valid bitstream but not properly supported by libjxl, so to say?
|
|
|
|
veluca
|
2022-05-12 09:13:02
|
I'd argue that if it's valid it's because we forgot to put the limit in the profiles and levels section xD
|
|
|
yurume
|
2022-05-12 09:15:26
|
jxl bitstream is very dense so I'm not sure that &31 is a good solution
|
|
2022-05-12 09:16:01
|
since it still leaves a possibility for an invalid bitstream to be decoded as valid
|
|
|
_wb_
|
2022-05-12 09:16:24
|
We do say that int32 has to be enough for the buffers, maybe we should add that uint32 has to be enough for the PackSigned residuals
|
|
|
|
veluca
|
2022-05-12 09:16:38
|
a very unlikely possibility, but yes
|
|
2022-05-12 09:17:00
|
given the spec doesn't say what happens to invalid bitstreams I wouldn't mind too much xD
|
|
|
yurume
|
2022-05-12 09:17:08
|
haha
|
|
2022-05-12 09:17:43
|
but yeah, my journey to the jxl spec (and jxsml) gave a feeling that the format is very unforgiving
|
|
|
|
veluca
|
2022-05-12 09:18:09
|
(also note that at least with ANS you have a builtin checksum, so it's not very *likely* that you'll end up decoding successfully)
|
|
|
yurume
|
2022-05-12 09:18:18
|
ah that is a valid point
|
|
2022-05-12 09:19:09
|
I'm yet to reach the point in the spec where ANS is frequently used (I did implement but couldn't test it for that reason)
|
|
|
|
veluca
|
2022-05-12 09:20:23
|
ICC is the first place to test it
|
|
2022-05-12 09:20:54
|
I give a 50% chance at best that all the details of histogram reading etc are actually correct in the spec xD
|
|
|
yurume
|
2022-05-12 09:21:42
|
I've already gone through ICC (didn't do the actual decode, but enough code to fully parse them and move on, which means a full entropy code impl)
|
|
|
|
veluca
|
2022-05-12 09:21:56
|
oh, very cool 😄
|
|
|
yurume
|
2022-05-12 09:21:58
|
and I think my test image didn't have ANS codes for ICC?
|
|
2022-05-12 09:22:11
|
maybe it is only useful for larger ICC profiles
|
|
|
|
veluca
|
2022-05-12 09:22:24
|
nah, trees are not for *all* ANS streams
|
|
2022-05-12 09:22:27
|
just for modular images
|
|
2022-05-12 09:22:40
|
i.e. frames with kModular, and DC in VarDCT
|
|
|
yurume
|
2022-05-12 09:22:53
|
ah, my bad, I meant codes not trees (which are not used in ANS itself)
|
|
|
|
veluca
|
2022-05-12 09:22:56
|
ah, could have been a huffman
|
|
|
yurume
|
|
|
veluca
|
2022-05-12 09:23:14
|
in fact it probably is most of the time
|
|
|
yurume
|
2022-05-12 09:23:32
|
I'm also missing a nested entropy code impl (which should be not too hard at this point, but I lack a test image)
|
|
|
|
veluca
|
2022-05-12 09:23:34
|
not too hard to convince cjxl to use ANS anyway
|
|
2022-05-12 09:23:39
|
nested?
|
|
|
yurume
|
2022-05-12 09:24:18
|
using yet another entropy code for decoding cluster maps
|
|
2022-05-12 09:24:39
|
or context maps (I think that was a terminology used by libjxl)
|
|
|
|
veluca
|
2022-05-12 09:25:54
|
ah yeah
|
|
2022-05-12 09:26:06
|
not sure if you can actually trigger that for ICC
|
|
2022-05-12 09:27:50
|
mhhh if you find an image with a huge ICC (or attach such an ICC to it with imagemagick), you should be able to
|
|
2022-05-12 09:28:22
|
another interesting point: https://github.com/libjxl/libjxl/blob/main/lib/jxl/enc_icc_codec.cc#L400
|
|
2022-05-12 09:28:39
|
if you remove that line, and make the ICC big enough, it *probably* will use ANS
|
|
|
yurume
|
2022-05-12 09:28:51
|
is there any particular reasoning for that line?
|
|
|
|
veluca
|
2022-05-12 09:29:14
|
yeah, Huffman can handle larger alphabets
|
|
2022-05-12 09:29:21
|
and for ICC that's often useful
|
|
|
yurume
|
2022-05-12 09:29:37
|
ah that kind of things
|
|
2022-05-12 09:29:59
|
and I'm also noticing `LZ77Method::kOptimal` for small enough ICC profiles, heh
|
|
|
|
veluca
|
2022-05-12 09:30:06
|
yeah
|
|
2022-05-12 09:30:12
|
that LZ77 encoder is *slow*
|
|
|
yurume
|
2022-05-12 09:30:31
|
rightly so for optimal parse
|
|
|
|
veluca
|
2022-05-12 09:30:44
|
I mean, way slower than it needs to be 😛
|
|
|
yurume
|
2022-05-12 09:31:03
|
well at least you only have exactly one chunk of compressed data unlike, say, DEFLATE :p
|
|
|
|
veluca
|
2022-05-12 09:31:54
|
ah, the format is fast enough
|
|
2022-05-12 09:32:03
|
just not enough effort in the enc impl
|
|
|
yurume
|
2022-05-12 09:32:37
|
for the format, did you mean DEFLATE? (was a bit confused for a moment)
|
|
|
|
veluca
|
2022-05-12 09:33:49
|
I mean JXL's LZ77 bitstream
|
|
|
yurume
|
|
Traneptora
|
2022-05-12 10:28:19
|
why dont' we just use zlib?
|
|
2022-05-12 10:28:22
|
for that
|
|
|
|
veluca
|
2022-05-12 10:56:52
|
in jxl?
|
|
2022-05-12 10:57:10
|
well, because (a) no context modeling (b) short lengths (c) huffman
|
|
|
Traneptora
|
2022-05-12 11:00:28
|
no I mean why not use zlib for lz77
|
|
2022-05-12 11:00:40
|
since iirc it can do that
|
|
|
|
veluca
|
2022-05-12 11:06:01
|
it can do it *for deflate*
|
|
2022-05-12 11:06:51
|
and even then, it doesn't even come close to `LZ77Method::kOptimal` in terms of compression (which is closer to zopfli, both in speed and in implementation 😛)
|
|
|
Traneptora
|
2022-05-12 11:07:30
|
ah
|
|
2022-05-13 04:17:27
|
could someone add the Documentation label to https://github.com/libjxl/libjxl/pull/1421
|
|
|
|
veluca
|
2022-05-13 07:14:49
|
done!
|
|
|
|
JendaLinda
|
2022-05-13 07:53:26
|
If RGBA picture is compressed using VarDCT, is the alpha channel compressed losslessly like in webp, or is VarDCT used for the alpha channel as well? Or are both options possible?
|
|
|
_wb_
|
2022-05-13 07:56:32
|
VarDCT can only operate on 3 channels
|
|
2022-05-13 07:56:43
|
So alpha is always encoded with modular
|
|
2022-05-13 07:56:53
|
It can be lossy or lossless
|
|
2022-05-13 07:57:43
|
Slightly lossy alpha is done by default when doing a lossy encode
|
|
2022-05-13 07:59:54
|
The artifacts of lossy modular are different from those of lossy VarDCT: ringing shouldn't happen (which could be weird in an alpha channel), though you can get some blur and jaggies.
|
|
|
|
JendaLinda
|
2022-05-13 08:01:03
|
Is there an option in cjxl to force lossless alpha?
|
|
2022-05-13 08:04:12
|
Also, what would be the benefit of using lossy Modular in general instead of VarDCT? Is it good for non photographic content?
|
|
|
_wb_
|
2022-05-13 08:04:42
|
Perhaps sometimes it is.
|
|
2022-05-13 08:05:50
|
I don't think there's currently an option in cjxl to set alpha quality independently from color quality
|
|
2022-05-13 08:06:23
|
Would it be useful?
|
|
2022-05-13 08:08:24
|
We don't have it in the encode api either atm. Wouldn't be very hard to do though, could even have distance settings for each extra channel separately.
|
|
|
|
JendaLinda
|
2022-05-13 08:10:38
|
I think lossy alpha might be undesirable in some cases. It would be useful to add the options.
Also it seems that VarDCT and Modular use a different quality scale.
|
|
2022-05-13 08:20:36
|
At least -d option didn't have any effect on Modular. I know I can use -q for both modes, but there I don't know what -q value is equivalent to "near lossless" -d 1 and what is the "near lossless" value for Modular (perhaps used for the lossy alpha too?).
|
|
|
_wb_
|
2022-05-13 08:22:49
|
In the current git version, -d and -q are just the same thing, and it influences both modular and vardct
|
|
2022-05-13 08:23:02
|
-d 1 is -q 90
|
|
|
|
JendaLinda
|
2022-05-13 08:31:36
|
Alright, 90 seems to be a good mapping value. Now I can do some comparisons.
|
|
|
|
veluca
|
|
_wb_
I don't think there's currently an option in cjxl to set alpha quality independently from color quality
|
|
2022-05-13 08:34:32
|
doesn't -Q (sort of by mistake) do that? or did we remove it?
|
|
|
_wb_
|
2022-05-13 08:35:22
|
We removed -Q
|
|
|
|
veluca
|
2022-05-13 08:36:16
|
ah, then we need an option for alpha quality indeed
|
|
|
_wb_
|
2022-05-13 08:44:53
|
Yes, it would also be useful for other extra channels, e.g. kBlack you also probably want to control separately, etc.
|
|
|
lonjil
|
2022-05-14 11:13:49
|
How good is modular mode at compressing photographic content lossily?
|
|
|
|
veluca
|
2022-05-14 12:20:38
|
not as good as vardct, but I can't tell you much more than that
|
|
|
Orum
|
2022-05-14 12:33:25
|
modular is good for some things, but I would not put photographs in that category
|
|
|
_wb_
|
2022-05-14 12:53:25
|
Better than flif but not as good as vardct :)
|
|
|
Traneptora
|
2022-05-14 02:22:23
|
well I suppose any lossy codec is better than flif at being lossy <:kek:857018203640561677>
|
|
2022-05-14 02:23:10
|
<@179701849576833024> just to be clear, did you want bulleted lists to have *more* indentation?
|
|
2022-05-14 02:25:39
|
and how much did you want?
|
|
2022-05-14 02:28:18
|
I ask because I modeled it after JxlDecoderGetColorAsEncodedProfile
|
|
2022-05-14 02:28:24
|
the whitespace, that is
|
|
2022-05-14 02:34:25
|
is this ideal?
|
|
2022-05-14 02:34:43
|
(yes it's a screenshot of code but I don't know how discord will wrap lines)
|
|
2022-05-14 02:35:02
|
|
|
|
|
veluca
|
2022-05-14 03:40:12
|
this seems ideal
|
|
2022-05-14 03:40:20
|
thanks 🙂
|
|
|
_wb_
|
2022-05-14 04:36:27
|
Flif is not _that_ horrible as a lossy codec: https://flif.info/lossy.html
|
|
2022-05-14 04:38:45
|
I mean, ignoring speed
|
|
2022-05-14 04:39:25
|
Probably not better than jpeg, but not horribly worse either
|
|
|
Traneptora
|
2022-05-14 04:54:59
|
Isn't flif lossless
|
|
2022-05-14 04:55:11
|
or is this proprocessed, like lossy png
|
|
|
_wb_
|
2022-05-14 05:00:37
|
Yes
|
|
2022-05-14 05:00:53
|
More effective than lossy png
|
|
2022-05-14 05:01:03
|
But same idea
|
|
|
Traneptora
|
2022-05-14 06:06:41
|
<@179701849576833024> sent an updated PR
|
|
2022-05-14 06:06:48
|
meaning I force-pushed
|
|
|
|
veluca
|
2022-05-14 06:12:27
|
and I approved 🙂
|
|
|
Traneptora
|
2022-05-14 06:49:42
|
I marked the two outdated conversations as resolved
|
|
|
|
veluca
|
2022-05-14 06:58:28
|
sure
|
|
2022-05-14 06:58:58
|
merged
|
|
2022-05-14 06:59:03
|
I thought I had automerge enabled...
|
|
|
Traneptora
|
|
veluca
I thought I had automerge enabled...
|
|
2022-05-15 12:42:46
|
it disabled when I force-pushed
|
|
|
yurume
|
2022-05-15 10:06:09
|
yay yet another spec bug
|
|
|
|
veluca
|
2022-05-15 10:06:48
|
lovely
|
|
2022-05-15 10:06:54
|
well, great that you found it 🙂
|
|
|
yurume
|
2022-05-15 10:07:23
|
well I've filed 20 spec issues so far, so finding another one is not a big deal 😉
|
|
2022-05-15 10:08:06
|
but some spec bugs are particularly annoying to track down, this time alias table lookup is completely broken
|
|
2022-05-15 10:08:42
|
(the spec says `offset = offsets[i] + pos;`, which should be a conditional move just like symbols)
|
|
2022-05-15 10:09:32
|
I still have at least one outstanding bug I haven't yet tracked down, hopefully that's the last one before I can actually decode a whole ANS stream
|
|
|
|
veluca
|
|
yurume
I still have at least one outstanding bug I haven't yet tracked down, hopefully that's the last one before I can actually decode a whole ANS stream
|
|
2022-05-15 10:10:53
|
the spec says that?
|
|
2022-05-15 10:10:57
|
I mean, I can see it does
|
|
|
yurume
|
|
|
veluca
|
2022-05-15 10:11:12
|
but how did I manage to write it that way? xD
|
|
|
yurume
|
2022-05-15 10:11:29
|
sleep and/or caffeine deprivation? lol
|
|
2022-05-15 10:11:50
|
spec doesn't have any automated test, so...
|
|
|
|
veluca
|
2022-05-15 10:12:00
|
more a question of too many changes over time I think 😛
|
|
2022-05-15 10:12:27
|
ah, the C++ code does say `s.offset = offsets1_or_0 + pos`
|
|
2022-05-15 10:12:33
|
so I can see how that would happen
|
|
2022-05-15 10:12:38
|
just... yeah
|
|
|
yurume
|
2022-05-15 10:13:53
|
well there are also just typos, like the alias table _initialization_ code where i and u and o are randomly swapped
|
|
|
|
veluca
|
2022-05-15 10:14:10
|
🤦♂️
|
|
2022-05-15 10:14:29
|
and here I thought I copied that from the C++ code
|
|
2022-05-15 10:15:03
|
AFAIU it should be `offset = (pos >= cutoffs[i]) ? offsets[i] + pos : pos` or something like that
|
|
|
yurume
|
2022-05-15 10:15:14
|
I think the original standard had them in prose (what I'm working with now is not one but the current community draft, as had been shared recently)
|
|
|
|
veluca
|
|
yurume
|
2022-05-15 10:15:27
|
so the conversion may have corrupted that
|
|
|
|
veluca
|
2022-05-15 10:15:36
|
I remember spending a lot of time writing it in prose, and ISO basically saying "nah, code"
|
|
2022-05-15 10:15:45
|
then I copypasted from C++
|
|
|
yurume
|
|
improver
|
|
|
veluca
|
2022-05-15 10:15:51
|
but apparently wrong
|
|
|
improver
|
2022-05-15 10:16:19
|
i mean kinda yeah code can be less weird at some details but lol
|
|
|
_wb_
|
2022-05-16 06:04:50
|
Yurume is single-handedly finding more spec bugs than all the national bodies of ISO combined 😅
|
|
|
Petr
|
2022-05-16 06:08:03
|
Honestly, are the national bodies really interested in finding bugs? 🙂
|
|
|
yurume
|
|
_wb_
Yurume is single-handedly finding more spec bugs than all the national bodies of ISO combined 😅
|
|
2022-05-16 06:12:37
|
I'm kinda surprised too, didn't other ISO members proofread the document and give feedback? maybe they didn't take a look at libjxl though...
|
|
|
_wb_
|
2022-05-16 06:37:42
|
They read it but they didn't try to execute it, that kind of makes a big difference
|
|
|
|
JendaLinda
|
|
_wb_
VarDCT can only operate on 3 channels
|
|
2022-05-16 08:19:16
|
Just for clarification, VarDCT always uses 3 channels for color images, but how does it handles grayscale images? Does it use a single channel?
|
|
|
|
veluca
|
2022-05-16 09:02:19
|
it uses 3 channels, two of which are 0
|
|
2022-05-16 09:07:12
|
(it took some pain to ensure they are *actually* 0)
|
|
|
_wb_
|
2022-05-16 09:07:43
|
A channel that is all-zeroes compresses to some small constant size regardless of dimensions, in both vardct and in modular
|
|
2022-05-16 09:08:16
|
So we didn't really have to bother with not encoding zero chroma, it would make very little difference
|
|
2022-05-16 09:09:33
|
(it's different in all other codecs I know, which have some nonzero fixed cost per sample so having a special case for luma-only does make sense for them)
|
|
2022-05-16 09:10:35
|
In modular mode, grayscale is encoded with just a single channel, but that's because modular is flexible in the channelcount anyway
|
|
|
|
veluca
|
2022-05-16 09:10:39
|
tbh it would make sense for us too, for memory usage 😉
|
|
|
_wb_
|
2022-05-16 09:11:23
|
Well we could special-case a chroma histogram that is a singleton and not actually allocate it in that case
|
|
|
|
JendaLinda
|
2022-05-16 09:42:35
|
Okay. I suppose the same thing happens when transcoding grayscale JPEG to VarDCT, such JPEG has only the luma channel.
In modular, I guess, indexed color could be used for grayscale as well. After all, 8 bit grayscale uses small amount of "colors".
|
|
|
_wb_
|
2022-05-16 09:49:32
|
yes, probably if you pass it a grayscale image as RGB, the palette transform will make a palette and encode it like that
|
|
2022-05-16 09:50:13
|
and otherwise YCoCg will turn it into one grayscale channel and two all-zero channels
|
|
|
|
JendaLinda
|
2022-05-16 10:26:52
|
The application can pretty easily detect a grayscale image by comparing the RGB values. After all, some image formats don't support grayscale directly and store it as indexed color.
|
|
|
Traneptora
|
2022-05-16 02:18:24
|
is it true that all-gray RGB images are always R=X, G=X, B=X though?
|
|
2022-05-16 02:18:50
|
Usually I see grayscale treated as a YUV luma plane, with U and V both zeros
|
|
2022-05-16 02:19:06
|
but depending on the 3x3 matrix, that won't necessarily translate to Cb and Cr, will it?
|
|
|
_wb_
|
2022-05-16 03:23:47
|
shouldn't every variant of yuv make u=v=0 iff r=g=b?
|
|
|
Traneptora
|
2022-05-16 03:37:37
|
I don't know
|
|
2022-05-16 03:37:44
|
that was my question
|
|
|
_wb_
|
2022-05-16 04:00:03
|
I _think_ so, but no idea if there is some exotic yuv variant that doesn't do that. It would be weird though - then yuv400 would not be grayscale in that yuv variant...
|
|
|
Traneptora
|
2022-05-16 04:06:50
|
but for all the sane ones it is that way, I see
|
|
|
|
veluca
|
2022-05-16 04:23:54
|
what's gray anyway
|
|
|
BlueSwordM
|
|
veluca
what's gray anyway
|
|
2022-05-16 04:30:52
|
Gray is the luma channel, contrast and brightness only.
|
|
|
|
veluca
|
2022-05-16 04:32:23
|
I mean what is gray *light*
|
|
2022-05-16 04:32:40
|
or... whatever
|
|
|
|
JendaLinda
|
2022-05-16 04:45:42
|
Gray is an intensity of light. It doesn't need to be visible light, it could be infrared as well. Or it doesn't have to be a picture at all. A grayscale image can represent any arbitrary rectangular array of values.
|
|
|
_wb_
|
2022-05-16 04:55:54
|
a grayscale image can be used for just any single-channel data (could be an alpha channel or whatever), but I'd say what makes a visual image 'grayscale' is that all the pixels have the same chromaticity as the WhitePoint
|
|
2022-05-16 04:57:26
|
one way to think about grayscale is that the color gamut is extremely restricted to a triangle in xy that is just a dot, i.e. the R, G and B primaries all coincide and are equal to the WhitePoint itself
|
|
2022-05-16 05:00:51
|
I guess you _could_ have a grayscale image with a WhitePoint that is actually not white at all, e.g. not even close to D65 but somewhere randomly in the xy plane. I suppose if you convert such a grayscale image to sRGB, you should get pixel values that are very much not R=G=B, but colored (but of course monochromatic)
|
|
2022-05-16 05:01:58
|
I wonder what happens if you try something like that in libjxl. Do we do whitepoint conversion?
|
|
|
|
veluca
|
2022-05-16 05:41:32
|
I believe so yes
|
|
|
_wb_
|
2022-05-16 06:20:35
|
Could be fun to make a nominally grayscale image that looks purple, could be a good test of cms being implemented properly
|
|
|
|
veluca
|
2022-05-16 06:42:28
|
I'm not sure, but I think with advanced LUT trickery you could also make a 1c image that looks 2 different colors or something
|
|
2022-05-16 06:42:41
|
(basically use the CMS to do indexed color, but continuous)
|
|
|
|
JendaLinda
|
2022-05-16 06:56:49
|
I think it would be a neat trick to add sepia tone to a plain black and white picture.
|
|
|
_wb_
|
2022-05-16 07:31:36
|
Yes with full icc you can probably do very funky stuff, I was wondering what can be done within the enum though
|
|
2022-05-16 07:32:10
|
Might be good to add some stuff to jxl_from_tree to set custom enum spaces
|
|
2022-05-16 07:32:35
|
Also for HDR jxl art and stuff :)
|
|
2022-05-16 08:12:18
|
hm, just changing the whitepoint on a grayscale image doesn't seem to make it render any differently
|
|
2022-05-16 08:12:29
|
|
|
2022-05-16 08:12:41
|
looks like just a normal grayscale gradient to me
|
|
2022-05-16 08:13:11
|
```
$ jxlinfo 0.jxl
JPEG XL image, 1024x1024, (possibly) lossless, 10-bit Grayscale
Color space: Grayscale, Custom, white_point(x=0.100000,y=0.700000), sRGB transfer function, rendering intent: Perceptual
```
|
|
2022-05-16 08:14:58
|
<@604964375924834314> maybe I'm misunderstanding how WhitePoint is supposed to work or what it is supposed to mean. I never quite completely understood what WhitePoint is supposed to mean exactly...
|
|
|
spider-mario
|
2022-05-16 08:18:11
|
AFAIU, in grayscale, it would only matter when using the “absolute colorimetric” rendering intent
|
|
2022-05-16 08:18:44
|
perceptual and relative do chromatic adaptation from the source to the target colorspaces
|
|
2022-05-16 08:19:02
|
but in a grayscale image, everything has the chromaticity of the white point, so…
|
|
|
_wb_
|
2022-05-16 08:19:05
|
|
|
2022-05-16 08:19:29
|
I tried kAbsolute too, did the same thing
|
|
2022-05-16 08:20:37
|
Maybe when I view the decoded png my viewer just didn't bother
|
|
|
spider-mario
|
2022-05-16 08:21:11
|
that’s quite possible
|
|
2022-05-16 08:21:20
|
iirc, Skia doesn’t do absolute, for example
|
|
|
_wb_
|
2022-05-16 08:22:43
|
Question is if I set it to kAbsolute with that absurdly purple whitepoint, what should it look like and how will it look in the various jxl viewers?
|
|
|
spider-mario
|
2022-05-16 08:23:07
|
isn’t it green?
|
|
|
_wb_
|
2022-05-16 08:23:22
|
Or whatever it is, not white in any case
|
|
2022-05-16 08:23:49
|
Yeah very green
|
|
2022-05-16 08:24:29
|
In theory with kPerceptual you should be able to make sepia toned images and stuff like that?
|
|
2022-05-16 08:24:44
|
I mean kAbsolute
|
|
|
spider-mario
|
2022-05-16 08:24:50
|
yes, I think so
|
|
|
_wb_
|
2022-05-16 08:25:31
|
We should test if that works and if not, how to make it work, because that's a nifty trick
|
|
|
spider-mario
|
2022-05-16 08:25:48
|
although, if you were to convert to a colorspace with a warmer white point, the image may actually look more blue
|
|
2022-05-16 08:26:02
|
absolute colorimetric is considered kind of a niche use case
|
|
2022-05-16 08:26:10
|
most useful when simulating printing or this sort of thing
|
|
2022-05-16 08:26:19
|
(“soft proofing”)
|
|
|
_wb_
|
2022-05-16 08:27:28
|
So kPerceptual and kRelative map source wp to target wp, right?
|
|
2022-05-16 08:27:35
|
But kAbsolute doesn't
|
|
2022-05-16 08:27:47
|
What about kSaturation?
|
|
|
spider-mario
|
2022-05-16 08:30:05
|
iirc, it also does
|
|
2022-05-16 08:30:20
|
and to preserve saturation, it cares less about preserving hue
|
|
2022-05-16 08:31:20
|
it’s also a bit niche; from what I understand, the use-case that is usually mentioned is computer graphics that must keep their “pop” at the expense of accuracy
|
|
2022-05-16 08:31:58
|
https://www.cambridgeincolour.com/tutorials/color-space-conversion.htm is a nice article from what I recall
|
|
|
_wb_
|
2022-05-16 08:37:11
|
So if I understand correctly, kRelative and kAbsolute are the only ones that try to be 'correct', i.e. they clip out of gamut colors to the nearest in gamut color, and the difference is whether they adjust for WP too or not
|
|
2022-05-16 08:38:06
|
kPerceptual just scales/stretches one gamut to another, so e.g. when going from wide gamut to small gamut it will desaturate
|
|
2022-05-16 08:38:57
|
And kSaturation also stretches to fit in the target gamut, but prioritizes saturation preservation instead of hue preservation
|
|
|
spider-mario
|
2022-05-16 08:49:54
|
your and my understandings coincide
|
|
|
_wb_
|
|
|
veluca
|
2022-05-16 09:29:19
|
I like how you didn't say that it's correct xD
|
|
|
_wb_
|
2022-05-16 09:45:07
|
Who knows what is correct in all this mess where correctness is ultimately decided by what the popular tools happen to implement 😅
|
|
2022-05-17 03:04:00
|
so if I make this jxl file:
$ jxlinfo 0.jxl
JPEG XL image, 1024x1024, (possibly) lossless, 10-bit Grayscale
Color space: Grayscale, Custom, white_point(x=0.700000,y=0.100000), sRGB transfer function, rendering intent: Absolute
|
|
2022-05-17 03:04:14
|
and djxl it to a png
|
|
2022-05-17 03:05:16
|
then the png has some synthesized icc profile:
icc©️ Copyright 2019 Google LLC, CC-BY-SA 3.0 Unported license(https://creativecommons.org/licenses/by-sa/3.0/legalcode)
icc:description: Gra_0.7;0.1_Abs_SRG
|
|
2022-05-17 03:06:31
|
but no viewer shows it in any different way than with a normal white point
|
|
2022-05-17 03:07:01
|
if I cjxl the png to make an XYB VarDCT jxl file, it also looks like just regular grayscale
|
|
2022-05-17 03:11:53
|
same if I make a color image with a funky whitepoint and kAbsolute rendering intent — the whitepoint seems to be just ignored
|
|
2022-05-17 03:18:38
|
ah no, interesting
|
|
2022-05-17 03:19:26
|
when I make an RGB image (that only contains R=G=B) with a funky white point and kAbsolute rendering intent, something is happening in some viewers
|
|
2022-05-17 03:20:03
|
left is how photoshop shows the grayscale and rgb images, on the right is how Preview shows the rgb image
|
|
2022-05-17 03:21:24
|
firefox and chrome just show the rgb image as grayscale though
|
|
2022-05-17 03:22:15
|
no idea what photoshop is doing but it looks weird to me
|
|
|
Traneptora
|
2022-05-17 04:40:37
|
`mpv` might display it properly once my patch to libavcodec gets merged
|
|
2022-05-17 04:40:39
|
we'll have to see
|
|
2022-05-17 04:40:59
|
currently I'm awaiting a different patch
|
|
|
_wb_
so if I make this jxl file:
$ jxlinfo 0.jxl
JPEG XL image, 1024x1024, (possibly) lossless, 10-bit Grayscale
Color space: Grayscale, Custom, white_point(x=0.700000,y=0.100000), sRGB transfer function, rendering intent: Absolute
|
|
2022-05-17 04:43:14
|
can you upload this file? and is it XYB encoded?
|
|
2022-05-17 04:43:19
|
or is it modular grayscale
|
|
2022-05-17 04:43:45
|
it would be a good test sample anyway
|
|
|
_wb_
|
2022-05-17 07:26:00
|
I tried just modular grayscale and modular rgb
|
|
2022-05-17 07:26:44
|
Looks like nothing I could find does white point adjustment for grayscale png images
|
|
2022-05-17 07:28:03
|
And only Preview does white point adjustment for rgb png images (maybe some others too, didn't try a lot)
|
|
2022-05-17 07:29:18
|
Photoshop does something but it looks buggy
|
|
2022-05-17 07:32:31
|
My guess is that most things just ignore whitepoint and only convert the primaries, or perhaps they don't ignore it but just always map it to the target whitepoint (which is ok for all rendering intents that are in common use)
|
|
|
190n
|
2022-05-18 01:42:00
|
<@&807636211489177661> ban
|
|
2022-05-18 01:42:17
|
looks like this person's from matrix bridge
|
|
|
_wb_
|
2022-05-18 05:03:25
|
Yeah, cannot ban without killing the matrix bridge
|
|
|
190n
|
2022-05-18 05:09:47
|
you can delete the message tho right?
|
|
2022-05-18 05:10:00
|
idk who has authority to ban that user on the matrix side (if they haven't already been banned)
|
|
|
_wb_
|
2022-05-18 05:11:59
|
Yeah I can delete the message
|
|
|
190n
|
|
_wb_
|
|
Traneptora
can you upload this file? and is it XYB encoded?
|
|
2022-05-18 07:00:32
|
here's the grayscale image with funky whitepoint
|
|
2022-05-18 07:01:03
|
and here's the rgb one (that is also grayscale in the sense that r=g=b) with the same funky whitepoint
|
|
2022-05-18 07:25:47
|
you can't make this stuff up
|
|
2022-05-18 07:26:55
|
this is how Finder and Preview show these two images (when converted to png)
|
|
2022-05-18 07:27:57
|
Finder is interesting: at first it shows the image as if it doesn't have a funky whitepoint, but when I resize the Finder window that somehow triggers it to do something else
|
|
|
Traneptora
|
2022-05-19 06:51:28
|
so this is the crash I'm getting on windows
|
|
2022-05-19 06:51:31
|
```gdb
(gdb) where
#0 0x00007ff7c21a9d3c in operator() (__closure=0x9c2c7fc8e0, task=<optimized out>) at /opt/ct-ng/lib/gcc/x86_64-w64-mingw32/11.2.0/include/avxintrin.h:881
#1 0x00007ff7c23a7a58 in jpegxl::ThreadParallelRunner::RunRange (self=self@entry=0x29183ce9bc0, command=command@entry=2160, thread=thread@entry=3) at ../lib/threads/thread_parallel_runner_internal.cc:139
#2 0x00007ff7c23a7b3a in jpegxl::ThreadParallelRunner::ThreadFunc (self=0x29183ce9bc0, thread=3) at ../lib/threads/thread_parallel_runner_internal.cc:169
#3 0x00007ff7c304b151 in execute_native_thread_routine ()
#4 0x00007ff7c11b44d3 in pthread_create_wrapper (args=0x29183d1e1c0) at src/thread.c:1533
#5 0x00007ffa6a0cdfb4 in msvcrt!_beginthreadex () from /c/WINDOWS/System32/msvcrt.dll
#6 0x00007ffa6a0ce08c in msvcrt!_endthreadex () from /c/WINDOWS/System32/msvcrt.dll
#7 0x00007ffa6ad954e0 in KERNEL32!BaseThreadInitThunk () from /c/WINDOWS/System32/KERNEL32.DLL
#8 0x00007ffa6bd8485b in ntdll!RtlUserThreadStart () from /c/WINDOWS/SYSTEM32/ntdll.dll
#9 0x0000000000000000 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb) frame 0
#0 0x00007ff7c21a9d3c in operator() (__closure=0x9c2c7fc8e0, task=<optimized out>) at /opt/ct-ng/lib/gcc/x86_64-w64-mingw32/11.2.0/include/avxintrin.h:881
881 in /opt/ct-ng/lib/gcc/x86_64-w64-mingw32/11.2.0/include/avxintrin.h
(gdb) frame 1
#1 0x00007ff7c23a7a58 in jpegxl::ThreadParallelRunner::RunRange (self=self@entry=0x29183ce9bc0, command=command@entry=2160, thread=thread@entry=3) at ../lib/threads/thread_parallel_runner_internal.cc:139
139 self->data_func_(self->jpegxl_opaque_, task, thread);
(gdb) frame 2
#2 0x00007ff7c23a7b3a in jpegxl::ThreadParallelRunner::ThreadFunc (self=0x29183ce9bc0, thread=3) at ../lib/threads/thread_parallel_runner_internal.cc:169
169 RunRange(self, command, thread);
(gdb) frame 3
#3 0x00007ff7c304b151 in execute_native_thread_routine ()
(gdb)
```
|
|
2022-05-19 06:52:41
|
what's weird is my cpu supports AVX so I don't know what the issue is
|
|
|
_wb_
|
2022-05-19 06:55:23
|
So one hypothesis is it's a compiler bug that somehow only gets triggered in windows builds
|
|
|
Traneptora
|
2022-05-19 06:55:44
|
mingw-w64 btw
|
|
|
_wb_
|
2022-05-19 06:56:48
|
Another hypothesis is that we're using an avx2 instruction that we think does not require alignment, but it perhaps actually does require alignment, and for some reason memory is aligned when building/running in linux but not in windows
|
|
2022-05-19 06:57:52
|
Another hypothesis is that it's a bug on our end that just happens to never trigger on any CI build but does trigger in that windows build
|
|
2022-05-19 06:58:38
|
Another hypothesis is that there's something wrong with the build flags or something when building for windows
|
|
|
Traneptora
|
|
_wb_
Another hypothesis is that we're using an avx2 instruction that we think does not require alignment, but it perhaps actually does require alignment, and for some reason memory is aligned when building/running in linux but not in windows
|
|
2022-05-19 06:58:57
|
this one can easily be tested though, can it? by somehow requiring aligned memory
|
|
2022-05-19 06:59:25
|
and seeing if that fixes the problem
|
|
2022-05-19 06:59:40
|
I'm not sure how to force alligned allocs though, tbh
|
|
|
_wb_
|
2022-05-19 07:00:47
|
We have a macro for that, don't recall what it is but probably JXL_ALIGNED or HWY_ALIGNED or something
|
|
2022-05-19 07:01:05
|
The image buffer allocs we do are already aligned
|
|
2022-05-19 07:01:33
|
But things like local arrays or something may not be
|
|
|
Traneptora
|
2022-05-19 07:01:52
|
this is a client using libjxl, if I can create a `malloc` that forces alignment, I can pass it to `libjxl_memory_manager` and then test it that way too, right?
|
|
|
_wb_
|
2022-05-19 07:02:07
|
Do you also get a crash at -e 5 but not at -e 5 --gaborish=0?
|
|
|
Traneptora
|
2022-05-19 07:02:26
|
this is a libjxl client so I can't set gaborish to 0
|
|
2022-05-19 07:02:32
|
not cjxl
|
|
|
_wb_
|
2022-05-19 07:02:43
|
There's a libjxl encode option for it too I think
|
|
2022-05-19 07:02:50
|
Should be in any case
|
|
|
Traneptora
|
2022-05-19 07:05:08
|
okay, so I checked, and apparently I set the libjxl memory manager to use `av_malloc` as its malloc function, which apparently aligns *all* blocks of memory suitable for vectorization, according to the documentation
|
|
|
_wb_
|
2022-05-19 07:05:40
|
The explicit allocs are not the issue I think, it's more things like the code saying `const float foo[32] = bla` and in one build this will put it somewhere aligned and in another it may be unaligned.
|
|
2022-05-19 07:06:23
|
But doesn't hurt to check if changing the malloc to see if that helps
|
|
|
Traneptora
|
|
_wb_
We have a macro for that, don't recall what it is but probably JXL_ALIGNED or HWY_ALIGNED or something
|
|
2022-05-19 07:07:35
|
does this force proper stack alignment as well?
|
|
2022-05-19 07:07:41
|
in addition to heap memory allocations?
|
|
|
_wb_
|
2022-05-19 07:07:51
|
Iirc we already do stuff to force alignment for image buffers where we have plenty of stuff that requires alignment. I think we just allocate more than needed and round the pointer up so it is aligned.
|
|
|
Traneptora
|
2022-05-19 07:08:33
|
since the issue appears to be stack allocated memory
|
|
|
_wb_
|
2022-05-19 07:20:46
|
https://github.com/libjxl/libjxl/blob/main/lib/jxl/convolve.cc#L363 is that the spot that could be the issue, <@179701849576833024> ?
|
|
2022-05-19 07:22:10
|
What would be the way to write that without using LoadDup but with LoadU?
|
|
|
|
veluca
|
2022-05-19 07:22:20
|
It's not trivial
|
|
2022-05-19 07:23:04
|
Easier to require alignment on the weights fields and struct/variable:)
|
|
2022-05-19 07:23:27
|
Ah actually not
|
|
2022-05-19 07:23:40
|
Ugh this is a bit of a pain
|
|
2022-05-19 07:23:54
|
We *can* change the impl of loaddup though
|
|
|
_wb_
|
2022-05-19 07:25:02
|
I can't test but I think we need to urgently fix this, too many crashes reported in windows, I just saw Krita is also getting reports about it
|
|
|
|
veluca
|
2022-05-19 07:25:27
|
I'll give it a look later
|
|
2022-05-19 07:29:45
|
<@853026420792360980> can you see which of the 3 impl is selected here? https://github.com/google/highway/blob/master/hwy/ops/x86_256-inl.h#L2226
|
|
2022-05-19 07:30:51
|
> Backtrace stopped: previous frame inner to this frame (corrupt stack?)
I have to say that's not very reassuring
|
|
|
Traneptora
|
|
veluca
<@853026420792360980> can you see which of the 3 impl is selected here? https://github.com/google/highway/blob/master/hwy/ops/x86_256-inl.h#L2226
|
|
2022-05-19 07:32:12
|
how would I do that?
|
|
|
|
veluca
|
|
Traneptora
|
2022-05-19 07:32:34
|
should I just `#error` all 3 with different error messages?
|
|
|
|
veluca
|
2022-05-19 07:32:37
|
one easy way is to add a `#error "impl 1"`
|
|
2022-05-19 07:32:38
|
yeeeep
|
|
2022-05-19 07:32:44
|
I mean that's what I would do
|
|
|
_wb_
|
2022-05-19 07:36:54
|
What's the purpose/reason for having 3 implementations in the first place?
|
|
|
|
veluca
|
2022-05-19 07:39:47
|
don't ask me 🙂
|
|
|
_wb_
|
|
Traneptora
mingw-w64 btw
|
|
2022-05-19 07:44:22
|
Is that clang or gcc? Usually gcc, right?
|
|
|
Traneptora
|
|
veluca
one easy way is to add a `#error "impl 1"`
|
|
2022-05-19 07:45:00
|
third one, the #else
|
|
|
|
veluca
|
2022-05-19 07:45:00
|
at least *some* build Sami got had actually msvc apparently
|
|
2022-05-19 07:45:07
|
or at least something looking very much like it
|
|
|
Traneptora
|
2022-05-19 07:45:22
|
```c
#if HWY_LOADDUP_ASM
#error "impl 1"
__m256 out;
asm("vbroadcastf128 %1, %[reg]" : [ reg ] "=x"(out) : "m"(p[0]));
return Vec256<float>{out};
#elif HWY_COMPILER_MSVC && !HWY_COMPILER_CLANG
#error "impl 2"
const __m128 v128 = LoadU(Full128<float>(), p).raw;
return Vec256<float>{
_mm256_insertf128_ps(_mm256_castps128_ps256(v128), v128, 1)};
#else
#error "impl 3"
return Vec256<float>{_mm256_broadcast_ps(reinterpret_cast<const __m128*>(p))};
#endif
```
|
|
2022-05-19 07:45:25
|
impl 3 was returned
|
|
|
|
veluca
|
2022-05-19 07:45:32
|
ok, then can you do 2 things...
(a) print the pointer 😄
(b) switch to impl 2
|
|
|
Traneptora
|
2022-05-19 07:45:50
|
print *which* pointer
|
|
|
|
veluca
|
|
Traneptora
|
|
|
veluca
|
2022-05-19 07:46:13
|
`fprintf(stderr, "%p\n", p);` should do it
|
|
|
_wb_
|
2022-05-19 07:46:36
|
%p, I didn't know that was a thing
|
|
|
|
veluca
|
2022-05-19 07:46:50
|
will do some spam possibly but only the last one before the crash matters
|
|
|
_wb_
|
2022-05-19 07:47:11
|
I always do %X or something
|
|
2022-05-19 07:48:08
|
Maybe first worth checking if switching to impl 2 actually fixes it
|
|
2022-05-19 07:48:29
|
How do we know LoadDup is even the culprit?
|
|
2022-05-19 07:49:56
|
Is it because the remaining stuff in the enc gaborish convolution is unlikely to be broken because if it were, a lot more stuff would be broken too?
|
|
|
|
veluca
|
2022-05-19 07:50:19
|
well, there were some stack traces failing on vbroadcastf128
|
|
2022-05-19 07:50:31
|
which is pretty much what LoadDup *is*
|
|
|
_wb_
|
2022-05-19 07:50:39
|
Ah I see
|
|
2022-05-19 07:50:48
|
Well, impl 1 and 3 are that
|
|
2022-05-19 07:51:42
|
LoadDup is not in the hot loop, right?
|
|
2022-05-19 07:51:52
|
Do we use LoadDup in any hot loop?
|
|
|
|
veluca
|
2022-05-19 07:52:11
|
it loads some xyb constants and other similar things
|
|
|
_wb_
|
2022-05-19 07:52:38
|
If not, can we just not bother with speed of that initialization? There is no way it will make any difference
|
|
2022-05-19 07:56:45
|
So if the crashing pointer happens to be unaligned, what we are dealing with then is an avx2 documentation bug, right? Since officially the instruction does not require alignment but in reality it seems like it does.
|
|
|
|
veluca
|
2022-05-19 07:58:55
|
I think so yes
|
|
2022-05-19 07:59:29
|
but it might be *very* unaligned, like not aligned to float width, which is documented as a crashing condition (and probably a compiler bug)
|
|
|
_wb_
|
2022-05-19 08:03:46
|
https://c.tenor.com/GFSbcD9PgycAAAAM/drumroll-drum.gif
|
|
|
Traneptora
|
|
veluca
ok, then can you do 2 things...
(a) print the pointer 😄
(b) switch to impl 2
|
|
2022-05-19 08:03:49
|
how would I switch to impl 2 anyway? just remove the #else, replace the #elif foo with #else?
|
|
|
_wb_
|
2022-05-19 08:04:10
|
Yes, but first print the pointer pls
|
|
|
Traneptora
|
2022-05-19 08:04:21
|
currently rebuilding
|
|
|
_wb_
|
2022-05-19 08:04:35
|
I'm curious what its last digits are...
|
|
|
Traneptora
|
2022-05-19 08:04:40
|
I also had to #include cstdio but otherwise yes it worked
|
|
2022-05-19 08:04:57
|
i.e. it compiled
|
|
|
_wb_
|
2022-05-19 08:05:30
|
Could of course be that adding the print messes things up and makes it work, lol
|
|
|
Traneptora
|
2022-05-19 08:05:37
|
no, I mean it compiled
|
|
2022-05-19 08:05:39
|
I haven't run it yet
|
|
|
|
veluca
|
2022-05-19 08:05:44
|
wouldn't be the first time, and won't be the last
|
|
|
Traneptora
|
2022-05-19 08:06:03
|
this is all cross-compiling inside a docker image so it's all very slow
|
|
|
_wb_
|
2022-05-19 08:06:55
|
Yeah and changing that file probably causes a near full recompile
|
|
|
Traneptora
|
2022-05-19 08:07:28
|
and then FFmpeg has to recompile too
|
|
2022-05-19 08:07:45
|
and then to actually test it I gotta reboot to my windows install
|
|
2022-05-19 08:07:48
|
yuck
|
|
2022-05-19 08:08:02
|
I can test it in wine but I'd prefer a native install for true diagnostic info
|
|
|
|
veluca
|
2022-05-19 08:12:47
|
not the fastest build-test cycle ever lol
|
|
|
Traneptora
|
2022-05-19 08:13:53
|
testing it in wine, it's spitting out *lots* of pointers
|
|
2022-05-19 08:14:12
|
god this is slow
|
|
2022-05-19 08:14:18
|
time to reboot and test natively
|
|
|
_wb_
|
2022-05-19 08:17:30
|
Can you do | tail or something?
|
|
|
Traneptora
|
2022-05-19 08:17:47
|
```
00007ff670510980
00007ff670510920
00007ff670510970
Thread 31 received signal SIGSEGV, Segmentation fault.
[Switching to Thread 13140.0x3384]
0x00007ff66ebaa7dc in operator() (__closure=0xca469fcd90, task=<optimized out>) at /opt/ct-ng/lib/gcc/x86_64-w64-mingw32/11.2.0/include/avxintrin.h:881
881 /opt/ct-ng/lib/gcc/x86_64-w64-mingw32/11.2.0/include/avxintrin.h: No such file or directory.
(gdb) where
#0 0x00007ff66ebaa7dc in operator() (__closure=0xca469fcd90, task=<optimized out>) at /opt/ct-ng/lib/gcc/x86_64-w64-mingw32/11.2.0/include/avxintrin.h:881
#1 0x00007ff66edab8b8 in jpegxl::ThreadParallelRunner::RunRange (self=self@entry=0x2484eacc680, command=command@entry=2160, thread=thread@entry=1) at ../lib/threads/thread_parallel_runner_internal.cc:139
#2 0x00007ff66edab99a in jpegxl::ThreadParallelRunner::ThreadFunc (self=0x2484eacc680, thread=1) at ../lib/threads/thread_parallel_runner_internal.cc:169
#3 0x00007ff66fa53b51 in execute_native_thread_routine ()
#4 0x00007ff66dbb44d3 in pthread_create_wrapper (args=0x2484eaccd40) at src/thread.c:1533
#5 0x00007ffe7b23dfb4 in msvcrt!_beginthreadex () from /c/WINDOWS/System32/msvcrt.dll
#6 0x00007ffe7b23e08c in msvcrt!_endthreadex () from /c/WINDOWS/System32/msvcrt.dll
#7 0x00007ffe7ba954e0 in KERNEL32!BaseThreadInitThunk () from /c/WINDOWS/System32/KERNEL32.DLL
#8 0x00007ffe7d06485b in ntdll!RtlUserThreadStart () from /c/WINDOWS/SYSTEM32/ntdll.dll
#9 0x0000000000000000 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb)
```
|
|
2022-05-19 08:17:53
|
ran it inside gdb
|
|
2022-05-19 08:18:13
|
lemme try with threads=1
|
|
|
_wb_
|
2022-05-19 08:18:52
|
So
|
|
2022-05-19 08:18:56
|
70
|
|
|
Traneptora
|
2022-05-19 08:18:58
|
It's not printing anything with threads=1
|
|
|
_wb_
|
2022-05-19 08:19:20
|
Means it is aligned to 16 bytes but not to 32 bytes, right?
|
|
|
Traneptora
|
2022-05-19 08:19:21
|
well, threads=0
|
|
2022-05-19 08:19:45
|
now it prints nothing
|
|
2022-05-19 08:20:28
|
disabling threading makes it easier to hit
|
|
|
_wb_
|
2022-05-19 08:20:49
|
The previous pointers are aligned to 32 but that one only to 16
|
|
|
Traneptora
|
2022-05-19 08:20:55
|
```
(gdb) where
#0 0x00007ff66ebaa7dc in operator() (__closure=0x7b0f7fcda0, task=<optimized out>) at /opt/ct-ng/lib/gcc/x86_64-w64-mingw32/11.2.0/include/avxintrin.h:881
#1 0x00007ff66edab6de in jpegxl::ThreadParallelRunner::Runner (runner_opaque=0x203175792c0, jpegxl_opaque=0x7b0f7fcd90, init=<optimized out>,
func=0x7ff66ebaa910 <jxl::ThreadPool::RunCallState<jxl::Status(long long unsigned int), jxl::N_AVX2::SRGBToXYB(const Image3F&, float const*, jxl::ThreadPool*, jxl::Image3F*)::<lambda(uint32_t, size_t)> >::CallDataFunc(void *, uint32_t, size_t)>, start_range=0,
end_range=2160) at ../lib/threads/thread_parallel_runner_internal.cc:72
#2 0x00007ff66eba87f3 in jxl::ThreadPool::Run<jxl::Status(long long unsigned int), jxl::N_AVX2::SRGBToXYB(const Image3F&, float const*, jxl::ThreadPool*, jxl::Image3F*)::<lambda(uint32_t, size_t)> > (begin=0, init_func=
@0x7ff66f7cfe30: {jxl::Status (unsigned long long)} 0x7ff66f7cfe30 <jxl::ThreadPool::NoInit(unsigned long long)>, caller=<synthetic pointer>, data_func=..., end=<optimized out>, this=0x203175bbd00) at ../lib/jxl/base/data_parallel.h:50
#3 jxl::RunOnPool<jxl::Status(long long unsigned int), jxl::N_AVX2::SRGBToXYB(const Image3F&, float const*, jxl::ThreadPool*, jxl::Image3F*)::<lambda(uint32_t, size_t)> > (begin=0,
init_func=@0x7ff66f7cfe30: {jxl::Status (unsigned long long)} 0x7ff66f7cfe30 <jxl::ThreadPool::NoInit(unsigned long long)>, caller=<synthetic pointer>, data_func=..., end=<optimized out>, pool=0x203175bbd00) at ../lib/jxl/base/data_parallel.h:108
#4 jxl::N_AVX2::SRGBToXYB (srgb=..., premul_absorb=<optimized out>, premul_absorb@entry=0x7b0f7fce60, pool=pool@entry=0x203175bbd00, xyb=<optimized out>, xyb@entry=0x7b0f7fd700) at ../lib/jxl/enc_xyb.cc:195
```
|
|
|
|
veluca
|
2022-05-19 08:21:26
|
apparently that's indeed the case
|
|
2022-05-19 08:21:39
|
there's also a way to get the local asm
|
|
2022-05-19 08:21:41
|
one sec
|
|
|
Traneptora
|
2022-05-19 08:22:06
|
if I turn off threading it prints nothing
|
|
2022-05-19 08:22:20
|
ever
|
|
|
|
veluca
|
2022-05-19 08:22:23
|
disassemble
|
|
2022-05-19 08:22:31
|
ok that is *weird*
|
|
|
veluca
disassemble
|
|
2022-05-19 08:22:56
|
I mean that's a gdb command
|
|
|
Traneptora
|
2022-05-19 08:22:57
|
actually wait
|
|
2022-05-19 08:23:04
|
oh, the gdb command prints nothing
|
|
2022-05-19 08:23:11
|
it's printing stuff fine if I disable threading
|
|
|
|
veluca
|
2022-05-19 08:23:23
|
could be gdb + threads is broken
|
|
|
Traneptora
|
2022-05-19 08:23:23
|
god, it's a heisenbug
|
|
|
|
veluca
|
2022-05-19 08:23:38
|
well if it's really due to alignment then yep indeed
|
|
|
Traneptora
|
2022-05-19 08:24:04
|
it appears it doesn't need 32-bit alignment though
|
|
2022-05-19 08:24:09
|
```
00007ff6705108f0
00007ff670510940
00007ff6705108e0
00007ff670510980
00007ff670510920
00007ff670510970
00007ff670510910
00007ff670510960
00007ff670510900
00007ff670510950```
|
|
2022-05-19 08:24:11
|
this got printed
|
|
|
|
veluca
|
2022-05-19 08:25:14
|
mhhhhh
|
|
|
_wb_
|
2022-05-19 08:25:33
|
Does it ever crash just after a 32 aligned pointer?
|
|
|
|
veluca
|
2022-05-19 08:26:02
|
maybe we hit a stack size limit? that'd be rather odd though
|
|
|
_wb_
|
2022-05-19 08:26:29
|
Could it be that alignment is needed but it sometimes works when not aligned?
|
|
|
Traneptora
|
2022-05-19 08:26:35
|
https://0x0.st/oaa9.log
|
|
2022-05-19 08:27:18
|
does this mean anything to you?
|
|
|
|
veluca
|
2022-05-19 08:28:09
|
... looks like it's not on failing on a loaddup
|
|
2022-05-19 08:28:37
|
but on a Load()
|
|
2022-05-19 08:28:58
|
well, a store in this case
|
|
|
_wb_
|
2022-05-19 08:29:33
|
So in the convolve loop?
|
|
|
|
veluca
|
2022-05-19 08:30:14
|
no it's in SRGBToXYB now
|
|
2022-05-19 08:30:27
|
what.
|
|
|
_wb_
|
2022-05-19 08:30:42
|
That is weird
|
|
|
Traneptora
|
2022-05-19 08:30:53
|
still AVX2 though
|
|
2022-05-19 08:30:55
|
are there any helpful options I can pass to highway to help debug this?
|
|
|
|
veluca
|
2022-05-19 08:30:57
|
that looks *suspiciously* like argument passing under some calling convention
|
|
2022-05-19 08:31:01
|
give me a sec
|
|
|
_wb_
|
2022-05-19 08:31:03
|
That is also used at e < 5
|
|
|
Traneptora
|
2022-05-19 08:31:34
|
currently it's using e7
|
|
2022-05-19 08:31:46
|
e3 produces an identical crash
|
|
2022-05-19 08:32:05
|
In the same SRGBToXYZ function
|
|
2022-05-19 08:32:12
|
which is not surprising, if that's where it is
|
|
|
|
veluca
|
2022-05-19 08:32:21
|
I think this might be *another* problem
|
|
|
Traneptora
|
2022-05-19 08:33:06
|
you mentioned that `HWY_ALIGN` can force alignment, but how I can enable that option? running libjxl's cmake with `-DHWY_ALIGN=ON` just yields a warning that it is unused
|
|
|
|
veluca
|
2022-05-19 08:33:34
|
no that would be a C++ macro
|
|
2022-05-19 08:33:53
|
can you print %rsp in gdb?
|
|
|
_wb_
|
2022-05-19 08:34:00
|
Yeah no that's something we would need to use in the code on whatever variable needs to be aligned
|
|
2022-05-19 08:35:01
|
If this issue is not in gaborish it is something completely different
|
|
|
Traneptora
|
|
veluca
can you print %rsp in gdb?
|
|
2022-05-19 08:36:27
|
how?
|
|
|
|
veluca
|
2022-05-19 08:37:31
|
print %rsp
|
|
2022-05-19 08:37:34
|
I think
|
|
|
Traneptora
|
2022-05-19 08:37:57
|
```
(gdb) print %rsp
A syntax error in expression, near `%rsp'.
```
|
|
|
|
veluca
|
2022-05-19 08:38:13
|
whatever
|
|
2022-05-19 08:38:15
|
`info registers`
|
|
2022-05-19 08:38:30
|
let's go as verbose as we can xD
|
|
|
Traneptora
|
2022-05-19 08:38:40
|
```
(gdb) info registers
rax 0x8a529fc710 594091689744
rbx 0x24e8b310d80 2536365952384
rcx 0x8a529fc750 594091689808
rdx 0x8a529fc730 594091689776
rsi 0x8a529fc730 594091689776
rdi 0x8a529fc750 594091689808
rbp 0x24e972efd00 0x24e972efd00
rsp 0x8a529fc650 0x8a529fc650
r8 0x0 0
r9 0x7ff66ebaa910 140696396409104
r10 0x8a529fc8b0 594091690160
r11 0x24e8d301c80 2536399445120
r12 0x24e952efc80 2536533589120
r13 0x0 0
r14 0x8a529fc8c0 594091690176
r15 0x24e8d301c80 2536399445120
rip 0x7ff66ebaa7dc 0x7ff66ebaa7dc <operator()(uint32_t, size_t) const+188>
eflags 0x10246 [ PF ZF IF RF ]
cs 0x33 51
ss 0x2b 43
ds 0x2b 43
es 0x2b 43
fs 0x53 83
gs 0x2b 43
(gdb)
```
|
|
|
|
veluca
|
2022-05-19 08:38:57
|
well that's an interesting stack pointer
|
|
|
Traneptora
|
2022-05-19 08:39:14
|
I had to run it again btw, so the virtual address space might be a bit different
|
|
2022-05-19 08:39:22
|
should I repost the disassemble log?
|
|
|
|
veluca
|
2022-05-19 08:39:40
|
nah
|
|
2022-05-19 08:39:43
|
rsp is not aligned
|
|
2022-05-19 08:39:46
|
that's not great
|
|
|
_wb_
|
2022-05-19 08:39:55
|
Can the stack pointer be messed up causing gdb to think it is in a different function?
|
|
|
Traneptora
|
2022-05-19 08:40:02
|
looks 16-bit aligned, yea
|
|
2022-05-19 08:40:39
|
okay, interesting. sometimes, it crashes immediately, and sometimes it starts spitting out the printf %ps
|
|
2022-05-19 08:40:50
|
I'm wondering if it's random based on if the pointer happens to be aligned or not
|
|
|
|
veluca
|
2022-05-19 08:40:57
|
almost certain
|
|
|
_wb_
|
2022-05-19 08:41:00
|
Then I go back to the assumption that unaligned LoadDup is just causing random trouble
|
|
|
|
veluca
|
2022-05-19 08:41:13
|
I'm going to bet it depends on whether rsp is 16-bit aligned or 32-bit aligned
|
|
|
Traneptora
|
2022-05-19 08:41:30
|
is there a way to *force* it to be 32-bit aligned, perhaps via compile flag?
|
|
|
|
veluca
|
2022-05-19 08:41:37
|
trying to figure it out
|
|
|
Traneptora
|
2022-05-19 08:41:40
|
I could try using `-mstack-align` or `-mdata-align` if that is necessary
|
|
2022-05-19 08:41:44
|
I don't know if those work on x86 though
|
|
|
|
veluca
|
2022-05-19 08:42:10
|
what compiler are you using for this build?
|
|
|
Traneptora
|
2022-05-19 08:43:32
|
mingw-w64
|
|
2022-05-19 08:43:39
|
which would be gcc
|
|
|
_wb_
|
2022-05-19 08:43:41
|
I guess it would be useful to try if switching to impl 2 makes it work, to verify that that is indeed the issue
|
|
|
|
veluca
|
2022-05-19 08:43:49
|
no, this is unrelated
|
|
2022-05-19 08:44:18
|
this is a compiler-generated register spill / parameter passing
|
|
2022-05-19 08:44:57
|
one shouldn't do an unaligned store to 0xe0(%rsp) if rsp is 16-bit aligned 🙂
|
|