|
yurume
|
2022-09-18 10:42:26
|
JPEG XL spec bugs, issues and questions.
|
|
2022-09-18 10:44:06
|
I'd like to post any recently discovered spec bug to here, since it is not really about J40
|
|
2022-09-18 10:44:56
|
negative special distance during LZ77:
```c
int32_t special = (int32_t) SPECIAL_DISTANCES[distance];
J40__ASSERT(dist_mult <= J40__MAX_DIST_MULT);
// TODO spec bug: distance can be as low as -6 when dist_mult = 1 and distance =
// dist_mult * 1 - 7; libjxl clamps it to the minimum of 1, so we do the same here
distance = j40__max32(1, ((special >> 4) - 7) + dist_mult * (special & 7));
```
|
|
2022-09-18 10:45:54
|
a related edge case where distance is zero at the very first symbol:
```c
distance = j40__min32(j40__min32(distance, code->num_decoded), 1 << 20);
code->copy_pos = code->num_decoded - distance;
if (J40_UNLIKELY(distance == 0)) {
// TODO spec bug: this is possible when num_decoded == 0 (or a non-positive special
// distance, handled above) and libjxl acts as if `window[i]` is initially filled with 0
J40__ASSERT(code->num_decoded == 0 && !code->window);
code->window = j40__calloc(1 << 20u, sizeof(int32_t));
if (!code->window) return J40__ERR("!mem"), 0;
}
```
|
|
2022-09-19 12:37:09
|
I pondered these two
|
|
2022-09-19 12:38:12
|
`width * height` here refers to "the largest dimensions (horizontally and vertically, respectively) of the image and any of its frames, in pixels", but splines and patches are per-frame, so this means that checking for those limits can only be done when every frame is read (!)
|
|
2022-09-19 12:40:21
|
for example, in a 100x100 image the first frame can have 100x100 dimensions and 2501 splines. This can be valid if the second frame has, say, 200x200 dimensions which later get cropped.
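so a conforming checker basically has to defer the check until the end, something like this (only a sketch of the consequence; `max_splines_for` is a placeholder for whatever formula the profile actually uses):
```cpp
#include <algorithm>
#include <cstdint>

// Track the largest dimensions seen over the image and all of its frames, plus the
// worst per-frame spline count; the limit can only be evaluated once everything is read.
struct SplineLimitCheck {
    uint32_t max_w = 0, max_h = 0, worst_count = 0;
    void see(uint32_t w, uint32_t h, uint32_t num_splines) {  // call for the image and each frame
        max_w = std::max(max_w, w);
        max_h = std::max(max_h, h);
        worst_count = std::max(worst_count, num_splines);
    }
    bool ok(uint32_t (*max_splines_for)(uint32_t, uint32_t)) const {
        return worst_count <= max_splines_for(max_w, max_h);  // only decidable at the very end
    }
};
```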
|
|
2022-09-19 12:41:32
|
similarly, ICC `output_size` is distinct from `enc_size`, and it is really inconvenient (if not impossible) to estimate the minimum possible `output_size` from `enc_size` AFAIK
|
|
2022-09-19 12:42:24
|
libjxl at least assumes that each command emits at least one byte, I'm yet to validate or invalidate this assumption
|
|
2022-09-19 03:14:04
|
I've analyzed the specification and yes, this assumption is invalid
|
|
2022-09-19 03:15:11
|
specifically, main command bytes 1 through 4 read the length of output bytes `num` from the command stream, which is allowed to be zero, so it is possible to artificially inflate `enc_size` without altering `output_size`
|
|
2022-09-19 04:27:05
|
fortunately libjxl doesn't seem to emit any zero-sized command, so I think we can simply forbid zero-sized commands, and note that every command generates at least one output byte, which should be enough of a hint for implementers
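the guard I have in mind is roughly this (just a sketch; `decode_one_command` is a hypothetical stand-in for the per-command ICC decoding and returns the new output position):
```cpp
#include <cstdint>

// Require every command to advance the output position, so the number of commands is
// bounded by output_size rather than by an arbitrarily inflated enc_size.
bool decode_icc_commands(uint64_t output_size,
                         uint64_t (*decode_one_command)(uint64_t out_pos)) {
    uint64_t out_pos = 0;
    while (out_pos < output_size) {
        uint64_t new_pos = decode_one_command(out_pos);
        if (new_pos <= out_pos) return false;  // zero-sized command: reject as invalid
        out_pos = new_pos;
    }
    return out_pos == output_size;
}
```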
|
|
2022-09-20 02:49:13
|
this possibility of ICC codec DoS has been reported to the issue tracker maintained by <@794205442175402004>.
|
|
|
_wb_
|
2022-09-20 05:15:23
|
For the splines and patches limits, we probably need to make it refer to the frame they're in
|
|
2022-09-20 05:16:37
|
Libjxl also has a limit on the total pixel area of the splines
|
|
2022-09-20 05:17:34
|
we may want to add such a limit to the profile too
|
|
|
yurume
|
|
_wb_
For the splines and patches limits, we probably need to make it refer to the frame they're in
|
|
2022-09-20 05:20:58
|
yeah, that makes intuitive sense. I can see some cases where a smaller frame wants to complement prior frames and thus can have more splines than the individual frame needs, but accounting for the image dimensions seems enough for that.
|
|
|
Traneptora
|
2022-09-25 04:16:02
|
1. when calculating `num_groups`, one must compute `ceil(width / group_dim)`, using `width` from the frame header. However, if `lf_level != 0`, then one must set `width = ceil(width / (1 << (3 * lf_level)))` *before* making this `num_groups` calculation. (This is the same for height.) This is in the spec, but since it's mentioned after `num_groups` is defined, it's very confusing.
|
|
2022-09-25 04:16:50
|
2. The same is also true if `upsampling != 1`. If `upsampling > 1` then you must set `width = ceil(width / upsampling)` before calculating `num_groups`. Do note that upsampling is present provided that `!(flags & kUseLfFrame)`, but the spec doesn't actually say that this flag *must* be enabled if `frame_type == kLFFrame`. If so, it should say so. Since `ceil` is applied at each step, the order matters (the operations don't commute).
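in code, the order that actually works for me is roughly this (my reading, not spec text; `ceil_div` is just rounding-up division):
```cpp
#include <cstdint>

static uint32_t ceil_div(uint32_t a, uint32_t b) { return (a + b - 1) / b; }

// Divide by upsampling first, then by the lf_level scale, and only then count groups.
// The order matters because each step rounds up.
uint32_t compute_num_groups(uint32_t width, uint32_t height,
                            uint32_t upsampling, uint32_t lf_level, uint32_t group_dim) {
    width  = ceil_div(width,  upsampling);
    height = ceil_div(height, upsampling);
    if (lf_level != 0) {
        width  = ceil_div(width,  1u << (3 * lf_level));
        height = ceil_div(height, 1u << (3 * lf_level));
    }
    return ceil_div(width, group_dim) * ceil_div(height, group_dim);
}
```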
|
|
2022-09-25 04:18:10
|
3. The condition for `save_before_ct` is incorrect. The spec says that it's signalled provided that `!all_default && frame_type != kLFFrame`, with a default value `d_sbct` defined later in the spec. The actual condition for `save_before_ct` being signaled is
```
!all_default && (frame_type == kFrameReferenceOnly || full_frame && (frame_type == kFrameRegular || frame_type == kFrameSkipProgressive) && (frame_duration == 0 || save_as_ref != 0) && !is_last && blend_mode == kReplace)
```
based on libjxl code.
|
|
2022-09-25 04:18:48
|
4. The spec should be clear that if `num_toc_entries > 1`, then all the TOC entries listed there are present but the size may be zero for some of them, e.g. in the case of `encoding == kModular`. This is technically stated but it's a bit confusing IMO.
|
|
2022-09-25 04:20:15
|
5. The spec says this (F.3.3)
> Before decoding the TOC, the decoder invokes `ZeroPadToByte()` (B.2.7).
It's not entirely clear what "before decoding the TOC" means here. The actual behavior is that `permuted_toc = Bool()` is read *immediately* after the frame header, and then if it is nonzero, the TOC permutation is read *immediately* after `permuted_toc`, *then* `ZeroPadToByte()` is invoked, then the TOC lengths are each read, and *then* `ZeroPadToByte()` is invoked again after they have all been read. This should be more explicit.
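i.e. roughly this order (the callbacks are just stand-ins for the spec's `Bool()`, permutation read, TOC-entry read and `ZeroPadToByte()`; my reading, not spec text):
```cpp
#include <cstdint>
#include <functional>
#include <vector>

std::vector<uint64_t> read_toc(uint32_t num_toc_entries,
                               const std::function<bool()>& read_bool,
                               const std::function<void(uint32_t)>& read_permutation,
                               const std::function<uint64_t()>& read_entry_size,
                               const std::function<void()>& zero_pad_to_byte) {
    bool permuted_toc = read_bool();                      // immediately after the frame header
    if (permuted_toc) read_permutation(num_toc_entries);  // immediately after permuted_toc
    zero_pad_to_byte();                                   // the "before decoding the TOC" pad
    std::vector<uint64_t> toc(num_toc_entries);
    for (uint32_t i = 0; i < num_toc_entries; i++) toc[i] = read_entry_size();
    zero_pad_to_byte();                                   // and again after all entries are read
    return toc;
}
```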
|
|
2022-09-25 04:20:39
|
6. `[0, num_clusters - 1)` appears twice in section C.2.1. It should be `[0, num_clusters)`
|
|
2022-09-25 04:21:36
|
7. The Alias Mapping section, C.2.2, provides this line of code:
```cpp
if (cutoffs[i] > bucket_size) overfull.push_back(i); else underfull.push_back(i);
```
this is incorrect. The correct line here is:
```cpp
if (cutoffs[i] > bucket_size) overfull.push_back(i);
else if (cutoffs[i] < bucket_size) underfull.push_back(i);
```
|
|
2022-09-25 04:22:19
|
8. Alias Mapping section C.2.2 is missing an extra loop:
```cpp
for (i = alphabet_size; i < table_size; i++) {
underfull.push_back(i);
}
while(overfull.size() > 0) { // right before this line
```
This same loop should also zero out `cutoffs[i]` if we're not assuming it's initially zero.
|
|
2022-09-25 04:23:15
|
9. Alias Mapping is missing a comment that says underfull never runs out:
```cpp
while (overfull.size() > 0) {
/* underfull.size() > 0 */
o = overfull.back(); u = underfull.back(); overfull.pop_back(); underfull.pop_back();
```
While this is impossible if the code is bug-free, I ran into this issue when I had bugs earlier in the spec, so it's not a bad idea to include it as a sanity thing.
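putting 7-9 together, the corrected initialization would look roughly like this (my reconstruction, using the same names as the C.2.2 pseudocode; not spec text):
```cpp
for (i = 0; i < alphabet_size; i++) {
    if (cutoffs[i] > bucket_size) overfull.push_back(i);
    else if (cutoffs[i] < bucket_size) underfull.push_back(i);
}
for (i = alphabet_size; i < table_size; i++) {
    cutoffs[i] = 0;               // in case it isn't assumed to start at zero
    underfull.push_back(i);
}
while (overfull.size() > 0) {     // underfull never runs out here for a valid distribution
    o = overfull.back(); u = underfull.back();
    overfull.pop_back(); underfull.pop_back();
    // ... rest as in the spec ...
}
```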
|
|
2022-09-25 04:24:21
|
10. Section C.2.2. defines AliasMapping for a specific distribution as `AliasMapping(x)`, a function of one variable, and yet later on in Section C.2.3, you call this:
```cpp
(symbol, offset) = AliasMapping(D, index);
state = D[symbol] * (state >> 12) + offset;
```
It's probably a bit clearer if you use `D.AliasMapping(index)`, i.e.:
```cpp
(symbol, offset) = D.AliasMapping(index);
state = D[symbol] * (state >> 12) + offset;
```
|
|
2022-09-25 04:25:12
|
11. In section C.2.6, you define `lsb = token & ((1 << config.lsb_in_token) - 1);`. I'd probably rename this variable to `low` since I think `lsb` gives the wrong implication of what this is supposed to do.
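for context, my reading of the full C.2.6 decode, reconstructed from memory (so double-check it against the spec text; `read_bits(n)` stands in for `u(n)`):
```cpp
#include <cstdint>
#include <functional>

struct HybridUintConfig { uint32_t split_exponent, msb_in_token, lsb_in_token; };

uint64_t hybrid_uint(uint32_t token, const HybridUintConfig& cfg,
                     const std::function<uint64_t(uint32_t)>& read_bits) {
    uint32_t split = 1u << cfg.split_exponent;
    if (token < split) return token;      // small tokens are the value itself
    uint32_t n = cfg.split_exponent - cfg.msb_in_token - cfg.lsb_in_token +
                 ((token - split) >> (cfg.msb_in_token + cfg.lsb_in_token));
    uint32_t low = token & ((1u << cfg.lsb_in_token) - 1);  // the "lsb"/"low" bits in question
    token >>= cfg.lsb_in_token;
    uint32_t top = (1u << cfg.msb_in_token) | (token & ((1u << cfg.msb_in_token) - 1));
    return (((uint64_t(top) << n) | read_bits(n)) << cfg.lsb_in_token) | low;
}
```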
|
|
2022-09-25 04:25:18
|
12. Section 2.4 doesn't define `alphabet_size` for simple distributions, only for flat and for complex distributions. Since `alphabet_size` can be less than `table_size` this is important.
|
|
2022-09-25 04:25:49
|
These are all things I found before, but I'm reposting them from <#794206170445119489> here for clarity.
|
|
2022-10-24 03:57:05
|
Just want to double check, Extension lengths are all read before any extension payloads are read, in the extensions bundle, right?
|
|
|
yurume
|
2022-10-24 04:03:46
|
to my knowledge yes.
|
|
|
Traneptora
|
2022-10-24 03:52:55
|
I'm reading through the new spec, and here's the issues I've found so far
|
|
2022-10-24 03:59:52
|
(1) `alphabet_size` is not defined for simple distributions in section C.2.5. It's read for flat and for complex distributions. If it's supposed to be equal to `table_size` in the case of a simple distribution, then it should say so.
(2) There's still a bug in the pseudocode in section C.2.6:
```
if (cutoffs[i] > bucket_size) overfull.push_back(i); else underfull.push_back(i);
```
This should read:
```
if (cutoffs[i] > bucket_size) overfull.push_back(i);
else if (cutoffs[i] < bucket_size) underfull.push_back(i);
```
`underfull` shouldn't be filled if `cutoffs[i]` exactly hits the bucket size.
(3) Section C.2.1 includes two cases of `[0, num_clusters - 1)` but it should say either `[0, num_clusters)` or `[0, num_clusters - 1]`.
|
|
2022-10-24 04:01:56
|
(4) Section D.2, SizeHeader description, contains a typo. It says `!div` where it should say `!div8`
|
|
|
_wb_
|
2022-10-24 04:02:47
|
can someone double-check these proposed fixes so we don't accidentally introduce new bugs?
|
|
|
Traneptora
|
2022-10-24 04:03:03
|
Sure, I encourage a double-check
|
|
2022-10-24 04:03:17
|
although some of these are just typos
|
|
2022-10-24 04:05:05
|
Also, there's still some c-isms in the code such as `!ratio` instead of `ratio == 0`
|
|
2022-10-24 04:05:18
|
since you changed `&&` to `and` I'm not sure this is desired
|
|
2022-10-24 04:05:33
|
worth considering it
|
|
2022-10-24 04:07:44
|
Section E.2 has a formatting issue:
|
|
2022-10-24 04:07:52
|
|
|
2022-10-24 04:07:57
|
look at the overlap on the last row
|
|
2022-10-24 04:08:09
|
I suggest moving the `type` column leftward as there is enough empty space
|
|
2022-10-24 04:13:00
|
Section E.4.3 is missing some code tags. There's a `min(128, output_size)`, a `(p + e) & 255`, and an `i == j` that are in textrm.
|
|
|
_wb_
|
2022-10-24 04:13:05
|
c-isms are fine, the only reason I changed `&&` to `and` is that it caused trouble when using pandoc to convert our tex sources to docx (which is the format ISO insists on, frustratingly), since pandoc misinterpreted it as two column separators
|
|
|
Traneptora
|
|
_wb_
|
2022-10-24 04:14:51
|
formatting issues are caused for the same reason, I had to drop the manual column width adjustments because they caused tables not to get parsed correctly anymore... i'll have to figure out a way to make it work in both latex and pandoc
|
|
|
Traneptora
|
2022-10-24 04:15:14
|
I see. These aren't really major issues, I'm just pointing out what I see when I see it
|
|
|
_wb_
|
2022-10-24 04:15:27
|
yeah don't stop doing that, it is useful
|
|
|
Traneptora
|
2022-10-24 04:16:55
|
The spec doesn't appear to specify that `flags & kUseLfFrame` *must* be set if `frame_type == kLFFrame`
|
|
2022-10-24 04:17:21
|
Is this a requirement?
|
|
|
_wb_
|
2022-10-24 04:18:25
|
no it isn't: that flag means the _current_ frame gets its DC from some previous LfFrame
|
|
|
Traneptora
|
2022-10-24 04:18:29
|
I see
|
|
|
_wb_
|
2022-10-24 04:21:27
|
so typically if there is no recursive DC but just progressive DC encoded using modular+squeeze, you'll have first a modular frame with frame_type == kLfFrame (which doesn't even use vardct so kUseLfFrame is meaningless), and then a vardct frame with frame_type == kRegularFrame and flags & kUseLfFrame set (which means there is no DC data in the frame itself, but the previous kLfFrame gets used as DC instead)
|
|
2022-10-24 04:22:03
|
it's somewhat confusing and probably a NOTE explaining this would be good
|
|
|
Traneptora
|
2022-10-24 04:22:15
|
If `upsampling > 1` then you must divide `width = ceil(width / upsampling)` before calculating `width = ceil(width / (1 << 3 * lf_level))`, which in turn must be done before calculating `num_groups`. Both these can be present according to the spec. These operations do not commute because of the intervening `ceil` but the spec doesn't say that you have to do it in that order. (I figured it out from inspecting libjxl code).
|
|
2022-10-24 04:24:01
|
Upsampling is signaled provided that `!(flags & kUseLfFrame)` which will probably be the case for `frame_type == kLfFrame`
|
|
2022-10-24 04:24:39
|
`lf_level` is signaled as `U32(1, 2, 3, 4)` which seems like a roundabout way of saying `1 + u(2)`
|
|
2022-10-24 04:26:14
|
Same with `BlendingInfo.source` which is signaled as `U32(0, 1, 2, 3)` in the spec. why not just `u(2)`?
|
|
2022-10-24 04:28:33
|
Section F.3.3 specifies:
> Before decoding the TOC, the decoder invokes ZeroPadToByte() (B.2.7).
I'd recommend, for clarity, writing "Before decoding the TOC (but after decoding the TOC permutation, if present), the decoder invokes `ZeroPadToByte()`."
|
|
2022-10-24 04:30:19
|
Section G.2.3 has a formatting issue with `ceil(height/8)` being in textrm instead of in code
|
|
2022-10-24 04:30:23
|
same with the width
|
|
2022-10-24 04:30:39
|
This also occurs in section G.2.4 several times.
|
|
2022-10-24 04:31:11
|
`log2` should be code in section G.3.3, along with two `< 3`s for hshift and vshift.
|
|
2022-10-24 04:31:55
|
at the end of the second-to-last paragraph of section H.1, there's a `-1` that's done in textmode, not mathmode, so the `-` looks like a hyphen, not a negative one.
|
|
2022-10-24 04:35:22
|
I haven't tried to skim past that, but I'll point out anything I see when I continue to write jxlatte
|
|
|
_wb_
|
|
Traneptora
(1) `alphabet_size` is not defined for simple distributions in section C.2.5. It's read for flat and for complex distributions. If it's supposed to be equal to `table_size` in the case of a simple distribution, then it should say so.
(2) There's still a bug in the pseudocode in section C.2.6:
```
if (cutoffs[i] > bucket_size) overfull.push_back(i); else underfull.push_back(i);
```
This should read:
```
if (cutoffs[i] > bucket_size) overfull.push_back(i);
else if (cutoffs[i] < bucket_size) underfull.push_back(i);
```
`underfull` shouldn't be filled if `cutoffs[i]` exactly hits the bucket size.
(3) Section C.2.1 includes two cases of `[0, num_clusters - 1)` but it should say either `[0, num_clusters)` or `[0, num_clusters - 1]`.
|
|
2022-10-24 04:39:17
|
why does `alphabet_size` need to be defined in that case? that variable is not referenced then, or is it?
|
|
|
Traneptora
|
|
_wb_
why does `alphabet_size` need to be defined in that case? that variable is not referenced then, or is it?
|
|
2022-10-24 08:08:58
|
it's used when you generate the alias map
|
|
2022-10-24 08:12:03
|
oh, right. I see, because you define `D[i]` outside of alphabetSize to be zero.
|
|
2022-10-24 08:12:08
|
Okay, I see, that makes sense now.
|
|
2022-10-25 07:04:52
|
Section H.2 mentions this right at the end:
|
|
2022-10-25 07:04:53
|
> The decoder then starts an ANS stream (C.1)
|
|
2022-10-25 07:05:07
|
does it actually *have* to be ANS? or can it be a prefix-code distribution?
|
|
2022-10-25 07:54:44
|
Section H.6.3 Squeeze transform appears to have an extraneous `-1` in the pseudo-code:
|
|
2022-10-25 07:54:48
|
> `for (i = 0; i < sp.size() - 1; i++) {`
|
|
2022-10-25 07:54:57
|
I believe this should say `i < sp.size();`
|
|
2022-10-25 07:55:50
|
otherwise the final element of `sp` doesn't do anything
|
|
|
_wb_
|
2022-10-25 08:08:41
|
Right.
|
|
|
Traneptora
does it actually *have* to be ANS? or can it be a prefix-code distribution?
|
|
2022-10-25 08:09:11
|
It can be either. Maybe we need a word for that...
|
|
|
Traneptora
|
|
_wb_
It can be either. Maybe we need a word for that...
|
|
2022-10-25 10:49:16
|
Entropy-coded stream?
|
|
2022-10-25 10:49:28
|
Section C.1 is linked anyway
|
|
|
_wb_
|
2022-10-26 09:23:59
|
<@795684063032901642>
|
|
|
Traneptora
|
2022-10-27 06:38:50
|
Section H.5.2, top of page 52:
`4 + (maxweight × ((1 << 24) Idiv ((err_sum >> shift) + 1))) >> shift;`
this is missing parentheses, should be:
`4 + ((maxweight × ((1 << 24) Idiv ((err_sum >> shift) + 1))) >> shift);`
|
|
2022-10-27 06:39:14
|
`+` is grabbier than `>>` in C.
|
|
2022-10-27 06:39:25
|
aka it has higher precedence
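quick sanity check of the precedence claim, with toy values:
```cpp
#include <cstdio>

int main() {
    int term = 16, shift = 2;
    printf("%d\n", 4 + term >> shift);    // parsed as (4 + 16) >> 2 == 5
    printf("%d\n", 4 + (term >> shift));  // intended reading: 4 + (16 >> 2) == 8
    return 0;
}
```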
|
|
2022-10-27 06:41:37
|
Same section, same page:
`subpred[3] = N3 - (((true_err_NW × wp_p3a + true_err_N × wp_p3b +
true_err_NE × wp_p3c + (NN3 - N3) × wp_p3d + (NW3 - W3) × wp_p3e) >> 5);`
|
|
2022-10-27 06:41:46
|
this has an extra opening `(` right after `N3 -`
|
|
|
_wb_
|
2022-10-27 08:10:18
|
we will probably switch from LaTeX to this: https://www.metanorma.org/
|
|
2022-10-27 08:12:40
|
which might introduce some new spec bugs but it will give us a better way to produce Word files the way ISO wants them while still having a text-based spec with version control so we don't have to deal with Word to do the editing
|
|
|
improver
|
2022-10-27 08:20:58
|
looks good & designed exactly for this purpose
|
|
|
Traneptora
|
2022-10-27 09:26:46
|
section H.5.1, in the definition of `err[i]`, the parentheses are misplaced
|
|
2022-10-27 09:27:03
|
It *says*
|
|
2022-10-27 09:27:05
|
`abs((subpred[i] - (true_value << 3)) + 3) >> 3`
|
|
2022-10-27 09:27:07
|
it should be
|
|
2022-10-27 09:27:31
|
`(abs(subpred[i] - (true_value << 3)) + 3) >> 3`
|
|
2022-10-27 09:27:46
|
that is, you are supposed to add 3 after taking the absolute value of the difference
|
|
2022-10-27 11:58:55
|
Section J.1, it states the condition for `gab_custom` to be signaled is `!all_default && gab`
|
|
2022-10-27 11:59:05
|
this is actually not true, it's just `gab`
|
|
2022-10-27 11:59:21
|
so `gab_custom` will be signaled when `all_default == true`
|
|
|
|
veluca
|
2022-10-27 12:04:20
|
🤦
|
|
|
_wb_
|
2022-10-28 09:43:37
|
if anyone feels like comparing this metanorma version of the spec (in html) to the latex version (in pdf) to see if there's anything that got lost in translation, feel free!
|
|
|
Traneptora
|
2022-10-28 11:07:29
|
I dislike Cambria as a font choice but otherwise I don't notice anything immediate
|
|
2022-10-28 11:10:04
|
Table H.2 could use better rendering
|
|
2022-10-28 11:10:18
|
as it's not really a table
|
|
|
_wb_
|
2022-10-28 11:14:47
|
Haha yes Cambria is a stupid font but what makes you think we have any choice in that?
|
|
2022-10-28 11:15:35
|
ISO forces us to use Cambria for the main text and Courier New for the source code snippets, even though both of those are ugly and there are nicer fonts
|
|
|
improver
|
2022-10-28 03:47:55
|
can doc tooling do different versions with different fonts?
|
|
2022-10-28 03:48:22
|
jxl spec the community edition (good fonts) vs iso edition (cambria)
|
|
|
_wb_
|
2022-10-28 05:06:39
|
haha sure
|
|
2022-10-28 07:03:46
|
I migrated part 2 to asciidoc too (until now we just used Word for that since the pain was doable, it's a smaller spec)
|
|
2022-10-28 07:04:13
|
and then I made the above part 1+2 community edition
|
|
|
Traneptora
so `gab_custom` will be signaled when `all_default == true`
|
|
2022-10-28 07:25:28
|
damn we missed an opportunity to save 1 header bit!
|
|
|
Traneptora
|
2022-10-28 07:50:48
|
It won't end up mattering without a permuted TOC as it gets eaten by ZeroPadToByte
|
|
|
_wb_
|
2022-10-28 09:53:16
|
current versions of part 1 and part 2 drafts for 2nd edition
|
|
|
Traneptora
|
2022-10-29 09:30:51
|
Section K.3.1:
|
|
2022-10-29 09:31:04
|
> `blending`: arrays of `count` blend mode information structures, which consists of arrays of mode, alpha_channel and clamp
|
|
2022-10-29 09:31:15
|
Later, when blending is populated, the loop looks like this:
|
|
2022-10-29 09:31:29
|
> `for (k = 0; k < num_extra_channels+1; k++) {`
|
|
2022-10-29 09:31:40
|
These numbers aren't necessarily the same.
|
|
2022-10-29 09:32:01
|
How big is `blending[]`? is it one per `count` or is it one per `num_extra_channels + 1`
|
|
2022-10-30 02:32:22
|
The spec doesn't say what order the passgroups are in
|
|
2022-10-30 02:32:28
|
do you let `pass = 0` and then do *all* the groups?
|
|
2022-10-30 02:32:34
|
and then let `pass = 1` and do all the groups?
|
|
2022-10-30 02:32:34
|
etc.
|
|
2022-10-30 04:05:00
|
ah it does say, nvm.
|
|
2022-10-31 07:41:25
|
The Tendency function in the section on squeeze is incorrect
|
|
2022-10-31 07:42:03
|
It says this:
```c
tendency(A, B, C) {
  X = (4 * A - 3 * C - B + 6) Idiv 12;
  if (A >= B and B >= C) {
    if (X - (X & 1) > 2 * (A - B)) X = 2 * (A - B) + 1;
    if (X + (X & 1) > 2 * (B - C)) X = 2 * (B - C);
    return X;
  } else if (A <= B and B <= C) {
    if (X + (X & 1) < 2 * (A - B)) X = 2 * (A - B) - 1;
    if (X - (X & 1) < 2 * (B - C)) X = 2 * (B - C);
    return X;
  } else return 0;
}
```
|
|
2022-10-31 07:42:53
|
This is in libjxl/squeeze.h:
|
|
2022-10-31 07:43:00
|
```c++
inline pixel_type_w SmoothTendency(pixel_type_w B, pixel_type_w a,
pixel_type_w n) {
pixel_type_w diff = 0;
if (B >= a && a >= n) {
diff = (4 * B - 3 * n - a + 6) / 12;
// 2C = a<<1 + diff - diff&1 <= 2B so diff - diff&1 <= 2B - 2a
// 2D = a<<1 - diff - diff&1 >= 2n so diff + diff&1 <= 2a - 2n
if (diff - (diff & 1) > 2 * (B - a)) diff = 2 * (B - a) + 1;
if (diff + (diff & 1) > 2 * (a - n)) diff = 2 * (a - n);
} else if (B <= a && a <= n) {
diff = (4 * B - 3 * n - a - 6) / 12;
// 2C = a<<1 + diff + diff&1 >= 2B so diff + diff&1 >= 2B - 2a
// 2D = a<<1 - diff + diff&1 <= 2n so diff - diff&1 >= 2a - 2n
if (diff + (diff & 1) < 2 * (B - a)) diff = 2 * (B - a) - 1;
if (diff - (diff & 1) < 2 * (a - n)) diff = 2 * (a - n);
}
return diff;
}
```
|
|
2022-10-31 07:43:09
|
Notice that `diff` is calculated differently based on the branches
|
|
|
_wb_
|
2022-10-31 09:22:38
|
🤦
|
|
|
Moritz Firsching
|
2022-11-01 01:48:04
|
good catch, made a pr to fix it in the spec (I think the code is correct...)
|
|
|
_wb_
|
2022-11-01 04:36:05
|
does it actually make a difference? I wonder why the spec didn't just copy the code
|
|
|
|
veluca
|
2022-11-01 05:32:17
|
yeah it changes rounding
|
|
|
Traneptora
|
2022-11-01 06:00:35
|
it makes it so diff is at most off-by-one here
|
|
2022-11-01 06:00:51
|
+6 vs -6 is always different by exactly 12, so when dividing by 12 it's different by exactly 1
|
|
2022-11-01 06:01:14
|
which affects the parity, i.e. `(diff & 1)`
|
|
2022-11-01 06:01:59
|
diff ends up 1 smaller than it was
|
|
2022-11-01 06:02:19
|
so if diff was even, then `diff + (diff & 1)` will end up being the same
|
|
2022-11-01 06:02:31
|
but if diff was odd, then `diff + (diff & 1)` is two lower than it was
|
|
2022-11-01 06:02:36
|
which potentially affects the thresholds
|
|
2022-11-01 06:03:52
|
it also slightly affects the output of the squeeze transform because `buffer[x][2 * y + 1] = first - diff`
|
|
2022-11-01 06:03:56
|
whose parity matters
|
|
2022-11-01 06:04:13
|
it's probably not exceptionally noticeable but it does affect it slightly
|
|
|
Moritz Firsching
good catch, made a pr to fix it in the spec (I think the code is correct...)
|
|
2022-11-01 06:04:29
|
whether or not it *was* correct, it is now, I suppose <:kek:857018203640561677>
|
|
2022-11-01 06:04:36
|
because libjxl in some ways defines the bitstream
|
|
|
|
veluca
|
2022-11-01 06:42:23
|
the C++ version makes sense in that it's [positive]+6 or [negative]-6, i.e. adds "away from 0"
|
|
2022-11-01 06:42:52
|
rn I can't figure out what that means for the division
|
|
|
Traneptora
|
2022-11-01 06:44:15
|
integer division rounds down, toward negative infinity
|
|
2022-11-01 06:45:21
|
er wait
|
|
2022-11-01 06:45:21
|
interesting
|
|
2022-11-01 06:45:35
|
it's toward 0 in C, but it's toward -infinity in python
|
|
2022-11-01 06:45:36
|
how about that
|
|
2022-11-01 06:46:01
|
```
$ cat test.c && gcc test.c && ./a.out
#include <stdio.h>
int main(int argc, char *argv[]) {
printf("%d\n", -13 / 12);
return 0;
}
-1
```
|
|
2022-11-01 06:46:46
|
It's right shift that rounds toward minus infinity
|
|
2022-11-01 06:46:53
|
```c
#include <stdio.h>
int main(int argc, char *argv[]) {
printf("%d, %d\n", -13 / 8, -13 >> 3);
return 0;
}
```
|
|
2022-11-01 06:47:05
|
this outputs `-1, -2`
|
|
2022-11-01 06:47:18
|
well strictly speaking, right shifting a signed negative integer is undefined behavior in C
|
|
2022-11-01 06:47:35
|
on x86 it sign-extends
|
|
2022-11-01 06:48:08
|
but the spec should probably say that when right shifting a negative-integer, it is assumed that the extra bits are populated by the sign bit
|
|
2022-11-01 06:48:25
|
it might say that but if it doesn't it should, as it's Undefined Behavior in C
|
|
2022-11-01 06:49:07
|
I ran into this issue with the weighted predictor because there's a number of right shifts
|
|
2022-11-01 06:49:21
|
which cannot be replaced with integer divisions
|
|
2022-11-01 06:49:57
|
luckily the JVM has two operators, `>>` and `>>>` for signed ints, which are fully specified: `>>` sign-extends and `>>>` zero-extends
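for C/C++ implementers, a portable way to get the sign-extending shift the spec assumes is something like this (just a sketch):
```cpp
#include <cstdint>

// Arithmetic right shift with guaranteed sign extension (i.e. floor division by 2^s),
// independent of what the compiler does with negative operands of >>.
static inline int32_t sar32(int32_t v, int s) {
    return v < 0 ? ~((~v) >> s) : v >> s;   // e.g. sar32(-13, 3) == -2
}
```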
|
|
|
Moritz Firsching
|
|
Traneptora
whether or not it *was* correct, it is now, I suppose <:kek:857018203640561677>
|
|
2022-11-01 07:29:49
|
that's the spirit...
|
|
|
Traneptora
|
2022-11-01 08:14:09
|
So at this point I'm confused. I have a modular stream that's being decoded identically to libjxl
|
|
2022-11-01 08:14:35
|
gaborish is disabled in the frame header, as is epf
|
|
2022-11-01 08:15:17
|
many of the values in this modular stream are out of range. the BitDepth header in the image header field tells me that `exp_bits = 0` and `bits = 8`
|
|
2022-11-01 08:16:39
|
There is exactly one frame in the image
|
|
2022-11-01 08:17:02
|
but simply clamping these values between 0 and 255 yields the wrong image
|
|
2022-11-01 08:17:25
|
what is libjxl doing that lets it take this modular stream and reconstruct it properly?
|
|
2022-11-01 08:17:42
|
`factor` in libjxl is `1f / 255f`
|
|
2022-11-01 08:18:02
|
And whatever it is doing, why is this not described in the spec?
|
|
2022-11-01 08:19:00
|
patches, splines, and noise are all disabled.
|
|
2022-11-01 08:19:32
|
XYB is false, as is YCbCr.
|
|
2022-11-01 08:20:54
|
oh *fuck* it *IS* xyb encoded. holy crap, ignore me, lmao.
|
|
2022-11-01 09:25:47
|
Okay, here's something. The default values of `m_x_lf_unscaled` are 4096, in table G.2, but I'm pretty sure it should be `1/4096`
|
|
2022-11-01 09:26:15
|
since later in Table L.2.2 it says `X = X' * m_x_lf_unscaled`
|
|
2022-11-01 09:26:28
|
and I believe X is supposed to be divided by 4096 here, not multiplied
|
|
2022-11-01 09:54:54
|
OH!
|
|
2022-11-01 09:55:12
|
^ not only is this true, but the spec fails to mention that they're supposed to be divided by 128.0f
|
|
2022-11-01 09:55:17
|
if not all default
|
|
2022-11-01 09:56:02
|
```c++
Status DequantMatrices::DecodeDC(BitReader* br) {
bool all_default = br->ReadBits(1);
if (!br->AllReadsWithinBounds()) return JXL_FAILURE("EOS during DecodeDC");
if (!all_default) {
for (size_t c = 0; c < 3; c++) {
JXL_RETURN_IF_ERROR(F16Coder::Read(br, &dc_quant_[c]));
dc_quant_[c] *= 1.0f / 128.0f;
// Negative values and nearly zero are invalid values.
if (dc_quant_[c] < kAlmostZero) {
return JXL_FAILURE("Invalid dc_quant: coefficient is too small.");
}
inv_dc_quant_[c] = 1.0f / dc_quant_[c];
}
}
return true;
}
```
|
|
2022-11-01 09:56:14
|
specifically the 1.0f / 128f
|
|
|
_wb_
|
2022-11-01 10:04:03
|
oh damn, so many places where these quantizers get rescaled or inverted, so easy to get lost in that
|
|
|
Traneptora
|
2022-11-01 10:04:39
|
Yea, they're encoded *in* the bitstream as 1f / foo
|
|
2022-11-01 10:04:46
|
The actual behavior is as follow:
|
|
2022-11-01 10:05:00
|
`m_lf_x_unscaled` defaults to `1 / 4096`
|
|
2022-11-01 10:05:10
|
instead, if it's in the bitstream, you just let `m_lf_x_unscaled = F16() / 128f`
|
|
2022-11-01 10:05:19
|
It's already in the bitstream in denominator form
|
|
2022-11-01 10:06:08
|
This means that if it's encoded in the bitstream as `1/256f` then it'll end up as `1/32768f` when multiplied by `X'`
|
|
2022-11-01 10:06:18
|
Does that make sense?
|
|
2022-11-01 10:06:41
|
don't divide the default values by 1/128, those are baked in.
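so a minimal sketch of the X-channel case as I understand it (`read_f16` and `all_default` stand in for the actual bitstream reads):
```cpp
#include <functional>

float decode_m_x_lf_unscaled(bool all_default, const std::function<float()>& read_f16) {
    float m_x_lf_unscaled = 1.0f / 4096.0f;     // default: the reciprocal, already baked in
    if (!all_default) {
        m_x_lf_unscaled = read_f16() / 128.0f;  // signalled values get the /128; defaults do not
    }
    return m_x_lf_unscaled;                     // later (Table L.2.2): X = X' * m_x_lf_unscaled
}
```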
|
|
|
yurume
|
2022-11-02 04:05:51
|
I thought I reported these issues about quantizers, weren't they incorporated into the current draft?
|
|
|
Traneptora
|
2022-11-02 04:43:08
|
must have been missed
|
|
|
yurume
|
2022-11-02 04:46:44
|
or I forgot to report them... let me check
|
|
2022-11-02 04:47:24
|
oh, my bad, that was indeed the case
|
|
2022-11-02 04:47:47
|
<@853026420792360980> maybe you want to keep eyes on "TODO spec" comments in J40
|
|
|
Traneptora
|
2022-11-02 04:48:01
|
I'm just reporting them as I go
|
|
2022-11-02 04:48:05
|
this channel didn't exist at the time you made those spec comments
|
|
2022-11-02 04:48:16
|
btw, you have a bug in j40
|
|
|
yurume
|
2022-11-02 04:48:35
|
I think I did mention them back when I was talking in <#794206087879852106> , but I must have forgotten to report them formally
|
|
|
Traneptora
|
2022-11-02 04:48:38
|
in predictor 8 you have `p.w - (p.ww + p.nw - p.ne)` or something
|
|
2022-11-02 04:49:05
|
which is not the same thing as `p.w + p.n - p.nee` or whatever it is, evaluated one pixel over
|
|
2022-11-02 04:49:50
|
most of the time it's the same, but if `x == 1`, then `p.ww` evaluates to `p.w`, but `p.w` one pixel to the left evaluates to `p.nw`
|
|
2022-11-02 04:50:24
|
so in j40 you need an `if x == 1` in predictor 8
|
|
|
yurume
|
2022-11-02 04:53:14
|
so is J40 diverging from libjxl in that case?
|
|
2022-11-02 04:53:58
|
I think I have taken extra care to handle corner cases (there are *three* sets of them, I recall) but I may have missed one
|
|
|
Traneptora
|
2022-11-02 05:28:03
|
in that specific case, yea
|
|
|
_wb_
|
2022-11-02 06:31:48
|
Edge cases are so annoying
|
|
|
yurume
|
2022-11-02 08:12:16
|
I'm gonna take some time to document all remaining spec bugs that are only in the J40 comments today
|
|
2022-11-02 08:12:52
|
cause sooner or later, someone (in this case thebombzen) will be hit by those bugs anyway
|
|
2022-11-02 08:13:27
|
and available bug reports are better than thorough bug reports
|
|
|
Traneptora
|
2022-11-02 08:28:46
|
I'm currently confused about how you're supposed to crop the LF groups if the image isn't a perfect power of 2
|
|
2022-11-02 08:29:20
|
in the case of Pass Groups since they (mostly) correspond 1-1 to pixels, you just crop the group by the amount it overflows
|
|
2022-11-02 08:30:53
|
I have a 3264x2448 image, which has four LF groups
|
|
2022-11-02 08:31:03
|
3264/8 is 408
|
|
2022-11-02 08:31:17
|
and 2448/8 is 306
|
|
2022-11-02 08:31:52
|
so I'd assume that LF Groups would be cropped if they overflowed, i.e. 408 - 256 = 152
|
|
2022-11-02 08:31:59
|
but somehow that isn't working
|
|
2022-11-02 08:32:28
|
and I'm not entirely sure why
|
|
|
yurume
|
2022-11-02 08:33:37
|
I'm not sure if this is documented or not, but as I understand it, you first round the dimensions up to the next multiple of 8; this means that there are some blocks where some edges are left unused.
|
|
|
Traneptora
|
2022-11-02 08:34:09
|
everything here is a multiple of 8 though, 408, 256, and 152
|
|
|
yurume
|
2022-11-02 08:34:27
|
for groups and LF groups, assuming you've rounded dimensions accordingly, they have no further restrictions; any integral number in width and height (in terms of blocks) should work
|
|
2022-11-02 08:34:49
|
ah wait
|
|
2022-11-02 08:34:59
|
~~sorry, my comment should only apply to *groups*, not LF groups~~ I rechecked my code and I think this applies both to groups and LF groups
|
|
|
Traneptora
|
2022-11-02 08:35:52
|
Yea, I figured out Pass Groups
|
|
2022-11-02 08:36:35
|
```
Index Width Height x0 y0
0, 128, 256, 0, 0
0, 64, 128, 0, 0
0, 64, 128, 0, 0
0, 256, 256, 0, 0
0, 128, 128, 0, 0
0, 128, 128, 0, 0
0, 128, 256, 0, 0
0, 128, 256, 0, 0
0, 256, 256, 0, 0
0, 256, 256, 0, 0
1, 128, 256, 128, 0
1, 64, 128, 64, 0
1, 64, 128, 64, 0
1, 152, 256, 256, 0
1, 128, 128, 128, 0
1, 128, 128, 128, 0
1, 128, 256, 128, 0
1, 128, 256, 128, 0
1, 152, 256, 256, 0
1, 152, 256, 256, 0
```
|
|
2022-11-02 08:36:53
|
I get through the channel list like so, and then I hit a TOC boundary
|
|
2022-11-02 08:37:19
|
I've discovered through debugging that overrunning a TOC boundary typically means that you are decoding the wrong thing
|
|
|
yurume
|
2022-11-02 08:37:30
|
so in your case (assuming group size of 256 and LF group size of 2048), you have four LF groups of dimensions 2048x2048, 1216x2048, 2048x400, 1216x400 respectively
|
|
2022-11-02 08:38:04
|
which are 256x256, 152x256, 256x50, 152x50 in terms of block counts
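in code, the splitting I mean is roughly this (a sketch assuming a 2048-pixel LF group dimension, not spec text):
```cpp
#include <algorithm>
#include <cstdint>

static int32_t ceil_div(int32_t a, int32_t b) { return (a + b - 1) / b; }

struct Rect { int32_t x0, y0, w, h; };

// LF group `idx` of a frame, cropped at the right/bottom edges; e.g. for 3264x2448 this
// gives 2048x2048, 1216x2048, 2048x400 and 1216x400 (256x256, 152x256, 256x50, 152x50 blocks).
Rect lf_group_rect(int32_t frame_w, int32_t frame_h, int32_t idx, int32_t lf_group_dim = 2048) {
    int32_t cols = ceil_div(frame_w, lf_group_dim);
    int32_t x0 = (idx % cols) * lf_group_dim;
    int32_t y0 = (idx / cols) * lf_group_dim;
    return {x0, y0, std::min(lf_group_dim, frame_w - x0), std::min(lf_group_dim, frame_h - y0)};
}
```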
|
|
|
Traneptora
|
2022-11-02 08:38:22
|
that's what it appears to be decoding
|
|
2022-11-02 08:38:34
|
but then it barfs. it's possible that there's a bug somewhere else though that's causing it to barf
|
|
|
yurume
|
2022-11-02 08:39:14
|
there would be 130 groups, where first 8 groups are the topmost groups in the first LF group, next 5 groups are for the topmost groups in the second LF group, next 8 groups are for the first LF group, next 5 are for the second LF group and so on
|
|
|
Traneptora
|
2022-11-02 08:39:28
|
yea, and the passgroups are decoding fine
|
|
2022-11-02 08:39:31
|
all 130 of them
|
|
2022-11-02 08:39:38
|
the issue is the LF groups
|
|
|
yurume
|
2022-11-02 08:39:50
|
can you pinpoint where you are stuck?
|
|
|
Traneptora
|
2022-11-02 08:40:04
|
I'm trying to figure out if I'm calculating the LF dimensions incorrectly
|
|
|
yurume
|
2022-11-02 08:40:23
|
cause I thought if you've done groups LF groups should follow naturally...
|
|
|
Traneptora
|
2022-11-02 08:40:23
|
I may just have a bug elsewhere for all I know though.
|
|
|
yurume
|
2022-11-02 08:41:23
|
I believe your code does recognize there are 4 LF groups, otherwise TOC wouldn't work, right?
|
|
2022-11-02 08:41:50
|
then I believe you are going to decode the LfGroup bundle
|
|
|
Traneptora
|
2022-11-02 08:42:06
|
absolutely
|
|
2022-11-02 08:42:34
|
OH
|
|
2022-11-02 08:42:42
|
stream index was wrong
|
|
|
yurume
|
2022-11-02 08:42:43
|
which consists of ModularLfGroup (for LQIP I believe), LF coefficients and HF metadata where the last two are for VarDCT; which one are you stuck at?
|
|
2022-11-02 08:42:46
|
OH
|
|
|
Traneptora
|
2022-11-02 08:42:46
|
causing the MA tree to collapse
|
|
2022-11-02 08:42:48
|
it was a bug elsewhere
|
|
|
yurume
|
2022-11-02 08:43:14
|
that's pretty understandable
|
|
|
Traneptora
|
2022-11-02 08:43:29
|
```
for ModularLfGroup: 1 + num_lf_groups + LF group index;
```
<@794205442175402004>
This is incorrect, it's supposed to be `1 + LF Group index`
|
|
2022-11-02 08:43:39
|
you don't add the number of LF groups to the ModularLfGroup stream index
|
|
2022-11-02 08:43:51
|
I'm not sure what the LF Coefficients and HF Metadata are supposed to be though
|
|
|
yurume
|
|
Traneptora
|
2022-11-02 08:44:32
|
but the stream index was incorrect
|
|
|
yurume
|
2022-11-02 08:44:34
|
I'm checking that part of spec, it used to be pretty much wrong
|
|
|
Traneptora
|
2022-11-02 08:44:36
|
as soon as I changed that it decoded fine
|
|
|
yurume
|
2022-11-02 08:44:39
|
and I think I've checked that
|
|
|
Traneptora
as soon as I changed that it decoded fine
|
|
2022-11-02 08:45:12
|
does your image have a ModularLfGroup?
|
|
|
Traneptora
|
2022-11-02 08:45:22
|
Yes, this is modular.
|
|
2022-11-02 08:45:28
|
I haven't started working on VarDCT yet.
|
|
|
yurume
|
2022-11-02 08:45:35
|
I mean, ModularLfGroup is normally empty if you don't have squeeze
|
|
2022-11-02 08:45:41
|
so I guess that's why I missed that part
|
|
|
Traneptora
|
2022-11-02 08:45:43
|
this is lossy modular, as a test.
|
|
2022-11-02 08:45:49
|
I have squeeze
|
|
|
yurume
|
2022-11-02 08:46:16
|
yeah, ModularLfGroup only appears when you have hshift or vshift of at least 3
|
|
2022-11-02 08:46:20
|
which doesn't happen if you don't have squeeze
|
|
2022-11-02 08:47:20
|
so the question is: normally we are supposed to have different (fragments of) MA tree for LF coefficients, ModularLfGroup and HF metadata
|
|
2022-11-02 08:47:37
|
because of course these three may have different optimal MA trees
|
|
2022-11-02 08:48:07
|
but as it stands, there is no way in an MA tree to distinguish LF coefficients from ModularLfGroup
|
|
2022-11-02 08:48:55
|
so is there any impact of this bug?
|
|
|
Traneptora
|
|
yurume
but as it stands, there is no way in an MA tree to distinguish LF coefficients from ModularLfGroup
|
|
2022-11-02 08:49:33
|
stream index, property 1
|
|
|
yurume
|
2022-11-02 08:50:07
|
but stream indices for LF coefficients *are* already `1 + LF group index`, which I believe is correct because otherwise J40 wouldn't work
|
|
|
Traneptora
|
2022-11-02 08:50:29
|
... huh.
|
|
|
yurume
|
2022-11-02 08:50:50
|
yeah, I've checked J40 and it is indeed `1 + LF group index` (or `sidx0`, as J40 says)
|
|
2022-11-02 08:51:12
|
`sidx1` (`1 + num_lf_groups + LF group index`) was reserved for ModularLfGroup which is not yet implemented
|
|
2022-11-02 08:51:48
|
I'll check libjxl to be sure
|
|
|
Traneptora
|
2022-11-02 08:51:52
|
wait, hm.
|
|
2022-11-02 08:51:57
|
Looks like illegal final ans state
|
|
|
yurume
|
2022-11-02 08:52:05
|
maybe my test images just didn't have stream index nodes
|
|
|
Traneptora
|
2022-11-02 08:52:12
|
so maybe I'm just decoding garbage
|
|
2022-11-02 08:53:04
|
ah, I was just decoding garbage
|
|
2022-11-02 08:53:05
|
that's mb
|
|
2022-11-02 08:53:20
|
I set it back to what it had been, and the first LF Group verified successfully
|
|
2022-11-02 08:53:26
|
ignore me, the bug didn't happen.
|
|
|
yurume
|
2022-11-02 08:54:08
|
ah, good
|
|
2022-11-02 08:54:14
|
so that part of spec seems okay
|
|
|
Traneptora
|
2022-11-02 08:54:37
|
I'm still confused about the LF group sizes though
|
|
2022-11-02 08:54:54
|
where am I supposed to truncate the LF Groups when they overflow out of the side of the image?
|
|
|
_wb_
|
2022-11-02 08:55:35
|
I really appreciate it that we now have two independent implementers sorting out potential spec issues amongst themselves, by the way. This will lead to a 2nd edition of the spec that will be so much better than the 1st edition. Thanks so much!
|
|
|
Traneptora
|
2022-11-02 08:55:51
|
I have a 408/306 image after dividing by 8, and the LF groups then should be variations of 256x256
|
|
2022-11-02 08:56:03
|
so I'm producing a 256x256 upper left corner
|
|
2022-11-02 08:56:10
|
152x256 upper right
|
|
2022-11-02 08:56:27
|
but upon decoding that second one I overrun the TOC buffer, which usually means I'm decoding the wrong thing
|
|
|
_wb_
|
2022-11-02 08:56:35
|
Yes, should be same splitting logic as the hf groups
|
|
|
Traneptora
|
2022-11-02 08:56:55
|
so in this specific case, 152x50 for the lower right is correct, right?
|
|
2022-11-02 08:57:20
|
if so, I must have a bug elsewhere
|
|
|
_wb_
|
2022-11-02 08:57:52
|
Overrunning toc is indeed a sign of a bug, no encoder produces overlapping sections and we should probably forbid it
|
|
|
Traneptora
|
2022-11-02 08:58:09
|
no I mean I hard-isolate the TOC into buffers because it makes that easier to detect
|
|
|
_wb_
|
2022-11-02 08:58:40
|
Same thing, no section should read more bytes than the length indicated in toc
|
|
|
Traneptora
|
2022-11-02 08:58:50
|
I thought that was pretty self-evident
|
|
2022-11-02 08:58:54
|
at least
|
|
2022-11-02 08:59:20
|
but either way, if those dimensions are correct then my bug is elsewhere
|
|
|
_wb_
|
2022-11-02 09:01:04
|
The dimensions are correct - you rounded up when dividing by 8, right?
|
|
|
Traneptora
|
2022-11-02 09:01:15
|
it's a multiple of 8, so no, but....
|
|
|
_wb_
|
2022-11-02 09:02:14
|
Must be bug elsewhere then
|
|
|
Traneptora
|
2022-11-02 09:02:33
|
gonna do my usual trick of dump output of libjxl and diff it to see where it differs
|
|
2022-11-02 09:33:02
|
and, I figured it out, I wasn't cropping the corresponding higher-shift channels either
|
|
2022-11-02 09:34:08
|
so the 256x256 channels needed to be cropped to 152x256, but the 128x256 channels needed to be cropped to 76x256
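so roughly, for a channel with a given shift (hypothetical helper; whether rounding up is the right thing for odd sizes is exactly my next question):
```cpp
#include <cstdint>

// Crop size of a shifted channel, rounding up: e.g. 152 blocks at shift 1 -> 76.
static int32_t cropped_size(int32_t frame_size_in_units, int32_t shift) {
    return (frame_size_in_units + (1 << shift) - 1) >> shift;
}
```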
|
|
2022-11-02 10:19:57
|
ugh, how do you handle it if these are *odd*
|
|
|
Kleis Auke
|
|
Traneptora
gonna do my usual trick of dump output of libjxl and diff it to see where it differs
|
|
2022-11-03 01:44:43
|
Just wondering, do you use `jxlinfo` for this?
|
|
2022-11-03 01:53:06
|
(oh, I misread, it's probably `printf()`-debugging)
|
|
|
Traneptora
|
2022-11-03 02:27:19
|
yea, I printf to stdout and redirect
|
|
2022-11-03 02:30:01
|
basically I just straight up `printf("%zu, %zu, %d\n", x, y, p[x])` or something
|
|
2022-11-03 02:30:03
|
and compare the output
|
|
|
|
veluca
|
2022-11-03 02:34:50
|
most useful compression debugging technique *ever*
|
|
|
andrew
|
2022-11-04 01:18:20
|
I just started glancing at the spec - pseudocode operators are defined as being the same as those in C++, but signed integer overflow is undefined in C++. Is this intentional?
|
|
|
yurume
|
2022-11-04 01:43:33
|
all operators are defined as if there is an integral type of infinite size (there should be a mention), so it should never overflow
|
|
2022-11-04 01:43:56
|
in reality, of course, the implementer should be aware that the calculation can overflow
|
|
|
andrew
|
2022-11-04 01:52:31
|
yeah I don't see such a mention
|
|
|
Traneptora
|
|
andrew
yeah I don't see such a mention
|
|
2022-11-04 02:13:02
|
> Expressions and variables of which types are omitted, are understood as real numbers.
|
|
2022-11-04 02:13:04
|
section 4.4
|
|
2022-11-04 02:15:11
|
speaking of which
|
|
2022-11-04 02:15:14
|
|
|
2022-11-04 02:15:24
|
Section 4.3, missing a closing paren `)`
|
|
|
andrew
I just started glancing at the spec - pseudocode operators are defined as being the same as those in C++, but signed integer overflow is undefined in C++. Is this intentional?
|
|
2022-11-04 04:18:38
|
mostly* defined as being the same in C/C++ btw
in C, right shifting a negative signed integer is undefined behavior
|
|
2022-11-04 04:19:08
|
but in the spec it is clearly defined. in practice right shifting signed integers will sign-extend on x86
|
|
2022-11-04 04:19:16
|
which is also what happens in the spec
|
|
2022-11-05 04:00:36
|
Section K.3.1
|
|
2022-11-05 04:00:44
|
```
patch[i].x[j] = UnpackSigned(DecodeHybridVarLenUint(5)) + patch[i].x[j - 1];
patch[i].y[j] = UnpackSigned(DecodeHybridVarLenUint(5)) + patch[i].y[j - 1];
```
|
|
2022-11-05 04:00:54
|
these should say `DecodeHybridVarLenUint(6)`
|
|
2022-11-05 04:00:56
|
the context is wrong
|
|
2022-11-05 04:01:21
|
note that in `patch_dictionary_internal.h`, it lists ` kPatchOffsetContext = 6,` on line 22
|
|
2022-11-05 04:01:52
|
and in `dec_patch_dictionary.cc` on line 110 we have this:
```c++
pos.x =
positions_.back().x + UnpackSigned(read_num(kPatchOffsetContext));
pos.y =
positions_.back().y + UnpackSigned(read_num(kPatchOffsetContext));
```
|
|
2022-11-05 04:02:33
|
furthermore I tested this against `patch_lossless.jxl` conformance sample and the final ANS state fails (i.e. != 0x13000) if you use the 5 in the spec, but it succeeds if you use 6
|
|
|
andrew
|
2022-11-05 08:19:52
|
I'm trying to understand the examples in B.2.2 (U32)
|
|
2022-11-05 08:20:20
|
for the first example, you read two bits, 0, then 1, which is 0b10 = 2, which results in 32
|
|
2022-11-05 08:21:12
|
following the same bit ordering for the second example, you read two bits, 1, 1, which is 0b11 = 3, so you then read 8 more bits, 1, 0, 1, 0, 0, 0, 0, 0, which is 0b00000101 = 5 != 7
|
|
2022-11-05 08:21:54
|
or am I getting the bit ordering for the u(n) operation wrong
|
|
2022-11-05 09:07:32
|
okay, I think this is a typo in the spec, I just yanked out jxlatte's bit reader (because it was convenient) and the second example returned 5
|
|
|
Traneptora
|
|
andrew
following the same bit ordering for the second example, you read two bits, 1, 1, which is 0b11 = 3, so you then read 8 more bits, 1, 0, 1, 0, 0, 0, 0, 0, which is 0b00000101 = 5 != 7
|
|
2022-11-05 09:39:16
|
It's in little endian order
|
|
2022-11-05 09:39:36
|
010111 means first you read two bits-> 01, with 0111 remaining
|
|
2022-11-05 09:39:48
|
then you look at those two bits
|
|
2022-11-05 09:40:39
|
01 is in *little-endian* mode so that's actually 0b10
|
|
2022-11-05 09:41:07
|
which is 2
|
|
2022-11-05 09:42:04
|
hm, yea it's definitely a typo in the spec
|
|
2022-11-05 09:42:22
|
at that point it should pick u(6)
|
|
2022-11-05 09:42:29
|
but there's only 4 bits remaining
|
|
2022-11-05 09:42:50
|
it should say `101110`
|
|
2022-11-05 09:43:11
|
if it had said 101110, then you'd pick 0b10, with 0b1110 remaining
|
|
2022-11-05 09:43:39
|
then that would go to 1, b/c little endian, and then go to u(4), reading 0b1110, little endian, equals 7
|
|
2022-11-05 09:44:23
|
so it should say `U32(u(2), u(4), u(6), u(8)): 0b101110 -> value = 7`
|
|
2022-11-05 09:44:49
|
and it should say `U32(8, 16, 32, u(7)): 0b01 -> 32`
|
|
2022-11-05 09:44:55
|
someone messed up endianness
|
|
2022-11-05 09:45:17
|
good catch!
|
|
|
andrew
|
|
Traneptora
so it should say `U32(u(2), u(4), u(6), u(8)): 0b101110 -> value = 7`
|
|
2022-11-05 09:52:32
|
or should this be 0b011101?
|
|
2022-11-05 09:53:26
|
as in the first digit is most significant, as it is in most programming languages
|
|
|
Traneptora
|
2022-11-05 09:56:08
|
I'm using it in little-endian order. so if it occurs in the bitstream as 101110 then it would end up being 011101 if you wanted to populate it in a programming language, yes
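to make that concrete, a toy LSB-first reader matching the worked example (illustrative only, not the spec's reader; only the `u(n)` distributions needed here):
```cpp
#include <cstdint>
#include <vector>

struct Bits {
    std::vector<int> bits;   // bits in stream order, e.g. {1,0,1,1,1,0} for "101110"
    uint32_t pos = 0;
    uint32_t u(int n) {      // the first bit read is the least significant
        uint32_t v = 0;
        for (int i = 0; i < n; i++) v |= uint32_t(bits[pos++]) << i;
        return v;
    }
};

// U32(u(2), u(4), u(6), u(8)) on "101110": selector = u(2) = 1, then u(4) over 1,1,1,0 = 7.
uint32_t u32_example(Bits& br) {
    static const int nbits[4] = {2, 4, 6, 8};
    return br.u(nbits[br.u(2)]);
}
```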
|
|
2022-11-06 04:32:42
|
The spec doesn't specify what happens if you have a property whose value is out of range
|
|
2022-11-06 04:33:42
|
the correct behavior is that for properties >15, the property defaults to 0, unless computed as in the bottom of section H.4.1.
|
|
2022-11-06 04:33:58
|
The spec doesn't mention this, though.
|
|
|
_wb_
|
2022-11-06 04:52:31
|
ah, good point
|
|
|
yurume
|
2022-11-06 04:55:54
|
ugh, I thought it was simply an incorrect bitstream
|
|
|
_wb_
|
2022-11-06 04:58:50
|
no it's not, e.g. you could use properties that refer to the previous channel in a global tree that also gets used for the very first channel
|
|
|
Traneptora
|
2022-11-06 05:11:54
|
I ran into this issue when attempting to decode this JXL art piece: https://discord.com/channels/794206087879852103/824000991891554375/1038632475372560436
|
|
2022-11-06 05:12:01
|
channel 0 of the stream has property 19
|
|
|
andrew
|
2022-11-06 09:27:15
|
In C.2.1, what exactly does `lz_cfg` refer to? is it a part of the entropy decoder state?
|
|
|
_wb_
|
2022-11-06 09:33:59
|
If lz77.enabled, the decoder sets lz_cfg = num_dist++, and reads lz_len_conf = HybridUintConfig(8)
|
|
2022-11-06 09:36:38
|
so basically one extra pre-clustered distribution is added for the lz77 distances
|
|
|
Traneptora
|
2022-11-06 09:37:54
|
the decoder reads a HybridUintConfig for lz77, which is separate from the HybridUintConfig added for the extra distribution
|
|
2022-11-06 09:38:49
|
However, `lz_len_conf` is the variable name used in section C.1. If it later calls it `lz_cfg`, that's a typo.
|
|
|
_wb_
|
2022-11-06 09:38:55
|
when reading a symbol, first a token is read which can be either a literal or represent an lz77 length, and if it's an lz77 length, then another token is read using a fixed (but signaled) hybriduintconfig to get the distance
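roughly, in code (the reader callbacks are hypothetical stand-ins for the spec's token/uint machinery, and validity checks are omitted):
```cpp
#include <cstdint>
#include <functional>
#include <vector>

// A token that is >= lz77.min_symbol is an LZ77 length; the distance is then read in the
// dedicated extra context (the "lz_cfg" / lz_distance_ctx one).
void decode_symbols(uint64_t count, uint32_t lz_min_symbol, uint32_t lz_min_length,
                    const std::function<uint32_t()>& read_token,
                    const std::function<uint32_t()>& read_length_extra,  // uses lz_len_conf
                    const std::function<uint32_t()>& read_distance,      // uses lz_distance_ctx
                    std::vector<uint32_t>& out) {
    while (out.size() < count) {
        uint32_t token = read_token();
        if (token >= lz_min_symbol) {
            uint32_t length = lz_min_length + read_length_extra();
            uint32_t distance = read_distance();
            for (uint32_t i = 0; i < length && out.size() < count; i++)
                out.push_back(out[out.size() - distance]);  // copy from the already-decoded window
        } else {
            out.push_back(token);                           // plain literal
        }
    }
}
```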
|
|
|
Traneptora
|
|
_wb_
|
2022-11-06 09:39:19
|
I think the spec is correct but the variable names and way it is described is not very clear
|
|
|
Traneptora
|
2022-11-06 09:39:34
|
`lz_cfg` is the index of the lz_len_conf, I see.
|
|
|
_wb_
|
2022-11-06 09:39:50
|
lz_len_conf is a hybriduintconfig used to interpret lengths
|
|
2022-11-06 09:40:36
|
lz_cfg is a context used to read distances
|
|
|
Traneptora
|
2022-11-06 09:40:56
|
It might make sense to call it `lz_distance_ctx`
|
|
2022-11-06 09:41:12
|
just then it's really clear that it's a context ID
|
|
|
_wb_
|
2022-11-06 09:41:17
|
lz_len_conf and lz_distance_ctx would be good names, I agree
|
|
2022-11-06 09:42:14
|
the confusing thing is the lengths don't need a context because the token is just a regular token that happens to be larger than some offset
|
|
|
Traneptora
|
2022-11-06 09:42:37
|
oh while we're on the topic of section C.3.3.:
```c++
else if (distance < 120) {
distance = kSpecialDistances[distance][0];
distance += dist_multiplier * kSpecialDistances[distance][1];
```
this doesn't work
you probably mean something like
```c++
else if (distance < 120) {
offset = kSpecialDistances[distance][0];
distance = offset + dist_multiplier * kSpecialDistances[distance][1];
```
|
|
2022-11-06 09:42:53
|
in particular, consider what happens when the offset is negative
|
|
2022-11-06 09:42:56
|
the first one crashes the code
|
|
2022-11-06 09:43:07
|
libjxl handles it correctly by doing it in one expression, but the spec is incorrect
|
|
|
_wb_
|
2022-11-06 09:43:45
|
right, d'oh
|
|
2022-11-06 09:44:06
|
so easy to write silly bugs in code that doesn't get executed
|
|
|
Traneptora
|
2022-11-06 09:44:26
|
Yea, that's the main difference between proofreading code and trying to implement it
|
|
|
_wb_
|
2022-11-06 09:45:01
|
and at ISO they only did the former, if even that
|
|
2022-11-06 09:45:59
|
many of the comments we got from national bodies were more about the dots and commas in boilerplate text than about the actual technical meat
|
|
|
Traneptora
|
2022-11-06 09:46:54
|
one thing I find convenient about doing this in java is attempting to index an array with a negative number immediately throws an exception, which crashes the process. the JVM enforces those memory barriers. it means those sorts of off-by-one errors fail-fast
|
|
2022-11-06 09:47:22
|
unlike in C where you need valgrind or a similar tool to tell you, unless x[-1] happens to hit a dead page
|
|
|
andrew
|
|
_wb_
lz_cfg is a context used to read distances
|
|
2022-11-06 09:47:25
|
so `lz_cfg` (which should probably be named `lz_distance_ctx`) is a field in the entropy decoder state that refers to some index in `clusters`?
|
|
|
Traneptora
|
2022-11-06 09:48:55
|
when Lz77 is enabled, it reads a token. if the token is too big, it goes "oh, this must be an Lz77 length"
|
|
2022-11-06 09:49:03
|
but Lz77 uses length/distance *pairs*
|
|
2022-11-06 09:49:17
|
so it then reads the distance, using `lz_distance_ctx` as the context.
|
|
|
andrew
so `lz_cfg` (which should probably be named `lz_distance_ctx`) is a field in the entropy decoder state that refers to some index in `clusters`?
|
|
2022-11-06 09:50:36
|
also keep in mind that `clusters` is a map. it maps contexts to clusters. since in practice some of the contexts will have very similar frequencies, clustering lets the stream consider two contexts to be the "same"
|
|
2022-11-06 09:50:48
|
so it maps a context ID to a cluster ID
|
|
2022-11-06 09:51:04
|
which can and frequently will have a lot of overlap
|
|
2022-11-06 09:52:07
|
it's easy to treat it as an array, but conceptually it's a function
`int mapContextIDToClusterID(int contextID) // returns cluster ID`
|
|
|
_wb_
|
2022-11-06 09:54:55
|
context clustering was invented in brunsli/pik (maybe already in brotli / lossless webp?) as a way to have relatively large context models while keeping the number of histograms to signal manageable
|
|
2022-11-06 09:56:01
|
MA trees were invented in FLIF as a way to have even larger context models (ones where even the context map itself would be too large) while keeping the number of histograms (or in case of FLIF, CABAC chance tables) manageable
|
|
2022-11-06 09:56:48
|
and in JXL's modular mode you get both, which basically boils down to the MA tree effectively being a DAG instead of a tree
|
|
2022-11-06 09:57:15
|
(i.e. you can merge two leaf nodes however distant they are)
|
|
|
Traneptora
|
2022-11-06 09:57:22
|
in jxlatte I save some effort by flattening static properties but less so than in libjxl. unfortunately jxlatte is pretty slow
|
|
2022-11-06 09:58:55
|
the way I have it implemented in jxlatte is basically I have an MATree that contains a pointer to two child trees, also of type MATree. and I walk down the tree by doing basically `while (!tree.isLeaf()) tree = tree.decide();`
|
|
|
_wb_
|
2022-11-06 09:58:58
|
i recommend starting with slow and correct and worrying about speed later, because some things really get quite a lot more complicated and bug-prone if you want to do them fast too
|
|
2022-11-06 09:59:42
|
yeah the naive branchy MA tree evaluation is good enough, it works, it's just slow
|
|
|
Traneptora
|
2022-11-06 10:00:01
|
luckily this method makes walking down static properties pretty easy
since I have a function that just walks down the tree until it hits a non-static property then aborts and returns the resulting tree
|
|
2022-11-06 10:00:28
|
so I can save a bunch of cycles that way
|
|
2022-11-06 10:00:49
|
I also compute properties on the fly, which can be slower if they need to be used again
|
|
2022-11-06 10:01:03
|
but can be faster if they don't need to be used
|
|
|
_wb_
|
2022-11-06 10:02:16
|
eventually we might want to do property computation and MA tree walking with some kind of dynamic code generation, but that's scary and tricky so for now we just stick to some specializations and some speedup tricks like turning binary trees into quadtrees
|
|
|
Traneptora
|
2022-11-06 10:03:17
|
but there's a few optimizations I could do, like I could make the row-arrays slightly bigger than they need to be, and populate them with the edge-case predictors
that way there's no branching required when doing west() or north()
|
|
2022-11-06 10:04:04
|
since doing `y > 0 ? buffer[y - 1][x] : x > 0 ? buffer[y][x -1] : 0`
|
|
2022-11-06 10:04:05
|
ugh
|
|
2022-11-06 10:04:08
|
so much branching
|
|
2022-11-06 10:04:31
|
this can be optimized as a CMOV by a compiler but on the JVM I'm pretty sure it's a branch
|
|
2022-11-06 10:04:35
|
not entirely sure tho
|
|
2022-11-06 10:04:50
|
I'm targeting the JVM
|
|
|
_wb_
|
2022-11-06 10:04:53
|
yes, all of that helps, and you can do whatever is easy enough to do, but I think at this point having a full independent decoder that works though slowly is more valuable than having a partial one that is fast
|
|
|
Traneptora
|
2022-11-06 10:05:07
|
yea, this is all after I get full feature support
|
|
2022-11-06 10:05:09
|
it's lower priority
|
|
2022-11-06 10:05:26
|
the contest submission art you posted is going to be good for testing splines and noise tho, as well as frame blending
|
|
2022-11-06 10:05:35
|
so I'm happy about that
|
|
|
_wb_
|
2022-11-06 10:06:18
|
yeah it was a piece of art in the spirit of what jxl_from_tree was originally made for: producing test bitstreams
|
|
|
Traneptora
|
2022-11-06 10:07:05
|
pretty on-point since a lot of my debug test samples came from <#824000991891554375>
|
|
|
_wb_
|
2022-11-06 10:18:43
|
I like it when fun and art can be at the same time a useful engineering tool used in something as 'boring' as spec verification
|
|
|
improver
|
2022-11-06 10:29:21
|
human-assisted fuzzing
|
|
|
andrew
|
2022-11-06 10:35:52
|
in C.2.1/C.2.2, where does `num_clusters` come from if `num_dist != 1`?
|
|
2022-11-06 10:36:45
|
https://discord.com/channels/794206087879852103/794206170445119489/1011659525385883788
|
|
2022-11-06 10:37:03
|
oh, it's supposed to be calculated by the max of clusters[i] + 1
|
|
2022-11-06 10:37:14
|
this should probably be specified
|
|
|
Traneptora
|
|
andrew
this should probably be specified
|
|
2022-11-06 11:21:47
|
It is in section C.2.2
|
|
2022-11-06 11:21:50
|
> The output of this procedure is an array clusters with num_dist entries (one for each pre-clustered context), with values in the range [0, num_clusters). All integers in [0, num_clusters) are present in this array, and num_clusters < 256. Position i in this array indicates that context i is merged into the corresponding cluster.
|
|
2022-11-06 11:22:24
|
since the values are in the range `[0, num_clusters)`, it implies that they occupy all values x for which 0 <= x < num_clusters
|
|
2022-11-06 11:23:17
|
which then tells you that num_clusters is the smallest integer greater than all of the xs
|
|
2022-11-06 11:23:32
|
It could be more explicit, but it technically *is* specified
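i.e., in code, the implied definition is just:
```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// num_clusters is one more than the largest cluster id in the context map.
uint32_t derive_num_clusters(const std::vector<uint8_t>& clusters) {
    return 1u + *std::max_element(clusters.begin(), clusters.end());
}
```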
|
|
2022-11-07 12:24:50
|
if channel data ranges from `[0.0f, 1.0f]`
|
|
2022-11-07 12:24:56
|
```
R = Y + 0.5 + 1.402 * Cr;
G = Y + 0.5 - 0.344136 * Cb - 0.714136 * Cr;
B = Y + 0.5 + 1.772 * Cb;
```
|
|
2022-11-07 12:25:06
|
how does R or B ever end up as less than 0.5?
|
|
2022-11-07 12:25:22
|
if Cr = Cb = 0 you end up with Y + 0.5, not Y
|
|
2022-11-07 12:29:46
|
or is this one of those scenarios where Cb and Cr range from `[-0.5, 0.5]`?
|
|
2022-11-07 12:30:32
|
same with Y?
|
|
2022-11-07 12:30:47
|
because I'd assume that grayscale would be R=G=B=Y
|
|
|
_wb_
|
2022-11-07 01:47:29
|
Cr and Cb get negative, yes
|
|
2022-11-07 01:47:51
|
And Y too, yes
|
|
2022-11-07 01:48:17
|
Y=0 is gray
|
|
2022-11-07 01:48:59
|
I think
|
|
|
Traneptora
|
2022-11-07 02:43:01
|
Y=0 being 50% gray means these all range from [-0.5, 0.5]
|
|
2022-11-07 02:43:09
|
which is fine, I just wanted to double check
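so the way I'm reading it is something like this (my own sketch, with all three channels centered at 0; the function name is mine, not from the spec):
```c
/* inverse YCbCr as I read it: Y, Cb, Cr all in [-0.5, 0.5], RGB output in [0, 1] */
static void ycbcr_to_rgb(float Y, float Cb, float Cr,
                         float *R, float *G, float *B) {
    *R = Y + 0.5f + 1.402f    * Cr;
    *G = Y + 0.5f - 0.344136f * Cb - 0.714136f * Cr;
    *B = Y + 0.5f + 1.772f    * Cb;
}
/* Y = Cb = Cr = 0 gives mid-gray (0.5, 0.5, 0.5),
 * and grayscale (Cb = Cr = 0) gives R = G = B = Y + 0.5 */
```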
|
|
2022-11-07 03:29:20
|
Section K.4.1 says this:
|
|
2022-11-07 03:29:39
|
> Then, UnpackSigned(DecodeHybridVarLenUint(5)) is used four times thirty-two times to obtain, in this order, sequences of thirty-two integers representing the following coefficients of the spline along its arc length:
> q_dctX[32], the decorrelated and quantized DCT32 coefficients of the X channel of the spline
>
> q_dctY[32], the quantized DCT32 coefficients of the Y channel of the spline
>
> q_dctB[32], the decorrelated and quantized DCT32 coefficients of the B channel of the spline
>
> q_dctSigma[32], the quantized DCT32 coefficients of the σ parameter of the spline (defining its thickness)
|
|
2022-11-07 03:30:26
|
are all 32 of the `q_dctX[32]` read first, and then all 32 of the `q_dctY[32]`, and then all 32... etc.
|
|
2022-11-07 03:30:33
|
or are they interleaved?
|
|
2022-11-07 03:30:38
|
It's not clear to me from how this is phrased.
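for now I'm assuming they're not interleaved, i.e. roughly this (just my working assumption from the wording, not something I've verified against libjxl):
```c
/* assumption: each sequence of 32 is read in full before the next one,
 * in the order X, Y, B, sigma */
for (i = 0; i < 32; i++) q_dctX[i]     = UnpackSigned(DecodeHybridVarLenUint(5));
for (i = 0; i < 32; i++) q_dctY[i]     = UnpackSigned(DecodeHybridVarLenUint(5));
for (i = 0; i < 32; i++) q_dctB[i]     = UnpackSigned(DecodeHybridVarLenUint(5));
for (i = 0; i < 32; i++) q_dctSigma[i] = UnpackSigned(DecodeHybridVarLenUint(5));
```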
|
|
2022-11-07 03:34:29
|
Another question
|
|
2022-11-07 03:34:31
|
> These coefficients are dequantized and recorrelated (see I.2.3) as follows:
|
|
2022-11-07 03:34:58
|
Splines can be used in Modular Mode (e.g. lossy XYB), but the LFChannelCorrelation bundle (defined in I.2.3) is only present in LFGlobal in VarDCT mode.
|
|
2022-11-07 03:35:05
|
Do you use the default value in this case?
|
|
|
_wb_
|
|
|
veluca
|
2022-11-08 01:05:29
|
pretty sure cb and cr are in +-.5 range, and y is in 0-1 range...
|
|
|
Traneptora
|
2022-11-08 01:08:15
|
if Y is `[0, 1]` then the inverse YCbCr transform appears to be incorrect
|
|
|
|
veluca
|
2022-11-08 01:34:07
|
let me check in libjxl
|
|
2022-11-08 01:35:50
|
ok, no, y is also +-.5
|
|
2022-11-08 01:36:09
|
I am not quite sure why we did that as XYB's Y isn't, but oh well
|
|
|
_wb_
|
2022-11-08 05:32:17
|
Because that is how jpeg does it
|
|
2022-11-08 05:32:49
|
I think
|
|
|
Traneptora
|
2022-11-08 07:23:29
|
Section K.4.1 provides pseudocode but it decodes quant_adjust in the wrong order
|
|
2022-11-08 07:23:50
|
this is in section K.4.1:
```c
num_splines = DecodeHybridVarLenUint(2) + 1;
quant_adjust = UnpackSigned(DecodeHybridVarLenUint(0));
last_x = 0; last_y = 0;
for (i = 0; i < num_splines; i++) {
// etc.
```
|
|
2022-11-08 07:24:26
|
splines.cc line 549:
|
|
2022-11-08 07:24:31
|
```cpp
JXL_RETURN_IF_ERROR(DecodeAllStartingPoints(&starting_points_, br, &decoder,
                                            context_map, num_splines));
quantization_adjustment_ = UnpackSigned(
    decoder.ReadHybridUint(kQuantizationAdjustmentContext, br, context_map));
```
|
|
2022-11-08 07:24:43
|
notice that `quantization_adjustment_` is decoded after reading the starting points, not before, unlike what it says in the spec.
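so presumably the intended order is more like this (my reordering of the spec pseudocode to match what libjxl does, not the official text):
```c
num_splines = DecodeHybridVarLenUint(2) + 1;
last_x = 0; last_y = 0;
for (i = 0; i < num_splines; i++) {
    // decode starting point i, as in the existing pseudocode
}
// quant_adjust is read after the starting points, matching libjxl
quant_adjust = UnpackSigned(DecodeHybridVarLenUint(0));
```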
|
|
|
_wb_
|
2022-11-08 08:57:58
|
> Suggestion for a spec extension / bit of a long shot: Allow horizontal and vertical flips of patches. The point is that you've already done the work by defining mirrored coordinates - so keep the source coordinates of the patch normal but allow the destination coordinates to be mirrored. It's not really a bitstream change...
(suggestion from <@967519502977888367> )
|
|
2022-11-08 08:59:45
|
We did consider adding bits for flips and it could be useful for symmetric things like many letter shapes or circles/ellipses, but we ended up not doing it.
|
|
2022-11-08 09:02:15
|
Doing it via negative / out of bounds offsets and using mirroring semantics could work with just a small spec change, but it would make it impossible to use patches that cross the image edge, like e.g. a sprite that is partially off-screen (since the off-screen part would end up getting mirrored and messing up the on-screen part)
|
|
2022-11-08 09:03:55
|
(though such patches could still be done by doing them separately with smaller source rectangles, which would add a small extra signaling cost, but not a huge one)
|
|
2022-11-08 09:06:48
|
The implementation of patch blitting would be a bit trickier though if it needs to do mirroring: naive per-pixel coordinate mirroring would be easy to implement, but to do it efficiently there would be one code path for each of the 4 possible orientations, and the case of image edge crossing would need to be handled too (either by not allowing it or by splitting the blitting into 2 or 4 operations)
|
|
|
|
dds
|
2022-11-08 09:35:45
|
I'm glad it was considered. Though pixelwise patch blitting is surely pretty quick compared to other steps of the image decode...?
|
|
2022-11-08 09:36:08
|
I wonder how practical 'fake' mirroring is with the current spec. Say you want to do a vertical mirror. Then take one n x 1 patch per row and assemble them in reverse order. Finally hope that the large amount of patch metadata compresses well.
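Conceptually something like this (just a sketch; emit_patch() is a made-up stand-in for however the encoder records one patch entry, not real syntax):
```c
/* 'fake' vertical flip with the existing patch mechanism: one w x 1 patch
 * per row, with the destination rows written in reverse order */
static void fake_vertical_flip(int src_x, int src_y, int w, int h,
                               int dst_x, int dst_y) {
    for (int r = 0; r < h; r++) {
        /* source row r lands on destination row (h - 1 - r) */
        emit_patch(src_x, src_y + r, w, 1,
                   dst_x, dst_y + (h - 1 - r));
    }
}
```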
|
|
|
_wb_
Doing it via negative / out of bounds offsets and using mirroring semantics could work with just a small spec change, but it would make it impossible to use patches that cross the image edge, like e.g. a sprite that is partially off-screen (since the off-screen part would end up getting mirrored and messing up the on-screen part)
|
|
2022-11-08 09:42:42
|
Not that it really matters, I think I stated it the wrong way round from the way I first thought of it: instead allow the source patch to be mirrored but insist that the destination is completely in-bounds. Then crossing the image edge isn't an issue - you get a partially mirrored patch rather than corruption.
|
|
|
Traneptora
|
2022-11-08 09:44:56
|
as far as I'm aware, though, the extension design is based on PNG's, in that a decoder is required to ignore extensions it doesn't understand
|
|
2022-11-08 09:45:36
|
which makes implementing extensions that change rendering behavior a bit iffy
|
|
|
_wb_
|
2022-11-08 09:45:55
|
Right, and then if the source region overlaps the edge, it could actually make sense, e.g. you could put the left half of an A at the right end of the patch frame, take a source that goes over the right edge, and that would blit as a full A
|
|
|
Traneptora
which makes implementing extensions that change rendering behavior a bit iffy
|
|
2022-11-08 09:48:27
|
yeah this would not be done as an extension but by defining behavior for bitstreams where the current behavior is not really clear: what if the source rect for a patch is outside the source frame?
|
|
2022-11-08 09:49:02
|
currently the spec says "All referenced samples are within the bounds of the reference frame." so this is an invalid bitstream atm
|
|
|
Traneptora
|
2022-11-08 09:49:30
|
oh yea, if that happens I return an error and refuse to decode
|
|
|
_wb_
|
2022-11-08 09:49:50
|
we could make it a valid bitstream; existing decoders will not decode it of course or at least not correctly, so it's probably a bad idea
|
|
|
Traneptora
|
2022-11-08 09:50:17
|
what I discovered is that for building a decoder, it's helpful to aggressively throw errors at non-compliance
|
|
2022-11-08 09:50:25
|
because a lot of those errors end up being caused by bugs
|
|
2022-11-08 09:50:29
|
rather than by actual noncompliance
and they get caught
|
|
|
lonjil
|
2022-11-08 09:50:34
|
I'll implement this, creating the first fragmentation of the JXL ecosystem
|
|
|
_wb_
|
2022-11-08 09:50:42
|
lol
|
|
2022-11-08 09:50:59
|
well decoders are free to do what they want on non-compliant bitstreams
|
|
2022-11-08 09:51:26
|
the real problem is when encoders start to rely on what decoders do with them
|
|
|
Traneptora
|
2022-11-08 09:51:31
|
but I mean, throwing errors rather than making best-effort decodes is easier for building a decoder
|
|