|
yurume
|
2022-09-18 10:42:26
|
JPEG XL spec bugs, issues and questions.
|
|
2022-09-18 10:44:06
|
I'd like to post any recently discovered spec bug to here, since it is not really about J40
|
|
2022-09-18 10:44:56
|
negative special distance during LZ77:
```c
int32_t special = (int32_t) SPECIAL_DISTANCES[distance];
J40__ASSERT(dist_mult <= J40__MAX_DIST_MULT);
// TODO spec bug: distance can be as low as -6 when dist_mult = 1 and distance =
// dist_mult * 1 - 7; libjxl clamps it to the minimum of 1, so we do the same here
distance = j40__max32(1, ((special >> 4) - 7) + dist_mult * (special & 7));
```
|
|
2022-09-18 10:45:54
|
a related edge case where distance is zero at the very first symbol:
```c
distance = j40__min32(j40__min32(distance, code->num_decoded), 1 << 20);
code->copy_pos = code->num_decoded - distance;
if (J40_UNLIKELY(distance == 0)) {
// TODO spec bug: this is possible when num_decoded == 0 (or a non-positive special
// distance, handled above) and libjxl acts as if `window[i]` is initially filled with 0
J40__ASSERT(code->num_decoded == 0 && !code->window);
code->window = j40__calloc(1 << 20u, sizeof(int32_t));
if (!code->window) return J40__ERR("!mem"), 0;
}
```
|
|
2022-09-19 12:37:09
|
I pondered these two
|
|
2022-09-19 12:38:12
|
`width * height` here refers to "the largest dimensions (horizontally and vertically, respectively) of the image and any of its frames, in pixels", but splines and patches are per-frame, so this means that checking for those limits can only be done when every frame is read (!)
|
|
2022-09-19 12:40:21
|
for example, in a 100x100 image the first frame can have 100x100 dimensions and 2501 splines. This can be valid if the second frame has, say, 200x200 dimensions which later get cropped.
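so a conforming checker basically has to defer the check until the end, something like this (only a sketch of the consequence; `max_splines_for` is a placeholder for whatever formula the profile actually uses):
```cpp
#include <algorithm>
#include <cstdint>

// Track the largest dimensions seen over the image and all of its frames, plus the
// worst per-frame spline count; the limit can only be evaluated once everything is read.
struct SplineLimitCheck {
    uint32_t max_w = 0, max_h = 0, worst_count = 0;
    void see(uint32_t w, uint32_t h, uint32_t num_splines) {  // call for the image and each frame
        max_w = std::max(max_w, w);
        max_h = std::max(max_h, h);
        worst_count = std::max(worst_count, num_splines);
    }
    bool ok(uint32_t (*max_splines_for)(uint32_t, uint32_t)) const {
        return worst_count <= max_splines_for(max_w, max_h);  // only decidable at the very end
    }
};
```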
|
|
2022-09-19 12:41:32
|
similarly, ICC `output_size` is distinct from `enc_size`, and it is really inconvenient (if not impossible) to estimate the minimum possible `output_size` from `enc_size` AFAIK
|
|
2022-09-19 12:42:24
|
libjxl at least assumes that each command emits at least one byte, I'm yet to validate or invalidate this assumption
|
|
2022-09-19 03:14:04
|
I've analyzed the specification and yes, this assumption is invalid
|
|
2022-09-19 03:15:11
|
specifically, main command bytes 1 through 4 read the length of output bytes `num` from the command stream, which is allowed to be zero, so it is possible to artificially inflate `enc_size` without altering `output_size`
|
|
2022-09-19 04:27:05
|
fortunately libjxl doesn't seem to emit any zero-sized command, so I think we can simply forbid zero-sized commands, and note that every command generates at least one output byte, which should be enough of a hint for implementers
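the guard I have in mind is roughly this (just a sketch; `decode_one_command` is a hypothetical stand-in for the per-command ICC decoding and returns the new output position):
```cpp
#include <cstdint>

// Require every command to advance the output position, so the number of commands is
// bounded by output_size rather than by an arbitrarily inflated enc_size.
bool decode_icc_commands(uint64_t output_size,
                         uint64_t (*decode_one_command)(uint64_t out_pos)) {
    uint64_t out_pos = 0;
    while (out_pos < output_size) {
        uint64_t new_pos = decode_one_command(out_pos);
        if (new_pos <= out_pos) return false;  // zero-sized command: reject as invalid
        out_pos = new_pos;
    }
    return out_pos == output_size;
}
```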
|
|
2022-09-20 02:49:13
|
this possibility of ICC codec DoS has been reported to the issue tracker maintained by <@794205442175402004>.
|
|
|
_wb_
|
2022-09-20 05:15:23
|
For the splines and patches limits, we probably need to make it refer to the frame they're in
|
|
2022-09-20 05:16:37
|
Libjxl also has a limit on the total pixel area of the splines
|
|
2022-09-20 05:17:34
|
we may want to add such a limit to the profile too
|
|
|
yurume
|
|
_wb_
For the splines and patches limits, we probably need to make it refer to the frame they're in
|
|
2022-09-20 05:20:58
|
yeah, that makes intuitive sense. I can see some cases where a smaller frame wants to complement prior frames and thus can have more splines than the individual frame needs, but accounting for the image dimensions seems enough for that.
|
|
|
Traneptora
|
2022-09-25 04:16:02
|
1. when calculating `num_groups`, one must compute `ceil(width / group_dim)`, using `width` from the frame header. However, if `lf_level != 0`, then one must set `width = ceil(width / (1 << (3 * lf_level)))` *before* making this `num_groups` calculation. (This is the same for height.) This is in the spec, but since it's mentioned after `num_groups` is defined, it's very confusing.
|
|
2022-09-25 04:16:50
|
2. The same is also true if `upsampling != 1`. If `upsampling > 1` then you must set `width = ceil(width / upsampling)` before calculating `num_groups`. Do note that upsampling is present provided that `!(flags & kUseLfFrame)`, but the spec doesn't actually say that this flag *must* be enabled if `frame_type == kLFFrame`. If so, it should say so. Since `ceil` is applied at each step, the order matters (the operations don't commute).
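in code, the order that actually works for me is roughly this (my reading, not spec text; `ceil_div` is just rounding-up division):
```cpp
#include <cstdint>

static uint32_t ceil_div(uint32_t a, uint32_t b) { return (a + b - 1) / b; }

// Divide by upsampling first, then by the lf_level scale, and only then count groups.
// The order matters because each step rounds up.
uint32_t compute_num_groups(uint32_t width, uint32_t height,
                            uint32_t upsampling, uint32_t lf_level, uint32_t group_dim) {
    width  = ceil_div(width,  upsampling);
    height = ceil_div(height, upsampling);
    if (lf_level != 0) {
        width  = ceil_div(width,  1u << (3 * lf_level));
        height = ceil_div(height, 1u << (3 * lf_level));
    }
    return ceil_div(width, group_dim) * ceil_div(height, group_dim);
}
```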
|
|
2022-09-25 04:18:10
|
3. The condition for `save_before_ct` is incorrect. The spec says that it's signalled provided that `!all_default && frame_type != kLFFrame`, with a default value `d_sbct` defined later in the spec. The actual condition for `save_before_ct` being signaled is
```
!all_default && (frame_type == kFrameReferenceOnly || full_frame && (frame_type == kFrameRegular || frame_type == kFrameSkipProgressive) && (frame_duration == 0 || save_as_ref != 0) && !is_last && blend_mode == kReplace)
```
based on libjxl code.
|
|
2022-09-25 04:18:48
|
4. The spec should be clear that if `num_toc_entries > 1`, then all the TOC entries listed there are present but the size may be zero for some of them, e.g. in the case of `encoding == kModular`. This is technically stated but it's a bit confusing IMO.
|
|
2022-09-25 04:20:15
|
5. The spec says this (F.3.3)
> Before decoding the TOC, the decoder invokes `ZeroPadToByte()` (B.2.7).
It's not entirely clear what "before decoding the TOC" means here. The actual behavior is that `permuted_toc = Bool()` is read *immediately* after the frame header, and then if it is nonzero, the TOC permutation is read *immediately* after `permuted_toc`, *then* `ZeroPadToByte()` is invoked, then the TOC lengths are each read, and *then* `ZeroPadToByte()` is invoked again after they have all been read. This should be more explicit.
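i.e. roughly this order (the callbacks are just stand-ins for the spec's `Bool()`, permutation read, TOC-entry read and `ZeroPadToByte()`; my reading, not spec text):
```cpp
#include <cstdint>
#include <functional>
#include <vector>

std::vector<uint64_t> read_toc(uint32_t num_toc_entries,
                               const std::function<bool()>& read_bool,
                               const std::function<void(uint32_t)>& read_permutation,
                               const std::function<uint64_t()>& read_entry_size,
                               const std::function<void()>& zero_pad_to_byte) {
    bool permuted_toc = read_bool();                      // immediately after the frame header
    if (permuted_toc) read_permutation(num_toc_entries);  // immediately after permuted_toc
    zero_pad_to_byte();                                   // the "before decoding the TOC" pad
    std::vector<uint64_t> toc(num_toc_entries);
    for (uint32_t i = 0; i < num_toc_entries; i++) toc[i] = read_entry_size();
    zero_pad_to_byte();                                   // and again after all entries are read
    return toc;
}
```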
|
|
2022-09-25 04:20:39
|
6. `[0, num_clusters - 1)` appears twice in section C.2.1. It should be `[0, num_clusters)`
|
|
2022-09-25 04:21:36
|
7. The Alias Mapping section, C.2.2, provides this line of code:
```cpp
if (cutoffs[i] > bucket_size) overfull.push_back(i); else underfull.push_back(i);
```
this is incorrect. The correct line here is:
```cpp
if (cutoffs[i] > bucket_size) overfull.push_back(i);
else if (cutoffs[i] < bucket_size) underfull.push_back(i);
```
|
|
2022-09-25 04:22:19
|
8. Alias Mapping section C.2.2 is missing an extra loop:
```cpp
for (i = alphabet_size; i < table_size; i++) {
underfull.push_back(i);
}
while(overfull.size() > 0) { // right before this line
```
This same loop should also zero out `cutoffs[i]` if we're not assuming it's initially zero.
|
|
2022-09-25 04:23:15
|
9. Alias Mapping is missing a comment that says underfull never runs out:
```cpp
while (overfull.size() > 0) {
/* underfull.size() > 0 */
o = overfull.back(); u = underfull.back(); overfull.pop_back(); underfull.pop_back();
```
While this is impossible if the code is bug-free, I ran into this issue when I had bugs earlier in the spec, so it's not a bad idea to include it as a sanity thing.
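putting 7-9 together, the corrected initialization would look roughly like this (my reconstruction, using the same names as the C.2.2 pseudocode; not spec text):
```cpp
for (i = 0; i < alphabet_size; i++) {
    if (cutoffs[i] > bucket_size) overfull.push_back(i);
    else if (cutoffs[i] < bucket_size) underfull.push_back(i);
}
for (i = alphabet_size; i < table_size; i++) {
    cutoffs[i] = 0;               // in case it isn't assumed to start at zero
    underfull.push_back(i);
}
while (overfull.size() > 0) {     // underfull never runs out here for a valid distribution
    o = overfull.back(); u = underfull.back();
    overfull.pop_back(); underfull.pop_back();
    // ... rest as in the spec ...
}
```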
|
|
2022-09-25 04:24:21
|
10. Section C.2.2. defines AliasMapping for a specific distribution as `AliasMapping(x)`, a function of one variable, and yet later on in Section C.2.3, you call this:
```cpp
(symbol, offset) = AliasMapping(D, index);
state = D[symbol] * (state >> 12) + offset;
```
It's probably a bit clearer if you use `D.AliasMapping(index)`, i.e.:
```cpp
(symbol, offset) = D.AliasMapping(index);
state = D[symbol] * (state >> 12) + offset;
```
|
|
2022-09-25 04:25:12
|
11. In section C.2.6, you define `lsb = token & ((1 << config.lsb_in_token) - 1);`. I'd probably rename this variable to `low` since I think `lsb` gives the wrong implication of what this is supposed to do.
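for context, my reading of the full C.2.6 decode, reconstructed from memory (so double-check it against the spec text; `read_bits(n)` stands in for `u(n)`):
```cpp
#include <cstdint>
#include <functional>

struct HybridUintConfig { uint32_t split_exponent, msb_in_token, lsb_in_token; };

uint64_t hybrid_uint(uint32_t token, const HybridUintConfig& cfg,
                     const std::function<uint64_t(uint32_t)>& read_bits) {
    uint32_t split = 1u << cfg.split_exponent;
    if (token < split) return token;      // small tokens are the value itself
    uint32_t n = cfg.split_exponent - cfg.msb_in_token - cfg.lsb_in_token +
                 ((token - split) >> (cfg.msb_in_token + cfg.lsb_in_token));
    uint32_t low = token & ((1u << cfg.lsb_in_token) - 1);  // the "lsb"/"low" bits in question
    token >>= cfg.lsb_in_token;
    uint32_t top = (1u << cfg.msb_in_token) | (token & ((1u << cfg.msb_in_token) - 1));
    return (((uint64_t(top) << n) | read_bits(n)) << cfg.lsb_in_token) | low;
}
```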
|
|
2022-09-25 04:25:18
|
12. Section 2.4 doesn't define `alphabet_size` for simple distributions, only for flat and for complex distributions. Since `alphabet_size` can be less than `table_size` this is important.
|
|
2022-09-25 04:25:49
|
These are all things I found before, but I'm reposting them from <#794206170445119489> here for clarity.
|
|
2022-10-24 03:57:05
|
Just want to double check, Extension lengths are all read before any extension payloads are read, in the extensions bundle, right?
|
|
|
yurume
|
2022-10-24 04:03:46
|
to my knowledge yes.
|
|
|
Traneptora
|
2022-10-24 03:52:55
|
I'm reading through the new spec, and here's the issues I've found so far
|
|
2022-10-24 03:59:52
|
(1) `alphabet_size` is not defined for simple distributions in section C.2.5. It's read for flat and for complex distributions. If it's supposed to be equal to `table_size` in the case of a simple distribution, then it should say so.
(2) There's still a bug in the pseudocode in section C.2.6:
```
if (cutoffs[i] > bucket_size) overfull.push_back(i); else underfull.push_back(i);
```
This should read:
```
if (cutoffs[i] > bucket_size) overfull.push_back(i);
else if (cutoffs[i] < bucket_size) underfull.push_back(i);
```
`underfull` shouldn't be filled if `cutoffs[i]` exactly hits the bucket size.
(3) Section C.2.1 includes two cases of `[0, num_clusters - 1)` but it should say either `[0, num_clusters)` or `[0, num_clusters - 1]`.
|
|
2022-10-24 04:01:56
|
(4) Section D.2, SizeHeader description, contains a typo. It says `!div` where it should say `!div8`
|
|
|
_wb_
|
2022-10-24 04:02:47
|
can someone double-check these proposed fixes so we don't accidentally introduce new bugs?
|
|
|
Traneptora
|
2022-10-24 04:03:03
|
Sure, I encourage a double-check
|
|
2022-10-24 04:03:17
|
although some of these are just typos
|
|
2022-10-24 04:05:05
|
Also, there's still some c-isms in the code such as `!ratio` instead of `ratio == 0`
|
|
2022-10-24 04:05:18
|
since you changed `&&` to `and` I'm not sure this is desired
|
|
2022-10-24 04:05:33
|
worth considering it
|
|
2022-10-24 04:07:44
|
Section E.2 has a formatting issue:
|
|
2022-10-24 04:07:52
|
|
|
2022-10-24 04:07:57
|
look at the overlap on the last row
|
|
2022-10-24 04:08:09
|
I suggest moving the `type` column leftward as there is enough empty space
|
|
2022-10-24 04:13:00
|
Section E.4.3 is missing some code tags. There's a `min(128, output_size)`, a `(p + e) & 255`, and an `i == j` that are in textrm.
|
|
|
_wb_
|
2022-10-24 04:13:05
|
c-isms are fine, the only reason I changed `&&` to `and` is that it caused trouble when using pandoc to convert our tex sources to docx (which is the format ISO insists on, frustratingly), since pandoc misinterpreted it as two column separators
|
|
|
Traneptora
|
|
_wb_
|
2022-10-24 04:14:51
|
formatting issues are caused for the same reason, I had to drop the manual column width adjustments because they caused tables not to get parsed correctly anymore... i'll have to figure out a way to make it work in both latex and pandoc
|
|
|
Traneptora
|
2022-10-24 04:15:14
|
I see. These aren't really major issues, I'm just pointing out what I see when I see it
|
|
|
_wb_
|
2022-10-24 04:15:27
|
yeah don't stop doing that, it is useful
|
|
|
Traneptora
|
2022-10-24 04:16:55
|
The spec doesn't appear to specify that `flags & kUseLfFrame` *must* be set if `frame_type == kLFFrame`
|
|
2022-10-24 04:17:21
|
Is this a requirement?
|
|
|
_wb_
|
2022-10-24 04:18:25
|
no it isn't: that flag means the _current_ frame gets its DC from some previous LfFrame
|
|
|
Traneptora
|
2022-10-24 04:18:29
|
I see
|
|
|
_wb_
|
2022-10-24 04:21:27
|
so typically if there is no recursive DC but just progressive DC encoded using modular+squeeze, you'll have first a modular frame with frame_type == kLfFrame (which doesn't even use vardct so kUseLfFrame is meaningless), and then a vardct frame with frame_type == kRegularFrame and flags & kUseLfFrame set (which means there is no DC data in the frame itself, but the previous kLfFrame gets used as DC instead)
|
|
2022-10-24 04:22:03
|
it's somewhat confusing and probably a NOTE explaining this would be good
|
|
|
Traneptora
|
2022-10-24 04:22:15
|
If `upsampling > 1` then you must divide `width = ceil(width / upsampling)` before calculating `width = ceil(width / (1 << 3 * lf_level))`, which in turn must be done before calculating `num_groups`. Both these can be present according to the spec. These operations do not commute because of the intervening `ceil` but the spec doesn't say that you have to do it in that order. (I figured it out from inspecting libjxl code).
|
|
2022-10-24 04:24:01
|
Upsampling is signaled provided that `!(flags & kUseLfFrame)` which will probably be the case for `frame_type == kLfFrame`
|
|
2022-10-24 04:24:39
|
`lf_level` is signaled as `U32(1, 2, 3, 4)` which seems like a roundabout way of saying `1 + u(2)`
|
|
2022-10-24 04:26:14
|
Same with `BlendingInfo.source` which is signaled as `U32(0, 1, 2, 3)` in the spec. why not just `u(2)`?
|
|
2022-10-24 04:28:33
|
Section F.3.3 specifies:
> Before decoding the TOC, the decoder invokes ZeroPadToByte() (B.2.7).
I'd recommend, for clarity, writing "Before decoding the TOC (but after decoding the TOC permutation, if present), the decoder invokes `ZeroPadToByte()`."
|
|
2022-10-24 04:30:19
|
Section G.2.3 has a formatting issue with `ceil(height/8)` being in textrm instead of in code
|
|
2022-10-24 04:30:23
|
same with the width
|
|
2022-10-24 04:30:39
|
This also occurs in section G.2.4 several times.
|
|
2022-10-24 04:31:11
|
`log2` should be code in section G.3.3, along with two `< 3`s for hshift and vshift.
|
|
2022-10-24 04:31:55
|
at the end of the second-to-last paragraph of section H.1, there's a `-1` that's done in textmode, not mathmode, so the `-` looks like a hyphen, not a negative one.
|
|
2022-10-24 04:35:22
|
I haven't tried to skim past that, but I'll point out anything I see when I continue to write jxlatte
|
|
|
_wb_
|
|
Traneptora
(1) `alphabet_size` is not defined for simple distributions in section C.2.5. It's read for flat and for complex distributions. If it's supposed to be equal to `table_size` in the case of a simple distribution, then it should say so.
(2) There's still a bug in the pseudocode in section C.2.6:
```
if (cutoffs[i] > bucket_size) overfull.push_back(i); else underfull.push_back(i);
```
This should read:
```
if (cutoffs[i] > bucket_size) overfull.push_back(i);
else if (cutoffs[i] < bucket_size) underfull.push_back(i);
```
`underfull` shouldn't be filled if `cutoffs[i]` exactly hits the bucket size.
(3) Section C.2.1 includes two cases of `[0, num_clusters - 1)` but it should say either `[0, num_clusters)` or `[0, num_clusters - 1]`.
|
|
2022-10-24 04:39:17
|
why does `alphabet_size` need to be defined in that case? that variable is not referenced then, or is it?
|
|
|
Traneptora
|
|
_wb_
why does `alphabet_size` need to be defined in that case? that variable is not referenced then, or is it?
|
|
2022-10-24 08:08:58
|
it's used when you generate the alias map
|
|
2022-10-24 08:12:03
|
oh, right. I see, because you define `D[i]` outside of alphabetSize to be zero.
|
|
2022-10-24 08:12:08
|
Okay, I see, that makes sense now.
|
|
2022-10-25 07:04:52
|
Section H.2 mentions this right at the end:
|
|
2022-10-25 07:04:53
|
> The decoder then starts an ANS stream (C.1)
|
|
2022-10-25 07:05:07
|
does it actually *have* to be ANS? or can it be a prefix-code distribution?
|
|
2022-10-25 07:54:44
|
Section H.6.3 Squeeze transform appears to have an extraneous `-1` in the pseudo-code:
|
|
2022-10-25 07:54:48
|
> `for (i = 0; i < sp.size() - 1; i++) {`
|
|
2022-10-25 07:54:57
|
I believe this should say `i < sp.size();`
|
|
2022-10-25 07:55:50
|
otherwise the final element of `sp` doesn't do anything
|
|
|
_wb_
|
2022-10-25 08:08:41
|
Right.
|
|
|
Traneptora
does it actually *have* to be ANS? or can it be a prefix-code distribution?
|
|
2022-10-25 08:09:11
|
It can be either. Maybe we need a word for that...
|
|
|
Traneptora
|
|
_wb_
It can be either. Maybe we need a word for that...
|
|
2022-10-25 10:49:16
|
Entropy-coded stream?
|
|
2022-10-25 10:49:28
|
Section C.1 is linked anyway
|
|
|
_wb_
|
2022-10-26 09:23:59
|
<@795684063032901642>
|
|
|
Traneptora
|
2022-10-27 06:38:50
|
Section H.5.2, top of page 52:
`4 + (maxweight × ((1 << 24) Idiv ((err_sum >> shift) + 1))) >> shift;`
this is missing parentheses, should be:
`4 + ((maxweight × ((1 << 24) Idiv ((err_sum >> shift) + 1))) >> shift);`
|
|
2022-10-27 06:39:14
|
`+` is grabbier than `>>` in C.
|
|
2022-10-27 06:39:25
|
aka it has higher precedence
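quick sanity check of the precedence claim, with toy values:
```cpp
#include <cstdio>

int main() {
    int term = 16, shift = 2;
    printf("%d\n", 4 + term >> shift);    // parsed as (4 + 16) >> 2 == 5
    printf("%d\n", 4 + (term >> shift));  // intended reading: 4 + (16 >> 2) == 8
    return 0;
}
```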
|
|
2022-10-27 06:41:37
|
Same section, same page:
`subpred[3] = N3 - (((true_err_NW × wp_p3a + true_err_N × wp_p3b +
true_err_NE × wp_p3c + (NN3 - N3) × wp_p3d + (NW3 - W3) × wp_p3e) >> 5);`
|
|
2022-10-27 06:41:46
|
this has an extra opening `(` right after `N3 -`
|
|
|
_wb_
|
2022-10-27 08:10:18
|
we will probably switch from LaTeX to this: https://www.metanorma.org/
|
|
2022-10-27 08:12:40
|
which might introduce some new spec bugs but it will give us a better way to produce Word files the way ISO wants them while still having a text-based spec with version control so we don't have to deal with Word to do the editing
|
|
|
improver
|
2022-10-27 08:20:58
|
looks good & designed exactly for this purpose
|
|
|
Traneptora
|
2022-10-27 09:26:46
|
section H.5.1, in the definition of `err[i]`, the parentheses are misplaced
|
|
2022-10-27 09:27:03
|
It *says*
|
|
2022-10-27 09:27:05
|
`abs((subpred[i] - (true_value << 3)) + 3) >> 3`
|
|
2022-10-27 09:27:07
|
it should be
|
|
2022-10-27 09:27:31
|
`(abs(subpred[i] - (true_value << 3)) + 3) >> 3`
|
|
2022-10-27 09:27:46
|
that is, you are supposed to add 3 after taking the absolute value of the difference
|
|
2022-10-27 11:58:55
|
Section J.1, it states the condition for `gab_custom` to be signaled is `!all_default && gab`
|
|
2022-10-27 11:59:05
|
this is actually not true, it's just `gab`
|
|
2022-10-27 11:59:21
|
so `gab_custom` will be signaled when `all_default == true`
|
|
|
|
veluca
|
2022-10-27 12:04:20
|
🤦
|
|
|
_wb_
|
2022-10-28 09:43:37
|
if anyone feels like comparing this metanorma version of the spec (in html) to the latex version (in pdf) to see if there's anything that got lost in translation, feel free!
|
|
|
Traneptora
|
2022-10-28 11:07:29
|
I dislike Cambria as a font choice but otherwise I don't notice anything immediate
|
|
2022-10-28 11:10:04
|
Table H.2 could use better rendering
|
|
2022-10-28 11:10:18
|
as it's not really a table
|
|
|
_wb_
|
2022-10-28 11:14:47
|
Haha yes Cambria is a stupid font but what makes you think we have any choice in that?
|
|
2022-10-28 11:15:35
|
ISO forces us to use Cambria for the main text and Courier New for the source code snippets, even though both of those are ugly and there are nicer fonts
|
|
|
improver
|
2022-10-28 03:47:55
|
can doc tooling do different versions with different fonts?
|
|
2022-10-28 03:48:22
|
jxl spec the community edition (good fonts) vs iso edition (cambria)
|
|
|
_wb_
|
2022-10-28 05:06:39
|
haha sure
|
|
2022-10-28 07:03:46
|
I migrated part 2 to asciidoc too (until now we just used Word for that since the pain was doable, it's a smaller spec)
|
|
2022-10-28 07:04:13
|
and then I made the above part 1+2 community edition
|
|
|
Traneptora
so `gab_custom` will be signaled when `all_default == true`
|
|
2022-10-28 07:25:28
|
damn we missed an opportunity to save 1 header bit!
|
|
|
Traneptora
|
2022-10-28 07:50:48
|
It won't end up mattering without a permuted TOC as it gets eaten by ZeroPadToByte
|
|
|
_wb_
|
2022-10-28 09:53:16
|
current versions of part 1 and part 2 drafts for 2nd edition
|
|
|
Traneptora
|
2022-10-29 09:30:51
|
Section K.3.1:
|
|
2022-10-29 09:31:04
|
> `blending`: arrays of `count` blend mode information structures, which consists of arrays of mode, alpha_channel and clamp
|
|
2022-10-29 09:31:15
|
Later, when blending is populated, the loop looks like this:
|
|
2022-10-29 09:31:29
|
> `for (k = 0; k < num_extra_channels+1; k++) {`
|
|
2022-10-29 09:31:40
|
These numbers aren't necessarily the same.
|
|
2022-10-29 09:32:01
|
How big is `blending[]`? is it one per `count` or is it one per `num_extra_channels + 1`
|
|
2022-10-30 02:32:22
|
The spec doesn't say what order the passgroups are in
|
|
2022-10-30 02:32:28
|
do you let `pass = 0` and then do *all* the groups?
|
|
2022-10-30 02:32:34
|
and then let `pass = 1` and do all the groups?
|
|
2022-10-30 02:32:34
|
etc.
|
|
2022-10-30 04:05:00
|
ah it does say, nvm.
|
|
2022-10-31 07:41:25
|
The Tendency function in the section on squeeze is incorrect
|
|
2022-10-31 07:42:03
|
It says this:
```c
tendency(A, B, C) {
  X = (4 * A - 3 * C - B + 6) Idiv 12;
  if (A >= B and B >= C) {
    if (X - (X & 1) > 2 * (A - B)) X = 2 * (A - B) + 1;
    if (X + (X & 1) > 2 * (B - C)) X = 2 * (B - C);
    return X;
  } else if (A <= B and B <= C) {
    if (X + (X & 1) < 2 * (A - B)) X = 2 * (A - B) - 1;
    if (X - (X & 1) < 2 * (B - C)) X = 2 * (B - C);
    return X;
  } else return 0;
}
```
|
|
2022-10-31 07:42:53
|
This is in libjxl/squeeze.h:
|
|
2022-10-31 07:43:00
|
```c++
inline pixel_type_w SmoothTendency(pixel_type_w B, pixel_type_w a,
pixel_type_w n) {
pixel_type_w diff = 0;
if (B >= a && a >= n) {
diff = (4 * B - 3 * n - a + 6) / 12;
// 2C = a<<1 + diff - diff&1 <= 2B so diff - diff&1 <= 2B - 2a
// 2D = a<<1 - diff - diff&1 >= 2n so diff + diff&1 <= 2a - 2n
if (diff - (diff & 1) > 2 * (B - a)) diff = 2 * (B - a) + 1;
if (diff + (diff & 1) > 2 * (a - n)) diff = 2 * (a - n);
} else if (B <= a && a <= n) {
diff = (4 * B - 3 * n - a - 6) / 12;
// 2C = a<<1 + diff + diff&1 >= 2B so diff + diff&1 >= 2B - 2a
// 2D = a<<1 - diff + diff&1 <= 2n so diff - diff&1 >= 2a - 2n
if (diff + (diff & 1) < 2 * (B - a)) diff = 2 * (B - a) - 1;
if (diff - (diff & 1) < 2 * (a - n)) diff = 2 * (a - n);
}
return diff;
}
```
|
|
2022-10-31 07:43:09
|
Notice that `diff` is calculated differently based on the branches
|
|
|
_wb_
|
2022-10-31 09:22:38
|
🤦
|
|
|
Moritz Firsching
|
2022-11-01 01:48:04
|
good catch, made a pr to fix it in the spec (I think the code is correct...)
|
|
|
_wb_
|
2022-11-01 04:36:05
|
does it actually make a difference? I wonder why the spec didn't just copy the code
|
|
|
|
veluca
|
2022-11-01 05:32:17
|
yeah it changes rounding
|
|
|
Traneptora
|
2022-11-01 06:00:35
|
it makes it so diff is at most off-by-one here
|
|
2022-11-01 06:00:51
|
+6 vs -6 is always different by exactly 12, so when dividing by 12 it's different by exactly 1
|
|
2022-11-01 06:01:14
|
which affects the parity, i.e. `(diff & 1)`
|
|
2022-11-01 06:01:59
|
diff ends up 1 smaller than it was
|
|
2022-11-01 06:02:19
|
so if diff was even, then `diff + (diff & 1)` will end up being the same
|
|
2022-11-01 06:02:31
|
but if diff was odd, then `diff + (diff & 1)` is two lower than it was
|
|
2022-11-01 06:02:36
|
which potentially affects the thresholds
|
|
2022-11-01 06:03:52
|
it also slightly affects the output of the squeeze transform because `buffer[x][2 * y + 1] = first - diff`
|
|
2022-11-01 06:03:56
|
whose parity matters
|
|
2022-11-01 06:04:13
|
it's probably not exceptionally noticeable but it does affect it slightly
|
|
|
Moritz Firsching
good catch, made a pr to fix it in the spec (I think the code is correct...)
|
|
2022-11-01 06:04:29
|
whether or not it *was* correct, it is now, I suppose <:kek:857018203640561677>
|
|
2022-11-01 06:04:36
|
because libjxl in some ways defines the bitstream
|
|
|
|
veluca
|
2022-11-01 06:42:23
|
the C++ version makes sense in that it's [positive]+6 or [negative]-6, i.e. adds "away from 0"
|
|
2022-11-01 06:42:52
|
rn I can't figure out what that means for the division
|
|
|
Traneptora
|
2022-11-01 06:44:15
|
integer division rounds down, toward negative infinity
|
|
2022-11-01 06:45:21
|
er wait
|
|
2022-11-01 06:45:21
|
interesting
|
|
2022-11-01 06:45:35
|
it's toward 0 in C, but it's toward -infinity in python
|
|
2022-11-01 06:45:36
|
how about that
|
|
2022-11-01 06:46:01
|
```
$ cat test.c && gcc test.c && ./a.out
#include <stdio.h>
int main(int argc, char *argv[]) {
printf("%d\n", -13 / 12);
return 0;
}
-1
```
|
|
2022-11-01 06:46:46
|
It's right shift that rounds toward minus infinity
|
|
2022-11-01 06:46:53
|
```c
#include <stdio.h>
int main(int argc, char *argv[]) {
printf("%d, %d\n", -13 / 8, -13 >> 3);
return 0;
}
```
|
|
2022-11-01 06:47:05
|
this outputs `-1, -2`
|
|
2022-11-01 06:47:18
|
well strictly speaking, right shifting a signed negative integer is undefined behavior in C
|
|
2022-11-01 06:47:35
|
on x86 it sign-extends
|
|
2022-11-01 06:48:08
|
but the spec should probably say that when right shifting a negative-integer, it is assumed that the extra bits are populated by the sign bit
|
|
2022-11-01 06:48:25
|
it might say that but if it doesn't it should, as it's Undefined Behavior in C
|
|
2022-11-01 06:49:07
|
I ran into this issue with the weighted predictor because there's a number of right shifts
|
|
2022-11-01 06:49:21
|
which cannot be replaced with integer divisions
|
|
2022-11-01 06:49:57
|
luckily the JVM has two operators, `>>` and `>>>` for signed ints, which are fully specified: `>>` sign-extends and `>>>` zero-extends
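for C/C++ implementers, a portable way to get the sign-extending shift the spec assumes is something like this (just a sketch):
```cpp
#include <cstdint>

// Arithmetic right shift with guaranteed sign extension (i.e. floor division by 2^s),
// independent of what the compiler does with negative operands of >>.
static inline int32_t sar32(int32_t v, int s) {
    return v < 0 ? ~((~v) >> s) : v >> s;   // e.g. sar32(-13, 3) == -2
}
```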
|
|
|
Moritz Firsching
|
|
Traneptora
whether or not it *was* correct, it is now, I suppose <:kek:857018203640561677>
|
|
2022-11-01 07:29:49
|
that's the spirit...
|
|
|
Traneptora
|
2022-11-01 08:14:09
|
So at this point I'm confused. I have a modular stream that's being decoded identically to libjxl
|
|
2022-11-01 08:14:35
|
gaborish is disabled in the frame header, as is epf
|
|
2022-11-01 08:15:17
|
many of the values in this modular stream are out of range. the BitDepth header in the image header field tells me that `exp_bits = 0` and `bits = 8`
|
|
2022-11-01 08:16:39
|
There is exactly one frame in the image
|
|
2022-11-01 08:17:02
|
but simply clamping these values between 0 and 255 yields the wrong image
|
|
2022-11-01 08:17:25
|
what is libjxl doing that lets it take this modular stream and reconstruct it properly?
|
|
2022-11-01 08:17:42
|
`factor` in libjxl is `1f / 255f`
|
|
2022-11-01 08:18:02
|
And whatever it is doing, why is this not described in the spec?
|
|
2022-11-01 08:19:00
|
patches, splines, and noise are all disabled.
|
|
2022-11-01 08:19:32
|
XYB is false, as is YCbCr.
|
|
2022-11-01 08:20:54
|
oh *fuck* it *IS* xyb encoded. holy crap, ignore me, lmao.
|
|
2022-11-01 09:25:47
|
Okay, here's something. The default values of `m_x_lf_unscaled` are 4096, in table G.2, but I'm pretty sure it should be `1/4096`
|
|
2022-11-01 09:26:15
|
since later in Table L.2.2 it says `X = X' * m_x_lf_unscaled`
|
|
2022-11-01 09:26:28
|
and I believe X is supposed to be divided by 4096 here, not multiplied
|
|
2022-11-01 09:54:54
|
OH!
|
|
2022-11-01 09:55:12
|
^ not only is this true, but the spec fails to mention that they're supposed to be divided by 128.0f
|
|
2022-11-01 09:55:17
|
if not all default
|
|
2022-11-01 09:56:02
|
```c++
Status DequantMatrices::DecodeDC(BitReader* br) {
bool all_default = br->ReadBits(1);
if (!br->AllReadsWithinBounds()) return JXL_FAILURE("EOS during DecodeDC");
if (!all_default) {
for (size_t c = 0; c < 3; c++) {
JXL_RETURN_IF_ERROR(F16Coder::Read(br, &dc_quant_[c]));
dc_quant_[c] *= 1.0f / 128.0f;
// Negative values and nearly zero are invalid values.
if (dc_quant_[c] < kAlmostZero) {
return JXL_FAILURE("Invalid dc_quant: coefficient is too small.");
}
inv_dc_quant_[c] = 1.0f / dc_quant_[c];
}
}
return true;
}
```
|
|
2022-11-01 09:56:14
|
specifically the 1.0f / 128f
|
|
|
_wb_
|
2022-11-01 10:04:03
|
oh damn, so many places where these quantizers get rescaled or inverted, so easy to get lost in that
|
|
|
Traneptora
|
2022-11-01 10:04:39
|
Yea, they're encoded *in* the bitstream as 1f / foo
|
|
2022-11-01 10:04:46
|
The actual behavior is as follow:
|
|
2022-11-01 10:05:00
|
`m_lf_x_unscaled` defaults to `1 / 4096`
|
|
2022-11-01 10:05:10
|
instead, if it's in the bitstream, you just let `m_lf_x_unscaled = F16() / 128f`
|
|
2022-11-01 10:05:19
|
It's already in the bitstream in denominator form
|
|
2022-11-01 10:06:08
|
This means that if it's encoded in the bitstream as `1/256f` then it'll end up as `1/32768f` when multiplied by `X'`
|
|
2022-11-01 10:06:18
|
Does that make sense?
|
|
2022-11-01 10:06:41
|
don't divide the default values by 1/128, those are baked in.
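so a minimal sketch of the X-channel case as I understand it (`read_f16` and `all_default` stand in for the actual bitstream reads):
```cpp
#include <functional>

float decode_m_x_lf_unscaled(bool all_default, const std::function<float()>& read_f16) {
    float m_x_lf_unscaled = 1.0f / 4096.0f;     // default: the reciprocal, already baked in
    if (!all_default) {
        m_x_lf_unscaled = read_f16() / 128.0f;  // signalled values get the /128; defaults do not
    }
    return m_x_lf_unscaled;                     // later (Table L.2.2): X = X' * m_x_lf_unscaled
}
```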
|
|
|
yurume
|
2022-11-02 04:05:51
|
I thought I reported these issues about quantizers, weren't they incorporated into the current draft?
|
|
|
Traneptora
|
2022-11-02 04:43:08
|
must have been missed
|
|
|
yurume
|
2022-11-02 04:46:44
|
or I forgot to report them... let me check
|
|
2022-11-02 04:47:24
|
oh, my bad, that was indeed the case
|
|
2022-11-02 04:47:47
|
<@853026420792360980> maybe you want to keep eyes on "TODO spec" comments in J40
|
|
|
Traneptora
|
2022-11-02 04:48:01
|
I'm just reporting them as I go
|
|
2022-11-02 04:48:05
|
this channel didn't exist at the time you made those spec comments
|
|
2022-11-02 04:48:16
|
btw, you have a bug in j40
|
|
|
yurume
|
2022-11-02 04:48:35
|
I think I did mention them back when I was talking in <#794206087879852106> , but I must have forgotten to report them formally
|
|
|
Traneptora
|
2022-11-02 04:48:38
|
in predictor 8 you have `p.w - (p.ww + p.nw - p.ne)` or something
|
|
2022-11-02 04:49:05
|
which is not the same thing as `p.w + p.n - p.nee` or whatever it is, evaluated one pixel over
|
|
2022-11-02 04:49:50
|
most of the time it's the same, but if `x == 1`, then `p.ww` evaluates to `p.w`, but `p.w` one pixel to the left evaluates to `p.nw`
|
|
2022-11-02 04:50:24
|
so in j40 you need an `if x == 1` in predictor 8
|
|
|
yurume
|
2022-11-02 04:53:14
|
so is J40 diverging from libjxl in that case?
|
|
2022-11-02 04:53:58
|
I think I have taken extra care to handle corner cases (there are *three* sets of them, I recall) but I may have missed one
|
|
|
Traneptora
|
2022-11-02 05:28:03
|
in that specific case, yea
|
|
|
_wb_
|
2022-11-02 06:31:48
|
Edge cases are so annoying
|
|
|
yurume
|
2022-11-02 08:12:16
|
I'm gonna take some time to document all remaining spec bugs that are only in the J40 comments today
|
|
2022-11-02 08:12:52
|
cause sooner or later, someone (in this case thebombzen) will be hit by those bugs anyway
|
|
2022-11-02 08:13:27
|
and available bug reports are better than thorough bug reports
|
|
|
Traneptora
|
2022-11-02 08:28:46
|
I'm currently confused about how you're supposed to crop the LF groups if the image isn't a perfect power of 2
|
|
2022-11-02 08:29:20
|
in the case of Pass Groups since they (mostly) correspond 1-1 to pixels, you just crop the group by the amount it overflows
|
|
2022-11-02 08:30:53
|
I have a 3264x2448 image, which has four LF groups
|
|
2022-11-02 08:31:03
|
3264/8 is 408
|
|
2022-11-02 08:31:17
|
and 2448/8 is 306
|
|
2022-11-02 08:31:52
|
so I'd assume that LF Groups would be cropped if they overflowed, i.e. 408 - 256 = 152
|
|
2022-11-02 08:31:59
|
but somehow that isn't working
|
|
2022-11-02 08:32:28
|
and I'm not entirely sure why
|
|
|
yurume
|
2022-11-02 08:33:37
|
I'm not sure if this is documented or not, but as I understand it, you first round the dimensions up to the next multiple of 8; this means that there are some blocks where some edges are left unused.
|
|
|
Traneptora
|
2022-11-02 08:34:09
|
everything here is a multiple of 8 though, 408, 256, and 152
|
|
|
yurume
|
2022-11-02 08:34:27
|
for groups and LF groups, assuming you've rounded dimensions accordingly, they have no further restrictions; any integral number in width and height (in terms of blocks) should work
|
|
2022-11-02 08:34:49
|
ah wait
|
|
2022-11-02 08:34:59
|
~~sorry, my comment should only apply to *groups*, not LF groups~~ I rechecked my code and I think this applies both to groups and LF groups
|
|
|
Traneptora
|
2022-11-02 08:35:52
|
Yea, I figured out Pass Groups
|
|
2022-11-02 08:36:35
|
```
Index Width Height x0 y0
0, 128, 256, 0, 0
0, 64, 128, 0, 0
0, 64, 128, 0, 0
0, 256, 256, 0, 0
0, 128, 128, 0, 0
0, 128, 128, 0, 0
0, 128, 256, 0, 0
0, 128, 256, 0, 0
0, 256, 256, 0, 0
0, 256, 256, 0, 0
1, 128, 256, 128, 0
1, 64, 128, 64, 0
1, 64, 128, 64, 0
1, 152, 256, 256, 0
1, 128, 128, 128, 0
1, 128, 128, 128, 0
1, 128, 256, 128, 0
1, 128, 256, 128, 0
1, 152, 256, 256, 0
1, 152, 256, 256, 0
```
|
|
2022-11-02 08:36:53
|
I get through the channel list like so, and then I hit a TOC boundary
|
|
2022-11-02 08:37:19
|
I've discovered through debugging that overrunning a TOC boundary typically means that you are decoding the wrong thing
|
|
|
yurume
|
2022-11-02 08:37:30
|
so in your case (assuming group size of 256 and LF group size of 2048), you have four LF groups of dimensions 2048x2048, 1216x2048, 2048x400, 1216x400 respectively
|
|
2022-11-02 08:38:04
|
which are 256x256, 152x256, 256x50, 152x50 in terms of block counts
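in code, the splitting I mean is roughly this (a sketch assuming a 2048-pixel LF group dimension, not spec text):
```cpp
#include <algorithm>
#include <cstdint>

static int32_t ceil_div(int32_t a, int32_t b) { return (a + b - 1) / b; }

struct Rect { int32_t x0, y0, w, h; };

// LF group `idx` of a frame, cropped at the right/bottom edges; e.g. for 3264x2448 this
// gives 2048x2048, 1216x2048, 2048x400 and 1216x400 (256x256, 152x256, 256x50, 152x50 blocks).
Rect lf_group_rect(int32_t frame_w, int32_t frame_h, int32_t idx, int32_t lf_group_dim = 2048) {
    int32_t cols = ceil_div(frame_w, lf_group_dim);
    int32_t x0 = (idx % cols) * lf_group_dim;
    int32_t y0 = (idx / cols) * lf_group_dim;
    return {x0, y0, std::min(lf_group_dim, frame_w - x0), std::min(lf_group_dim, frame_h - y0)};
}
```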
|
|
|
Traneptora
|
2022-11-02 08:38:22
|
that's what it appears to be decoding
|
|
2022-11-02 08:38:34
|
but then it barfs. it's possible that there's a bug somewhere else though that's causing it to barf
|
|
|
yurume
|
2022-11-02 08:39:14
|
there would be 130 groups, where first 8 groups are the topmost groups in the first LF group, next 5 groups are for the topmost groups in the second LF group, next 8 groups are for the first LF group, next 5 are for the second LF group and so on
|
|
|
Traneptora
|
2022-11-02 08:39:28
|
yea, and the passgroups are decoding fine
|
|
2022-11-02 08:39:31
|
all 130 of them
|
|
2022-11-02 08:39:38
|
the issue is the LF groups
|
|
|
yurume
|
2022-11-02 08:39:50
|
can you pinpoint where you are stuck?
|
|
|
Traneptora
|
2022-11-02 08:40:04
|
I'm trying to figure out if I'm calculating the LF dimensions incorrectly
|
|
|
yurume
|
2022-11-02 08:40:23
|
cause I thought if you've done groups LF groups should follow naturally...
|
|
|
Traneptora
|
2022-11-02 08:40:23
|
I may just have a bug elsewhere for all I know though.
|
|
|
yurume
|
2022-11-02 08:41:23
|
I believe your code does recognize there are 4 LF groups, otherwise TOC wouldn't work, right?
|
|
2022-11-02 08:41:50
|
then I believe you are going to decode the LfGroup bundle
|
|
|
Traneptora
|
2022-11-02 08:42:06
|
absolutely
|
|
2022-11-02 08:42:34
|
OH
|
|
2022-11-02 08:42:42
|
stream index was wrong
|
|
|
yurume
|
2022-11-02 08:42:43
|
which consists of ModularLfGroup (for LQIP I believe), LF coefficients and HF metadata where the last two are for VarDCT; which one are you stuck at?
|
|
2022-11-02 08:42:46
|
OH
|
|
|
Traneptora
|
2022-11-02 08:42:46
|
causing the MA tree to collapse
|
|
2022-11-02 08:42:48
|
it was a bug elsewhere
|
|
|
yurume
|
2022-11-02 08:43:14
|
that's pretty understandable
|
|
|
Traneptora
|
2022-11-02 08:43:29
|
```
for ModularLfGroup: 1 + num_lf_groups + LF group index;
```
<@794205442175402004>
This is incorrect, it's supposed to be `1 + LF Group index`
|
|
2022-11-02 08:43:39
|
you don't add the number of LF groups to the ModularLfGroup stream index
|
|
2022-11-02 08:43:51
|
I'm not sure what the LF Coefficients and HF Metadata are supposed to be though
|
|
|
yurume
|
|
Traneptora
|
2022-11-02 08:44:32
|
but the stream index was incorrect
|
|
|
yurume
|
2022-11-02 08:44:34
|
I'm checking that part of spec, it used to be pretty much wrong
|
|
|
Traneptora
|
2022-11-02 08:44:36
|
as soon as I changed that it decoded fine
|
|
|
yurume
|
2022-11-02 08:44:39
|
and I think I've checked that
|
|
|
Traneptora
as soon as I changed that it decoded fine
|
|
2022-11-02 08:45:12
|
does your image have a ModularLfGroup?
|
|
|
Traneptora
|
2022-11-02 08:45:22
|
Yes, this is modular.
|
|
2022-11-02 08:45:28
|
I haven't started working on VarDCT yet.
|
|
|
yurume
|
2022-11-02 08:45:35
|
I mean, ModularLfGroup is normally empty if you don't have squeeze
|
|
2022-11-02 08:45:41
|
so I guess that's why I missed that part
|
|
|
Traneptora
|
2022-11-02 08:45:43
|
this is lossy modular, as a test.
|
|
2022-11-02 08:45:49
|
I have squeeze
|
|
|
yurume
|
2022-11-02 08:46:16
|
yeah, ModularLfGroup only appears when you have hshift or vshift of at least 3
|
|
2022-11-02 08:46:20
|
which doesn't happen if you don't have squeeze
|
|
2022-11-02 08:47:20
|
so the question is: normally we are supposed to have different (fragments of) MA tree for LF coefficients, ModularLfGroup and HF metadata
|
|
2022-11-02 08:47:37
|
because of course these three may have different optimal MA trees
|
|
2022-11-02 08:48:07
|
but as it stands, there is no way in an MA tree to distinguish LF coefficients from ModularLfGroup
|
|
2022-11-02 08:48:55
|
so is there any impact of this bug?
|
|
|
Traneptora
|
|
yurume
but as it stands, there is no way in an MA tree to distinguish LF coefficients from ModularLfGroup
|
|
2022-11-02 08:49:33
|
stream index, property 1
|
|
|
yurume
|
2022-11-02 08:50:07
|
but stream indices for LF coefficients *are* already `1 + LF group index`, which I believe is correct because otherwise J40 wouldn't work
|
|
|
Traneptora
|
2022-11-02 08:50:29
|
... huh.
|
|
|
yurume
|
2022-11-02 08:50:50
|
yeah, I've checked J40 and it is indeed `1 + LF group index` (or `sidx0`, as J40 says)
|
|
2022-11-02 08:51:12
|
`sidx1` (`1 + num_lf_groups + LF group index`) was reserved for ModularLfGroup which is not yet implemented
|
|
2022-11-02 08:51:48
|
I'll check libjxl to be sure
|
|
|
Traneptora
|
2022-11-02 08:51:52
|
wait, hm.
|
|
2022-11-02 08:51:57
|
Looks like illegal final ans state
|
|
|
yurume
|
2022-11-02 08:52:05
|
maybe my test images just didn't have stream index nodes
|
|
|
Traneptora
|
2022-11-02 08:52:12
|
so maybe I'm just decoding garbage
|
|
2022-11-02 08:53:04
|
ah, I was just decoding garbage
|
|
2022-11-02 08:53:05
|
that's mb
|
|
2022-11-02 08:53:20
|
I set it back to what it had been, and the first LF Group verified successfully
|
|
2022-11-02 08:53:26
|
ignore me, the bug didn't happen.
|
|
|
yurume
|
2022-11-02 08:54:08
|
ah, good
|
|
2022-11-02 08:54:14
|
so that part of spec seems okay
|
|
|
Traneptora
|
2022-11-02 08:54:37
|
I'm still confused about the LF group sizes though
|
|
2022-11-02 08:54:54
|
where am I supposed to truncate the LF Groups when they overflow out of the side of the image?
|
|
|
_wb_
|
2022-11-02 08:55:35
|
I really appreciate it that we now have two independent implementers sorting out potential spec issues amongst themselves, by the way. This will lead to a 2nd edition of the spec that will be so much better than the 1st edition. Thanks so much!
|
|
|
Traneptora
|
2022-11-02 08:55:51
|
I have a 408/306 image after dividing by 8, and the LF groups then should be variations of 256x256
|
|
2022-11-02 08:56:03
|
so I'm producing a 256x256 upper left corner
|
|
2022-11-02 08:56:10
|
152x256 upper right
|
|
2022-11-02 08:56:27
|
but upon decoding that second one I overrun the TOC buffer, which usually means I'm decoding the wrong thing
|
|
|
_wb_
|
2022-11-02 08:56:35
|
Yes, should be same splitting logic as the hf groups
|
|
|
Traneptora
|
2022-11-02 08:56:55
|
so in this specific case, 152x50 for the lower right is correct, right?
|
|
2022-11-02 08:57:20
|
if so, I must have a bug elsewhere
|
|
|
_wb_
|
2022-11-02 08:57:52
|
Overrunning toc is indeed a sign of a bug, no encoder produces overlapping sections and we should probably forbid it
|
|
|
Traneptora
|
2022-11-02 08:58:09
|
no I mean I hard-isolate the TOC into buffers because it makes that easier to detect
|
|
|
_wb_
|
2022-11-02 08:58:40
|
Same thing, no section should read more bytes than the length indicated in toc
|
|
|
Traneptora
|
2022-11-02 08:58:50
|
I thought that was pretty self-evident
|
|
2022-11-02 08:58:54
|
at least
|
|
2022-11-02 08:59:20
|
but either way, if those dimensions are correct then my bug is elsewhere
|
|
|
_wb_
|
2022-11-02 09:01:04
|
The dimensions are correct - you rounded up when dividing by 8, right?
|
|
|
Traneptora
|
2022-11-02 09:01:15
|
it's a multiple of 8, so no, but....
|
|
|
_wb_
|
2022-11-02 09:02:14
|
Must be bug elsewhere then
|
|
|
Traneptora
|
2022-11-02 09:02:33
|
gonna do my usual trick of dump output of libjxl and diff it to see where it differs
|
|
2022-11-02 09:33:02
|
and, I figured it out, I wasn't cropping the corresponding higher-shift channels either
|
|
2022-11-02 09:34:08
|
so the 256x256 channels needed to be cropped to 152x256, but the 128x256 channels needed to be cropped to 76x256
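so roughly, for a channel with a given shift (hypothetical helper; whether rounding up is the right thing for odd sizes is exactly my next question):
```cpp
#include <cstdint>

// Crop size of a shifted channel, rounding up: e.g. 152 blocks at shift 1 -> 76.
static int32_t cropped_size(int32_t frame_size_in_units, int32_t shift) {
    return (frame_size_in_units + (1 << shift) - 1) >> shift;
}
```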
|
|
2022-11-02 10:19:57
|
ugh, how do you handle it if these are *odd*
|
|
|
Kleis Auke
|
|
Traneptora
gonna do my usual trick of dump output of libjxl and diff it to see where it differs
|
|
2022-11-03 01:44:43
|
Just wondering, do you use `jxlinfo` for this?
|
|
2022-11-03 01:53:06
|
(oh, I misread, it's probably `printf()`-debugging)
|
|
|
Traneptora
|
2022-11-03 02:27:19
|
yea, I printf to stdout and redirect
|
|
2022-11-03 02:30:01
|
basically I just straight up `printf("%zu, %zu, %d\n", x, y, p[x])` or something
|
|
2022-11-03 02:30:03
|
and compare the output
|
|
|
|
veluca
|
2022-11-03 02:34:50
|
most useful compression debugging technique *ever*
|
|
|
andrew
|
2022-11-04 01:18:20
|
I just started glancing at the spec - pseudocode operators are defined as being the same as those in C++, but signed integer overflow is undefined in C++. Is this intentional?
|
|
|
yurume
|
2022-11-04 01:43:33
|
all operators are defined as if there is an integral type of infinite size (there should be a mention), so it should never overflow
|
|
2022-11-04 01:43:56
|
in reality, of course, the implementer should be aware that the calculation can overflow
|
|
|
andrew
|
2022-11-04 01:52:31
|
yeah I don't see such a mention
|
|
|
Traneptora
|
|
andrew
yeah I don't see such a mention
|
|
2022-11-04 02:13:02
|
> Expressions and variables of which types are omitted, are understood as real numbers.
|
|
2022-11-04 02:13:04
|
section 4.4
|
|
2022-11-04 02:15:11
|
speaking of which
|
|
2022-11-04 02:15:14
|
|
|
2022-11-04 02:15:24
|
Section 4.3, missing a closing paren `)`
|
|
|
andrew
I just started glancing at the spec - pseudocode operators are defined as being the same as those in C++, but signed integer overflow is undefined in C++. Is this intentional?
|
|
2022-11-04 04:18:38
|
mostly* defined as being the same in C/C++ btw
in C, right shifting a negative signed integer is undefined behavior
|
|
2022-11-04 04:19:08
|
but in the spec it is clearly defined. in practice right shifting signed integers will sign-extend on x86
|
|
2022-11-04 04:19:16
|
which is also what happens in the spec
|
|
2022-11-05 04:00:36
|
Section K.3.1
|
|
2022-11-05 04:00:44
|
```
patch[i].x[j] = UnpackSigned(DecodeHybridVarLenUint(5)) + patch[i].x[j - 1];
patch[i].y[j] = UnpackSigned(DecodeHybridVarLenUint(5)) + patch[i].y[j - 1];
```
|
|
2022-11-05 04:00:54
|
these should say `DecodeHybridVarLenUint(6)`
|
|
2022-11-05 04:00:56
|
the context is wrong
|
|
2022-11-05 04:01:21
|
note that in `patch_dictionary_internal.h`, it lists ` kPatchOffsetContext = 6,` on line 22
|
|
2022-11-05 04:01:52
|
and in `dec_patch_dictionary.cc` on line 110 we have this:
```c++
pos.x =
positions_.back().x + UnpackSigned(read_num(kPatchOffsetContext));
pos.y =
positions_.back().y + UnpackSigned(read_num(kPatchOffsetContext));
```
|
|
2022-11-05 04:02:33
|
furthermore I tested this against `patch_lossless.jxl` conformance sample and the final ANS state fails (i.e. != 0x13000) if you use the 5 in the spec, but it succeeds if you use 6
|
|
|
andrew
|
2022-11-05 08:19:52
|
I'm trying to understand the examples in B.2.2 (U32)
|
|
2022-11-05 08:20:20
|
for the first example, you read two bits, 0, then 1, which is 0b10 = 2, which results in 32
|
|
2022-11-05 08:21:12
|
following the same bit ordering for the second example, you read two bits, 1, 1, which is 0b11 = 3, so you then read 8 more bits, 1, 0, 1, 0, 0, 0, 0, 0, which is 0b00000101 = 5 != 7
|
|
2022-11-05 08:21:54
|
or am I getting the bit ordering for the u(n) operation wrong
|
|
2022-11-05 09:07:32
|
okay, I think this is a typo in the spec, I just yanked out jxlatte's bit reader (because it was convenient) and the second example returned 5
|
|
|
Traneptora
|
|
andrew
following the same bit ordering for the second example, you read two bits, 1, 1, which is 0b11 = 3, so you then read 8 more bits, 1, 0, 1, 0, 0, 0, 0, 0, which is 0b00000101 = 5 != 7
|
|
2022-11-05 09:39:16
|
It's in little endian order
|
|
2022-11-05 09:39:36
|
010111 means first you read two bits-> 01, with 0111 remaining
|
|
2022-11-05 09:39:48
|
then you look at those two bits
|
|
2022-11-05 09:40:39
|
01 is in *little-endian* mode so that's actually 0b10
|
|
2022-11-05 09:41:07
|
which is 2
|
|
2022-11-05 09:42:04
|
hm, yea it's definitely a typo in the spec
|
|
2022-11-05 09:42:22
|
at that point it should pick u(6)
|
|
2022-11-05 09:42:29
|
but there's only 4 bits remaining
|
|
2022-11-05 09:42:50
|
it should say `101110`
|
|
2022-11-05 09:43:11
|
if it had said 101110, then you'd pick 0b10, with 0b1110 remaining
|
|
2022-11-05 09:43:39
|
then that would go to 1, b/c little endian, and then go to u(4), reading 0b1110, little endian, equals 7
|
|
2022-11-05 09:44:23
|
so it should say `U32(u(2), u(4), u(6), u(8)): 0b101110 -> value = 7`
|
|
2022-11-05 09:44:49
|
and it should say `U32(8, 16, 32, u(7)): 0b01 -> 32`
|
|
2022-11-05 09:44:55
|
someone messed up endianness
|
|
2022-11-05 09:45:17
|
good catch!
|
|
|
andrew
|
|
Traneptora
so it should say `U32(u(2), u(4), u(6), u(8)): 0b101110 -> value = 7`
|
|
2022-11-05 09:52:32
|
or should this be 0b011101?
|
|
2022-11-05 09:53:26
|
as in the first digit is most significant, as it is in most programming languages
|
|
|
Traneptora
|
2022-11-05 09:56:08
|
I'm using it in little-endian order. so if it occurs in the bitstream as 101110 then it would end up being 011101 if you wanted to populate it in a programming language, yes
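to make that concrete, a toy LSB-first reader matching the worked example (illustrative only, not the spec's reader; only the `u(n)` distributions needed here):
```cpp
#include <cstdint>
#include <vector>

struct Bits {
    std::vector<int> bits;   // bits in stream order, e.g. {1,0,1,1,1,0} for "101110"
    uint32_t pos = 0;
    uint32_t u(int n) {      // the first bit read is the least significant
        uint32_t v = 0;
        for (int i = 0; i < n; i++) v |= uint32_t(bits[pos++]) << i;
        return v;
    }
};

// U32(u(2), u(4), u(6), u(8)) on "101110": selector = u(2) = 1, then u(4) over 1,1,1,0 = 7.
uint32_t u32_example(Bits& br) {
    static const int nbits[4] = {2, 4, 6, 8};
    return br.u(nbits[br.u(2)]);
}
```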
|
|
2022-11-06 04:32:42
|
The spec doesn't specify what happens if you have a property whose value is out of range
|
|
2022-11-06 04:33:42
|
the correct behavior is that for properties >15, the property defaults to 0, unless computed as in the bottom of section H.4.1.
|
|
2022-11-06 04:33:58
|
The spec doesn't mention this, though.
|
|
|
_wb_
|
2022-11-06 04:52:31
|
ah, good point
|
|
|
yurume
|
2022-11-06 04:55:54
|
ugh, I thought it was simply an incorrect bitstream
|
|
|
_wb_
|
2022-11-06 04:58:50
|
no it's not, e.g. you could use properties that refer to the previous channel in a global tree that also gets used for the very first channel
|
|
|
Traneptora
|
2022-11-06 05:11:54
|
I ran into this issue when attempting to decode this JXL art piece: https://discord.com/channels/794206087879852103/824000991891554375/1038632475372560436
|
|
2022-11-06 05:12:01
|
channel 0 of the stream has property 19
|
|
|
andrew
|
2022-11-06 09:27:15
|
In C.2.1, what exactly does `lz_cfg` refer to? is it a part of the entropy decoder state?
|
|
|
_wb_
|
2022-11-06 09:33:59
|
If lz77.enabled, the decoder sets lz_cfg = num_dist++, and reads lz_len_conf = HybridUintConfig(8)
|
|
2022-11-06 09:36:38
|
so basically one extra pre-clustered distribution is added for the lz77 distances
|
|
|
Traneptora
|
2022-11-06 09:37:54
|
the decoder reads a HybridUintConfig for lz77, which is separate from the HybridUintConfig added for the extra distribution
|
|
2022-11-06 09:38:49
|
However, `lz_len_conf` is the variable name used in section C.1. If it later calls it `lz_cfg`, that's a typo.
|
|
|
_wb_
|
2022-11-06 09:38:55
|
when reading a symbol, first a token is read which can be either a literal or represent an lz77 length, and if it's an lz77 length, then another token is read using a fixed (but signaled) hybriduintconfig to get the distance
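roughly, in code (the reader callbacks are hypothetical stand-ins for the spec's token/uint machinery, and validity checks are omitted):
```cpp
#include <cstdint>
#include <functional>
#include <vector>

// A token that is >= lz77.min_symbol is an LZ77 length; the distance is then read in the
// dedicated extra context (the "lz_cfg" / lz_distance_ctx one).
void decode_symbols(uint64_t count, uint32_t lz_min_symbol, uint32_t lz_min_length,
                    const std::function<uint32_t()>& read_token,
                    const std::function<uint32_t()>& read_length_extra,  // uses lz_len_conf
                    const std::function<uint32_t()>& read_distance,      // uses lz_distance_ctx
                    std::vector<uint32_t>& out) {
    while (out.size() < count) {
        uint32_t token = read_token();
        if (token >= lz_min_symbol) {
            uint32_t length = lz_min_length + read_length_extra();
            uint32_t distance = read_distance();
            for (uint32_t i = 0; i < length && out.size() < count; i++)
                out.push_back(out[out.size() - distance]);  // copy from the already-decoded window
        } else {
            out.push_back(token);                           // plain literal
        }
    }
}
```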
|
|
|
Traneptora
|
|
_wb_
|
2022-11-06 09:39:19
|
I think the spec is correct but the variable names and way it is described is not very clear
|
|
|
Traneptora
|
2022-11-06 09:39:34
|
`lz_cfg` is the index of the lz_len_conf, I see.
|
|
|
_wb_
|
2022-11-06 09:39:50
|
lz_len_conf is a hybriduintconfig used to interpret lengths
|
|
2022-11-06 09:40:36
|
lz_cfg is a context used to read distances
|
|
|
Traneptora
|
2022-11-06 09:40:56
|
It might make sense to call it `lz_distance_ctx`
|
|
2022-11-06 09:41:12
|
just then it's really clear that it's a context ID
|
|
|
_wb_
|
2022-11-06 09:41:17
|
lz_len_conf and lz_distance_ctx would be good names, I agree
|
|
2022-11-06 09:42:14
|
the confusing thing is the lengths don't need a context because the token is just a regular token that happens to be larger than some offset
|
|
|
Traneptora
|
2022-11-06 09:42:37
|
oh while we're on the topic of section C.3.3.:
```c++
else if (distance < 120) {
distance = kSpecialDistances[distance][0];
distance += dist_multiplier * kSpecialDistances[distance][1];
```
this doesn't work
you probably mean something like
```c++
else if (distance < 120) {
offset = kSpecialDistances[distance][0];
distance = offset + dist_multiplier * kSpecialDistances[distance][1];
```
|
|
2022-11-06 09:42:53
|
in particular, consider what happens when the offset is negative
|
|
2022-11-06 09:42:56
|
the first one crashes the code
|
|
2022-11-06 09:43:07
|
libjxl handles it correctly by doing it in one expression, but the spec is incorrect
|
|
|
_wb_
|
2022-11-06 09:43:45
|
right, d'oh
|
|
2022-11-06 09:44:06
|
so easy to write silly bugs in code that doesn't get executed
|
|
|
Traneptora
|
2022-11-06 09:44:26
|
Yea, that's the main difference between proofreading code and trying to implement it
|
|
|
_wb_
|
2022-11-06 09:45:01
|
and at ISO they only did the former, if even that
|
|
2022-11-06 09:45:59
|
many of the comments we got from national bodies were more about the dots and commas in boilerplate text than about the actual technical meat
|
|
|
Traneptora
|
2022-11-06 09:46:54
|
one thing I find convenient about doing this in java is attempting to index an array with a negative number immediately throws an exception, which crashes the process. the JVM enforces those memory barriers. it means those sorts of off-by-one errors fail-fast
|
|
2022-11-06 09:47:22
|
unlike in C where you need valgrind or a similar tool to tell you, unless x[-1] happens to hit a dead page
|
|
|
andrew
|
|
_wb_
lz_cfg is a context used to read distances
|
|
2022-11-06 09:47:25
|
so `lz_cfg` (which should probably be named `lz_distance_ctx`) is a field in the entropy decoder state that refers to some index in `clusters`?
|
|
|
Traneptora
|
2022-11-06 09:48:55
|
when Lz77 is enabled, it reads a token. if the token is too big, it goes "oh, this must be an Lz77 length"
|
|
2022-11-06 09:49:03
|
but Lz77 uses length/distance *pairs*
|
|
2022-11-06 09:49:17
|
so it then reads the distance, using `lz_distance_ctx` as the context.
|
|
|
andrew
so `lz_cfg` (which should probably be named `lz_distance_ctx`) is a field in the entropy decoder state that refers to some index in `clusters`?
|
|
2022-11-06 09:50:36
|
also keep in mind that `clusters` is a map. it maps contexts to clusters. since in practice some of the contexts will have very similar frequencies, clustering lets the stream consider two contexts to be the "same"
|
|
2022-11-06 09:50:48
|
so it maps a context ID to a cluster ID
|
|
2022-11-06 09:51:04
|
which can and frequently will have a lot of overlap
|
|
2022-11-06 09:52:07
|
it's easy to treat it as an array, but conceptually it's a function
`int mapContextIDToClusterID(int contextID) // returns cluster ID`
|
|
|
_wb_
|
2022-11-06 09:54:55
|
context clustering was invented in brunsli/pik (maybe already in brotli / lossless webp?) as a way to have relatively large context models while keeping the number of histograms to signal manageable
|
|
2022-11-06 09:56:01
|
MA trees were invented in FLIF as a way to have even larger context models (ones where even the context map itself would be too large) while keeping the number of histograms (or in case of FLIF, CABAC chance tables) manageable
|
|
2022-11-06 09:56:48
|
and in JXL's modular mode you get both, which basically boils down to the MA tree effectively being a DAG instead of a tree
|
|
2022-11-06 09:57:15
|
(i.e. you can merge two leaf nodes however distant they are)
|
|
|
Traneptora
|
2022-11-06 09:57:22
|
in jxlatte I save some effort by flattening static properties but less so than in libjxl. unfortunately jxlatte is pretty slow
|
|
2022-11-06 09:58:55
|
the way I have it implemented in jxlatte is basically I have an MATree that contains a pointer to two child trees, also of type MATree. and I walk down the tree by doing basically `while (!tree.isLeaf()) tree = tree.decide();`
|
|
|
_wb_
|
2022-11-06 09:58:58
|
i recommend starting with slow and correct and worrying about speed later, because some things really get quite a lot more complicated and bug-prone if you want to do them fast too
|
|
2022-11-06 09:59:42
|
yeah the naive branchy MA tree evaluation is good enough, it works, it's just slow
|
|
|
Traneptora
|
2022-11-06 10:00:01
|
luckily this method makes walking down static properties pretty easy
since I have a function that just walks down the tree until it hits a non-static property then aborts and returns the resulting tree
|
|
2022-11-06 10:00:28
|
so I can save a bunch of cycles that way
|
|
2022-11-06 10:00:49
|
I also compute properties on the fly, which can be slower if they need to be used again
|
|
2022-11-06 10:01:03
|
but can be faster if they don't need to be used
|
|
|
_wb_
|
2022-11-06 10:02:16
|
eventually we might want to do property computation and MA tree walking with some kind of dynamic code generation, but that's scary and tricky so for now we just stick to some specializations and some speedup tricks like turning binary trees into quadtrees
|
|
|
Traneptora
|
2022-11-06 10:03:17
|
but there's a few optimizations I could do, like I could make the row-arrays slightly bigger than they need to be, and populate them with the edge-case predictors
that way there's no branching required when doing west() or north()
|
|
2022-11-06 10:04:04
|
since doing `y > 0 ? buffer[y - 1][x] : x > 0 ? buffer[y][x -1] : 0`
|
|
2022-11-06 10:04:05
|
ugh
|
|
2022-11-06 10:04:08
|
so much branching
|
|
2022-11-06 10:04:31
|
this can be optimized as a CMOV by a compiler but on the JVM I'm pretty sure it's a branch
|
|
2022-11-06 10:04:35
|
not entirely sure tho
|
|
2022-11-06 10:04:50
|
I'm targeting the JVM
|
|
|
_wb_
|
2022-11-06 10:04:53
|
yes, all of that helps, and you can do whatever is easy enough to do, but I think at this point having a full independent decoder that works though slowly is more valuable than having a partial one that is fast
|
|
|
Traneptora
|
2022-11-06 10:05:07
|
yea, this is all after I get full feature support
|
|
2022-11-06 10:05:09
|
it's lower priority
|
|
2022-11-06 10:05:26
|
the contest submission art you posted is going to be good for testing splines and noise tho, as well as frame blending
|
|
2022-11-06 10:05:35
|
so I'm happy about that
|
|
|
_wb_
|
2022-11-06 10:06:18
|
yeah it was a piece of art in the spirit of what jxl_from_tree was originally made for: producing test bitstreams
|
|
|
Traneptora
|
2022-11-06 10:07:05
|
pretty on-point since a lot of my debug test samples came from <#824000991891554375>
|
|
|
_wb_
|
2022-11-06 10:18:43
|
I like it when fun and art can be at the same time a useful engineering tool used in something as 'boring' as spec verification
|
|
|
improver
|
2022-11-06 10:29:21
|
human-assisted fuzzing
|
|
|
andrew
|
2022-11-06 10:35:52
|
in C.2.1/C.2.2, where does `num_clusters` come from if `num_dist != 1`?
|
|
2022-11-06 10:36:45
|
https://discord.com/channels/794206087879852103/794206170445119489/1011659525385883788
|
|
2022-11-06 10:37:03
|
oh, it's supposed to be calculated by the max of clusters[i] + 1
|
|
2022-11-06 10:37:14
|
this should probably be specified
|
|
|
Traneptora
|
|
andrew
this should probably be specified
|
|
2022-11-06 11:21:47
|
It is in section C.2.2
|
|
2022-11-06 11:21:50
|
> The output of this procedure is an array clusters with num_dist entries (one for each pre-clustered context), with values in the range [0, num_clusters). All integers in [0, num_clusters) are present in this array, and num_clusters < 256. Position i in this array indicates that context i is merged into the corresponding cluster.
|
|
2022-11-06 11:22:24
|
since the values are in the range `[0, num_clusters)`, it implies that they occupy all values x for which 0 <= x < num_clusters
|
|
2022-11-06 11:23:17
|
which then tells you that num_clusters is the smallest integer greater than all of the xs
|
|
2022-11-06 11:23:32
|
It could be more explicit, but it technically *is* specified
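i.e., in code, the implied definition is just:
```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// num_clusters is one more than the largest cluster id in the context map.
uint32_t derive_num_clusters(const std::vector<uint8_t>& clusters) {
    return 1u + *std::max_element(clusters.begin(), clusters.end());
}
```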
|
|
2022-11-07 12:24:50
|
if channel data ranges from `[0.0f, 1.0f]`
|
|
2022-11-07 12:24:56
|
```
R = Y + 0.5 + 1.402 * Cr;
G = Y + 0.5 - 0.344136 * Cb - 0.714136 * Cr;
B = Y + 0.5 + 1.772 * Cb;
```
|
|
2022-11-07 12:25:06
|
how does R or B ever end up as less than 0.5?
|
|
2022-11-07 12:25:22
|
if Cr = Cb = 0 you end up with Y + 0.5, not Y
|
|
2022-11-07 12:29:46
|
or is this one of those scenarios where Cb and Cr range from `[-0.5, 0.5]`?
|
|
2022-11-07 12:30:32
|
same with Y?
|
|
2022-11-07 12:30:47
|
because I'd assume that grayscale would be R=G=B=Y
|
|
|
_wb_
|
2022-11-07 01:47:29
|
Cr and Cb get negative, yes
|
|
2022-11-07 01:47:51
|
And Y too, yes
|
|
2022-11-07 01:48:17
|
Y=0 is gray
|
|
2022-11-07 01:48:59
|
I think
|
|
|
Traneptora
|
2022-11-07 02:43:01
|
Y=0 being 50% gray means these all range from [-0.5, 0.5]
|
|
2022-11-07 02:43:09
|
which is fine, I just wanted to double check
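so the way I'm reading it is something like this (my own sketch, with all three channels centered at 0; the function name is mine, not from the spec):
```c
/* inverse YCbCr as I read it: Y, Cb, Cr all in [-0.5, 0.5], RGB output in [0, 1] */
static void ycbcr_to_rgb(float Y, float Cb, float Cr,
                         float *R, float *G, float *B) {
    *R = Y + 0.5f + 1.402f    * Cr;
    *G = Y + 0.5f - 0.344136f * Cb - 0.714136f * Cr;
    *B = Y + 0.5f + 1.772f    * Cb;
}
/* Y = Cb = Cr = 0 gives mid-gray (0.5, 0.5, 0.5),
 * and grayscale (Cb = Cr = 0) gives R = G = B = Y + 0.5 */
```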
|
|
2022-11-07 03:29:20
|
Section K.4.1 says this:
|
|
2022-11-07 03:29:39
|
> Then, UnpackSigned(DecodeHybridVarLenUint(5)) is used four times thirty-two times to obtain, in this order, sequences of thirty-two integers representing the following coefficients of the spline along its arc length:
> q_dctX[32], the decorrelated and quantized DCT32 coefficients of the X channel of the spline
>
> q_dctY[32], the quantized DCT32 coefficients of the Y channel of the spline
>
> q_dctB[32], the decorrelated and quantized DCT32 coefficients of the B channel of the spline
>
> q_dctSigma[32], the quantized DCT32 coefficients of the σ parameter of the spline (defining its thickness)
|
|
2022-11-07 03:30:26
|
are all 32 of the `q_dctX[32]` read first, and then all 32 of the `q_dctY[32]`, and then all 32... etc.
|
|
2022-11-07 03:30:33
|
or are they interleaved?
|
|
2022-11-07 03:30:38
|
It's not clear to me from how this is phrased.
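for now I'm assuming they're not interleaved, i.e. roughly this (just my working assumption from the wording, not something I've verified against libjxl):
```c
/* assumption: each sequence of 32 is read in full before the next one,
 * in the order X, Y, B, sigma */
for (i = 0; i < 32; i++) q_dctX[i]     = UnpackSigned(DecodeHybridVarLenUint(5));
for (i = 0; i < 32; i++) q_dctY[i]     = UnpackSigned(DecodeHybridVarLenUint(5));
for (i = 0; i < 32; i++) q_dctB[i]     = UnpackSigned(DecodeHybridVarLenUint(5));
for (i = 0; i < 32; i++) q_dctSigma[i] = UnpackSigned(DecodeHybridVarLenUint(5));
```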
|
|
2022-11-07 03:34:29
|
Another question
|
|
2022-11-07 03:34:31
|
> These coefficients are dequantized and recorrelated (see I.2.3) as follows:
|
|
2022-11-07 03:34:58
|
Splines can be used in Modular Mode (e.g. lossy XYB), but the LFChannelCorrelation bundle (defined in I.2.3) is only present in LFGlobal in VarDCT mode.
|
|
2022-11-07 03:35:05
|
Do you use the default value in this case?
|
|
|
_wb_
|
|
|
veluca
|
2022-11-08 01:05:29
|
pretty sure cb and cr are in +-.5 range, and y is in 0-1 range...
|
|
|
Traneptora
|
2022-11-08 01:08:15
|
if Y is `[0, 1]` then the inverse YCbCr transform appears to be incorrect
|
|
|
|
veluca
|
2022-11-08 01:34:07
|
let me check in libjxl
|
|
2022-11-08 01:35:50
|
ok, no, y is also +-.5
|
|
2022-11-08 01:36:09
|
I am not quite sure why we did that as XYB's Y isn't, but oh well
|
|
|
_wb_
|
2022-11-08 05:32:17
|
Because that is how jpeg does it
|
|
2022-11-08 05:32:49
|
I think
|
|
|
Traneptora
|
2022-11-08 07:23:29
|
Section K.4.1 provides pseudocode but it decodes quant_adjust in the wrong order
|
|
2022-11-08 07:23:50
|
this is in section K.4.1:
```c
num_splines = DecodeHybridVarLenUint(2) + 1;
quant_adjust = UnpackSigned(DecodeHybridVarLenUint(0));
last_x = 0; last_y = 0;
for (i = 0; i < num_splines; i++) {
// etc.
```
|
|
2022-11-08 07:24:26
|
splines.cc line 549:
|
|
2022-11-08 07:24:31
|
```cpp
JXL_RETURN_IF_ERROR(DecodeAllStartingPoints(&starting_points_, br, &decoder,
                                            context_map, num_splines));
quantization_adjustment_ = UnpackSigned(
    decoder.ReadHybridUint(kQuantizationAdjustmentContext, br, context_map));
```
|
|
2022-11-08 07:24:43
|
notice that `quantization_adjustment_` is decoded after reading the starting points, not before, unlike what it says in the spec.
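so presumably the intended order is more like this (my reordering of the spec pseudocode to match what libjxl does, not the official text):
```c
num_splines = DecodeHybridVarLenUint(2) + 1;
last_x = 0; last_y = 0;
for (i = 0; i < num_splines; i++) {
    // decode starting point i, as in the existing pseudocode
}
// quant_adjust is read after the starting points, matching libjxl
quant_adjust = UnpackSigned(DecodeHybridVarLenUint(0));
```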
|
|
|
_wb_
|
2022-11-08 08:57:58
|
> Suggestion for a spec extension / bit of a long shot: Allow horizontal and vertical flips of patches. The point is that you've already done the work by defining mirrored coordinates - so keep the source coordinates of the patch normal but allow the destination coordinates to be mirrored. It's not really a bitstream change...
(suggestion from <@967519502977888367> )
|
|
2022-11-08 08:59:45
|
We did consider adding bits for flips and it could be useful for symmetric things like many letter shapes or circles/ellipses, but we ended up not doing it.
|
|
2022-11-08 09:02:15
|
Doing it via negative / out of bounds offsets and using mirroring semantics could work with just a small spec change, but it would make it impossible to use patches that cross the image edge, like e.g. a sprite that is partially off-screen (since the off-screen part would end up getting mirrored and messing up the on-screen part)
|
|
2022-11-08 09:03:55
|
(though such patches could still be done by doing them separately with smaller source rectangles, which would add a small extra signaling cost, but not a huge one)
|
|
2022-11-08 09:06:48
|
The implementation of patch blitting would be a bit trickier though if it needs to do mirroring: naive per-pixel coordinate mirroring would be easy to implement, but to do it efficiently there would be one code path for each of the 4 possible orientations, and the case of image edge crossing would need to be handled too (either by not allowing it or by splitting the blitting into 2 or 4 operations)
|
|
|
|
dds
|
2022-11-08 09:35:45
|
I'm glad it was considered. Though pixelwise patch blitting is surely pretty quick compared to other steps of the image decode...?
|
|
2022-11-08 09:36:08
|
I wonder how practical 'fake' mirroring is with the current spec. Say you want to do a vertical mirror. Then take one n x 1 patch per row and assemble them in reverse order. Finally hope that the large amount of patch metadata compresses well.
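Conceptually something like this (just a sketch; emit_patch() is a made-up stand-in for however the encoder records one patch entry, not real syntax):
```c
/* 'fake' vertical flip with the existing patch mechanism: one w x 1 patch
 * per row, with the destination rows written in reverse order */
static void fake_vertical_flip(int src_x, int src_y, int w, int h,
                               int dst_x, int dst_y) {
    for (int r = 0; r < h; r++) {
        /* source row r lands on destination row (h - 1 - r) */
        emit_patch(src_x, src_y + r, w, 1,
                   dst_x, dst_y + (h - 1 - r));
    }
}
```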
|
|
|
_wb_
Doing it via negative / out of bounds offsets and using mirroring semantics could work with just a small spec change, but it would make it impossible to use patches that cross the image edge, like e.g. a sprite that is partially off-screen (since the off-screen part would end up getting mirrored and messing up the on-screen part)
|
|
2022-11-08 09:42:42
|
Not that it really matters, I think I stated it the wrong way round from the way I first thought of it: instead allow the source patch to be mirrored but insist that the destination is completely in-bounds. Then crossing the image edge isn't an issue - you get a partially mirrored patch rather than corruption.
|
|
|
Traneptora
|
2022-11-08 09:44:56
|
as far as I'm aware, though, the extension design is based on PNG's, in that a decoder is required to ignore extensions it doesn't understand
|
|
2022-11-08 09:45:36
|
which makes implementing extensions that change rendering behavior a bit iffy
|
|
|
_wb_
|
2022-11-08 09:45:55
|
Right, and then if the source region overlaps the edge, it could actually make sense, e.g. you could put the left half of an A at the right end of the patch frame, take a source that goes over the right edge, and that would blit as a full A
|
|
|
Traneptora
which makes implementing extensions that change rendering behavior a bit iffy
|
|
2022-11-08 09:48:27
|
yeah this would not be done as an extension but by defining behavior for bitstreams where the current behavior is not really clear: what if the source rect for a patch is outside the source frame?
|
|
2022-11-08 09:49:02
|
currently the spec says "All referenced samples are within the bounds of the reference frame." so this is an invalid bitstream atm
|
|
|
Traneptora
|
2022-11-08 09:49:30
|
oh yea, if that happens I return an error and refuse to decode
|
|
|
_wb_
|
2022-11-08 09:49:50
|
we could make it a valid bitstream; existing decoders will not decode it of course or at least not correctly, so it's probably a bad idea
|
|
|
Traneptora
|
2022-11-08 09:50:17
|
what I discovered is that for building a decoder, it's helpful to aggressively throw errors at non-compliance
|
|
2022-11-08 09:50:25
|
because a lot of those errors end up being caused by bugs
|
|
2022-11-08 09:50:29
|
rather than by actual noncompliance
and they get caught
|
|
|
lonjil
|
2022-11-08 09:50:34
|
I'll implement this, creating the first fragmentation of the JXL ecosystem
|
|
|
_wb_
|
2022-11-08 09:50:42
|
lol
|
|
2022-11-08 09:50:59
|
well decoders are free to do what they want on non-compliant bitstreams
|
|
2022-11-08 09:51:26
|
the real problem is when encoders start to rely on what decoders do with them
|
|
|
Traneptora
|
2022-11-08 09:51:31
|
but I mean, throwing errors rather than making best-effort decodes is easier for building a decoder
|
|