|
_wb_
|
2022-05-19 08:45:20
|
So gcc bug?
|
|
|
|
veluca
|
2022-05-19 08:45:23
|
you could perhaps try compiling with `-mpreferred-stack-boundary=5` ?
|
|
|
Traneptora
|
2022-05-19 08:48:23
|
trying with ` -mpreferred-stack-boundary=5 -fno-ipa-stack-alignment`
|
|
2022-05-19 08:49:30
|
what about `-mstackrealign`
|
|
|
|
veluca
|
2022-05-19 08:51:13
|
worth trying
|
|
|
Traneptora
|
2022-05-19 08:52:03
|
```cc1: error: '-mpreferred-stack-boundary=5' is not between 3 and 4```
|
|
2022-05-19 08:52:06
|
<:Madge:859968580996956200>
|
|
|
|
veluca
|
2022-05-19 08:52:17
|
eh
|
|
2022-05-19 08:52:28
|
well then.
|
|
2022-05-19 08:52:34
|
what gcc is that?
|
|
|
Traneptora
|
2022-05-19 08:55:30
|
x86_64-w64-mingw32-g++ 11.2.0
|
|
2022-05-19 08:56:06
|
... oh?
|
|
2022-05-19 08:56:08
|
` -mavx256-split-unaligned-load -mavx256-split-unaligned-store`
|
|
2022-05-19 08:56:33
|
```
-mavx256-split-unaligned-load
-mavx256-split-unaligned-store
Split 32-byte AVX unaligned load and store.
```
|
|
2022-05-19 08:56:36
|
definitely wroth testing.
|
|
2022-05-19 08:57:22
|
If this does what I think it does, it splits load and store into two loads and stores if it's unaligned
|
|
|
|
veluca
|
2022-05-19 09:02:20
|
maybe worth a try... but I doubt it
|
|
2022-05-19 09:02:29
|
the compiler thinks it's aligned after all
|
|
|
Traneptora
|
2022-05-19 09:10:05
|
it still crashes in wine, yea
|
|
2022-05-19 09:15:26
|
is there a way to force 32-byte alignment inside highway?
|
|
|
|
veluca
|
2022-05-19 09:16:51
|
alignment *for what*?
|
|
2022-05-19 09:16:59
|
for sure not for compiler-generated spills...
|
|
2022-05-19 09:17:19
|
this is pretty much a mingw / gcc bug, I'd say
|
|
|
Traneptora
|
2022-05-19 09:17:58
|
can we work around it?
|
|
|
veluca
this is pretty much a mingw / gcc bug, I'd say
|
|
2022-05-19 09:20:30
|
https://github.com/msys2/MSYS2-packages/issues/1209#issuecomment-379547589
|
|
2022-05-19 09:20:34
|
known, apparently
|
|
|
|
veluca
|
2022-05-19 09:23:19
|
well, it's at least not a msys2 problem
|
|
2022-05-19 09:24:00
|
and I think the issue ppl found was something else, since it's not from code built with gcc at all
|
|
|
Traneptora
|
|
veluca
ok, then can you do 2 things...
(a) print the pointer ๐
(b) switch to impl 2
|
|
2022-05-19 09:59:21
|
would it be better to define `HWY_LOADUP_ASM` or to switch to impl2
|
|
|
|
veluca
|
2022-05-19 10:00:20
|
mhhh given that your code gets miscompiled in entirely new and exciting ways I'm not sure it would be incredibly useful to do either...
|
|
|
Traneptora
|
2022-05-19 10:00:36
|
what does defining `HWY_LOADUP_ASM` do?
|
|
|
|
veluca
|
2022-05-19 11:04:15
|
it uses assembly directly (the inline asm thingy there)
|
|
2022-05-19 11:04:29
|
why does that exist? unclear, I couldn't find anything that defines that flag to true
|
|
|
Traneptora
|
2022-05-20 12:05:35
|
defining it manually didn't seem to fix anything
|
|
|
|
veluca
|
2022-05-20 12:09:31
|
you need to define it to 1... but yeah, I doubt it'll change much
|
|
|
Traneptora
|
2022-05-20 12:09:37
|
that's what I meant
|
|
2022-05-20 12:09:52
|
`-DHWY_LOADUP_ASM=1`
|
|
|
|
veluca
|
2022-05-20 12:09:56
|
you'd have to compile with a different compiler, or at least one that isn't as broken as mingw-gcc
|
|
|
Traneptora
|
2022-05-20 12:10:49
|
in unrelated news, what's the difference between lcms2 and skcms?
|
|
|
|
veluca
|
2022-05-20 12:10:51
|
(_mm256_broadcast_ps compiles to vbroadcastf128 anyway)
|
|
|
Traneptora
|
2022-05-20 12:10:57
|
practical difference I mean
|
|
2022-05-20 12:11:09
|
why use one or the other
|
|
|
|
veluca
|
|
Traneptora
in unrelated news, what's the difference between lcms2 and skcms?
|
|
2022-05-20 12:11:09
|
as far as I understand, lcms2 is slower but supports more features
|
|
|
Traneptora
|
2022-05-20 12:11:14
|
ah
|
|
2022-05-20 12:11:42
|
in that case, is there a way to use skcms if it supports what we want it to do, but lcms2 if skcms does not?
|
|
|
|
veluca
|
2022-05-20 12:11:56
|
I think skcms doesn't support converting *to* CMYK or to LUT-based profiles, and probably doesn't support some of the more uncommon rendering intents
|
|
|
Traneptora
in that case, is there a way to use skcms if it supports what we want it to do, but lcms2 if skcms does not?
|
|
2022-05-20 12:12:26
|
we actually discussed this today xD not yet, but we're thinking about it
|
|
|
|
Hello71
|
2022-05-23 05:45:52
|
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54412#c40 says it can be worked around with -muse-unaligned-vector-move
|
|
2022-05-23 05:46:01
|
which is an absolutely terrible hack
|
|
2022-05-23 05:47:02
|
note that this would be CXXFLAGS+=' -Wa,-muse-unaligned-vector-move'
|
|
|
|
veluca
|
2022-05-23 06:11:52
|
Reported: 2012-08-30 01:10 UTC by R Copley
that's not very reassuring
|
|
|
Traneptora
|
2022-05-24 04:59:44
|
It does what you want though, it produces `vmoveups` instead of `vmoveaps` and similar
|
|
|
|
veluca
|
2022-05-24 06:28:49
|
well yes, but *always*
|
|
|
|
Hello71
|
2022-05-24 06:40:28
|
is that bad
|
|
|
Traneptora
|
2022-05-24 06:41:55
|
I don't believe there's a major performance difference between these two instructions tbh
|
|
|
|
veluca
|
2022-05-24 06:43:18
|
that's true on new enough CPUs
|
|
|
|
Hello71
|
2022-05-24 06:44:15
|
well it's `vmovups` and `vmovaps`, not `vmoveups` and `vmoveaps`, but setting that aside, https://stackoverflow.com/questions/20259694/is-the-sse-unaligned-load-intrinsic-any-slower-than-the-aligned-load-intrinsic-o claims that they're about the same, and https://stackoverflow.com/questions/52147378/choice-between-aligned-vs-unaligned-x86-simd-instructions claims that "On pre-Nehalem and Bonnell and pre-Bulldozer: [...] movups/vmovups can be up to twice as slow as movaps/vmovaps"
|
|
|
Traneptora
|
|
Hello71
well it's `vmovups` and `vmovaps`, not `vmoveups` and `vmoveaps`, but setting that aside, https://stackoverflow.com/questions/20259694/is-the-sse-unaligned-load-intrinsic-any-slower-than-the-aligned-load-intrinsic-o claims that they're about the same, and https://stackoverflow.com/questions/52147378/choice-between-aligned-vs-unaligned-x86-simd-instructions claims that "On pre-Nehalem and Bonnell and pre-Bulldozer: [...] movups/vmovups can be up to twice as slow as movaps/vmovaps"
|
|
2022-05-24 06:45:08
|
none of those support avx2 so I don't believe it will matter
|
|
|
|
Hello71
|
2022-05-24 06:45:27
|
yes, i don't understand why they wrote movups/vmovups
|
|
2022-05-24 06:45:36
|
it should only apply to movups vs movaps
|
|
|
Traneptora
|
2022-05-24 06:46:19
|
and `-muse-unaligned-vector-move` should only affect vmov[au]ps, not the nonvector version
|
|
|
|
veluca
|
2022-05-24 06:48:21
|
iirc v is for vex-encoded, not for vector
|
|
|
Traneptora
|
2022-05-24 06:48:26
|
oh, nvm
|
|
|
|
Hello71
|
2022-05-24 07:08:16
|
i thought v was for aVx ๐คท
|
|
|
|
veluca
|
2022-05-24 07:18:14
|
it might
|
|
2022-05-24 07:18:41
|
IIRC the instruction encoding prefix is called VEX for AVX and EVEX for AVX512+
|
|
|
Traneptora
|
2022-05-24 07:49:17
|
VEX is Vector Extensions iirc
|
|
2022-05-26 02:49:42
|
how hard would it be to implement the code path in https://github.com/libjxl/libjxl/pull/1366
|
|
2022-05-26 02:49:49
|
because it seems like it's going nowhere
|
|
2022-05-26 02:49:58
|
and I'm interested in possibly taking a look at it
|
|
|
_wb_
|
2022-05-26 04:06:02
|
Probably the easiest way to do it would be to call the cms in the final writing stage of the render pipeline, just after interleaving RGB but before converting it to integers.
|
|
2022-05-26 04:06:43
|
Only tricky bit will be to get gray and cmyk right too, but can start with rgb...
|
|
|
Traneptora
|
|
_wb_
Probably the easiest way to do it would be to call the cms in the final writing stage of the render pipeline, just after interleaving RGB but before converting it to integers.
|
|
2022-05-26 05:29:41
|
does the CMS take packed and not planar RGB?
|
|
2022-05-26 05:31:49
|
oh right, ugh, the project is in C++
|
|
2022-05-26 05:31:53
|
and I don't know C++, I only know C
|
|
2022-05-26 05:32:14
|
I'll be taking a class this summer on C++ so I guess I'll look at it after that
|
|
|
_wb_
|
2022-05-26 06:06:31
|
CMS takes an interleaved row
|
|
|
|
veluca
|
2022-05-26 11:54:03
|
one of the harder parts (I think) will be figuring out when you don't *need* the CMS
|
|
2022-05-26 11:54:41
|
and also developing this in a somewhat long-term maintanable way
|
|
|
fab
|
2022-05-27 12:46:24
|
https://it.wikipedia.org/wiki/JPEG_XL
|
|
|
Traneptora
|
|
Hello71
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54412#c40 says it can be worked around with -muse-unaligned-vector-move
|
|
2022-05-27 06:16:56
|
I just tested this and it works
|
|
2022-05-27 06:17:01
|
i.e. it prevents the crash
|
|
|
|
veluca
|
2022-05-27 06:51:00
|
good
|
|
2022-05-27 06:51:09
|
ideally though gcc would fix their bugs ๐
|
|
|
Traneptora
|
2022-05-27 07:22:43
|
yea ikr
|
|
|
BlueSwordM
|
2022-05-27 09:50:14
|
I think there is another way to fix this.
|
|
2022-05-27 09:50:22
|
Use mingw based on Clang instead of GCC.
|
|
|
Kleis Auke
|
|
BlueSwordM
Use mingw based on Clang instead of GCC.
|
|
2022-05-28 12:35:24
|
I also recommend this. In case anyone is looking for a way to do this with Docker:
https://discord.com/channels/794206087879852103/848189884614705192/934838441647173812
|
|
|
fab
|
2022-05-28 01:58:20
|
avif is ugly
|
|
2022-05-28 01:58:20
|
|
|
2022-05-28 01:58:42
|
86 kb image
|
|
2022-05-28 01:59:57
|
you can't really say it is at jxl level
|
|
|
yurume
|
2022-05-28 02:00:36
|
what on the earth is it
|
|
|
fab
|
2022-05-28 02:01:14
|
|
|
2022-05-28 02:01:48
|
now is woking
|
|
|
yurume
|
2022-05-28 02:01:58
|
is it corrupted? looks fine to me (via my Firefox)
|
|
|
fab
|
2022-05-28 02:02:11
|
is a windows seven error
|
|
2022-05-28 02:02:28
|
or because there is jpg extension before avif
|
|
2022-05-28 02:02:49
|
why mozilla on pages that uses avif on my windows seven shows unusable images
|
|
2022-05-28 02:02:51
|
always
|
|
2022-05-28 02:02:54
|
and always
|
|
2022-05-28 02:03:59
|
source
|
|
2022-05-28 02:03:59
|
|
|
|
Traneptora
|
|
fab
|
|
2022-05-28 02:26:28
|
win7 shows the dimensions as incorrect too
|
|
2022-05-28 02:26:41
|
since it's vertical
|
|
|
fab
|
2022-05-28 02:27:12
|
the esolution was high
|
|
|
Traneptora
|
2022-05-28 02:27:25
|
oh nvm there's a scrollbar there
|
|
2022-05-28 02:27:34
|
but yea that looks like a decoder error to me
|
|
|
Yari-nyan
|
|
fab
|
|
2022-05-29 05:53:04
|
that's kind of amazing
|
|
|
Traneptora
|
2022-05-31 07:10:56
|
does libjxl populate `JxlColorEncoding->white_point_xy` even when the enum is not set to `JXL_WHITE_POINT_CUSTOM`? for the decoder API, I mean.
|
|
2022-05-31 07:14:46
|
looks like it, in `void ConvertInternalToExternalColorEncoding` but this is not documented
|
|
2022-05-31 08:13:18
|
also, another question
|
|
2022-05-31 08:24:11
|
do all Jpeg Reconstructions have `info.uses_original_profile` set to true?
|
|
|
Traneptora
does libjxl populate `JxlColorEncoding->white_point_xy` even when the enum is not set to `JXL_WHITE_POINT_CUSTOM`? for the decoder API, I mean.
|
|
2022-05-31 08:24:36
|
also same with the corresponding primaries
|
|
|
|
veluca
|
|
Traneptora
do all Jpeg Reconstructions have `info.uses_original_profile` set to true?
|
|
2022-05-31 08:25:11
|
I think so
|
|
|
_wb_
|
|
Traneptora
do all Jpeg Reconstructions have `info.uses_original_profile` set to true?
|
|
2022-05-31 08:25:36
|
Yes, because XYB jpegs are not a thing in the wild
|
|
|
Traneptora
|
2022-05-31 08:26:17
|
I'm working on my wrapper in FFmpeg to make colors more correct, which is why I ask about `white_point_xy` btw
|
|
2022-05-31 08:27:14
|
rather than have a big switch statement that maps combinations of enums to the corresponding FFmpeg enums, I figure it's simpler to take the actual values and feed it to a function in `libavutil/csp.h` which outputs FFmpeg's enum based on the values
|
|
2022-05-31 08:30:38
|
but I was wondering if those values were correct even if libjxl uses enum values
|
|
2022-05-31 08:31:15
|
it appears that is how it currently works but it's unclear to me that it will always stay that way, given the documentation doesn't specify it
|
|
2022-05-31 08:31:21
|
should I send a PR that documents this more explicitly?
|
|
|
yurume
|
2022-05-31 08:35:01
|
is the `libjxl/conformance` repo meant to be a part of ISO/IEC 18181-3 (conformance testing)? or will be there a separate conformance test suite?
|
|
|
_wb_
|
|
Traneptora
should I send a PR that documents this more explicitly?
|
|
2022-05-31 08:38:32
|
Yes, I think this is useful behavior and we should document it and keep doing it :)
|
|
|
yurume
is the `libjxl/conformance` repo meant to be a part of ISO/IEC 18181-3 (conformance testing)? or will be there a separate conformance test suite?
|
|
2022-05-31 08:39:12
|
Yes, basically part 3 is just a snapshot of that repo with some boilerplate text
|
|
|
yurume
|
2022-05-31 08:40:23
|
the repo looked like a handful number of very high-level integration tests right now
|
|
|
_wb_
|
2022-05-31 08:40:25
|
And we should probably add way more conformance bitstreams but this seemed to be a reasonable start, if a decoder can handle all that stuff, it probably handles most things in the spec, at least
|
|
|
yurume
|
2022-05-31 08:41:08
|
I'm starting to think about making some sort of test suite, at least for jxsml
|
|
|
_wb_
|
2022-05-31 08:41:15
|
That would be great
|
|
2022-05-31 08:42:24
|
Feel free to add stuff to the conformance repo too, we can't really have too many tests there
|
|
|
yurume
|
2022-05-31 08:43:48
|
I'm not sure how to design the test suite, especially what to compare against, that's why I'm holding that idea right now, but that's something I love to have
|
|
|
_wb_
|
2022-05-31 08:43:58
|
Especially some set of smaller tests that are just a small bitstream encoding a small image that checks one or a small number of coding tools
|
|
|
yurume
|
2022-05-31 08:44:22
|
yeah, that kind of things
|
|
2022-05-31 08:44:44
|
right now that information is deeply embedded in libjxl unit tests and can't be easily extracted
|
|
|
_wb_
|
2022-05-31 08:45:54
|
Well currently we just used libjxl to produce the reference decodes, but of course if we hit a case where your decoder fails something, we'll have to check who is correct: libjxl, jxsml, and/or the spec :)
|
|
|
Traneptora
|
2022-05-31 09:08:07
|
did spot ever get fixed?
|
|
2022-05-31 09:08:14
|
it's a level 10 codestream by itself without a container
|
|
2022-06-03 06:42:32
|
Is there an updated draft of the spec?
|
|
2022-06-03 06:42:39
|
with the bugs that yurume found fixed I mean
|
|
|
yurume
|
2022-06-03 06:43:48
|
I think wb did a round of fixups weeks ago but I still keep finding bugs as I implement it further so there is not exactly a fully updated spec with no known bugs
|
|
|
Traneptora
|
2022-06-03 06:44:33
|
I'm planning on writing a rudimentary ANS decoder
|
|
2022-06-03 06:44:42
|
this is why I ask
|
|
2022-06-03 06:45:03
|
but I don't want to start on it if there's lots of bugs
|
|
|
yurume
|
2022-06-03 06:46:17
|
so essentially the full entropy decoder? (which would involve Brotli huffman trees and stuffs)
|
|
2022-06-03 06:47:16
|
maybe I can paraphrase it in the way I understand (and hopefully correctly) to guide other implementations
|
|
2022-06-03 06:48:23
|
I think that part of the spec has been sorted out in the past round anyway, so an updated spec might be still enough for you (I'll DM you soon)
|
|
|
Arcane
|
|
yurume
|
2022-06-03 06:48:50
|
(click miss...)
|
|
|
Traneptora
|
|
yurume
so essentially the full entropy decoder? (which would involve Brotli huffman trees and stuffs)
|
|
2022-06-03 06:49:26
|
just ANS. for example, the spec draft says this:
|
|
2022-06-03 06:49:41
|
> the decoder reads permutation (F.3.2) from a single entropy coded stream with 8 clustered distributions, as specified in C.2, with size equal to the number of TOC entries and skip = 0.
|
|
2022-06-03 06:50:01
|
where Section C.2 is ANS and Prefix Coding
|
|
|
yurume
|
2022-06-03 06:50:25
|
ah, that still requires you to implement everything in Section C
|
|
2022-06-03 06:52:29
|
the entropy coding is either done by ANS or Huffman, both requiring you to decode the frequency distribution and cluster/context map, and that decoding uses another level of compression which can be also done by either ANS or Huffman
|
|
2022-06-03 06:53:24
|
so even though the final code itself is ANS, the metadata needed for constructing that code might use Huffman and vice versa
|
|
|
Traneptora
|
2022-06-03 06:56:42
|
does that mean you might need Brotli too?
|
|
|
yurume
|
2022-06-03 06:57:03
|
the tree encoding of Brotli, not the entirety of it, but yeah
|
|
|
Traneptora
|
2022-06-03 06:57:11
|
yuck, so that's a lot to implement
|
|
|
yurume
|
2022-06-03 06:57:17
|
indeed
|
|
2022-06-03 06:57:50
|
I had to implement it when I finished the image header and then have to go through an (compressed) ICC profile
|
|
|
Traneptora
|
2022-06-03 06:58:14
|
do you know of any shortcut if I only want to determine the length, rather than the contents?
|
|
|
yurume
|
2022-06-03 06:58:16
|
it turned out that implementing entropy codes that early was a good decision
|
|
2022-06-03 06:58:35
|
which length do you mean?
|
|
|
Traneptora
|
2022-06-03 06:58:43
|
the length of the encoded stream, so I can skip over it
|
|
2022-06-03 06:59:01
|
for example, if I want to skip to the FrameHeader of a JXL codestream that contains an ICC Profile
|
|
2022-06-03 06:59:23
|
I'd need to know the compressed length of the ICC profile, which is not signaled by the codestream
|
|
|
yurume
|
2022-06-03 06:59:50
|
I think it is impossible, you at least have to decode all the symbols (the ICC codec then uses those symbols, which you don't have to implement if you just want to skip them)
|
|
|
Traneptora
|
2022-06-03 07:00:13
|
and to decode all those symbols would I need to implement basically all of section C?
|
|
|
yurume
|
2022-06-03 07:00:41
|
yeah. I think (but am not sure) that there is technically an index to all the frames, in the `jxli` box in the container, but if you don't have a container it's no use
|
|
|
Traneptora
|
2022-06-03 07:00:58
|
Yea, this is for a raw codestream
|
|
2022-06-03 07:01:17
|
and the jxli box is frequently not present
|
|
|
yurume
|
2022-06-03 07:02:11
|
it's annoying that the ICC stream is not skipable but placed before any frame and/or TOC
|
|
|
Traneptora
|
2022-06-03 07:02:29
|
Yea, that's what I'm trying to work around
|
|
|
yurume
|
2022-06-03 07:03:10
|
back in early this year I thought it would be neat to find out the exact amount of data you need to render a certain pass
|
|
2022-06-03 07:03:32
|
so that you can "fake" the progressive rendering
|
|
|
Traneptora
|
2022-06-03 07:04:29
|
so what actually do I need to implement to be able to decode (or rather skip) over the ICC profile? It looks like I have to implement section C.2 but section C.1 to me is kind of unclear, it just says "see these RFCs" which is a little bit eh?
|
|
|
yurume
|
2022-06-03 07:04:59
|
C.1 refers to certain sections in RFC 7932 (Brotli), which you also need to implement
|
|
2022-06-03 07:05:36
|
specifically you need the section 3 of RFC 7932 for tree coding
|
|
|
Traneptora
|
2022-06-03 07:05:56
|
blech, this is gonna take a bunch of time
|
|
2022-06-03 07:06:01
|
I'm not used to reading specs either
|
|
|
yurume
|
2022-06-03 07:06:08
|
let me look at jxsml to list exactly what are needed
|
|
2022-06-03 07:09:58
|
- cluster/context map decoding (C.2.5); this can recursively decode another entropy stream
- canonical Huffman tree decoding (C.1, referring to RFC 7932 Sec. 3)
- rANS distribution decoding (C.2.4)
- rANS alias table reconstruction (C.2.2)
- hybrid integer coding parameter decoding (C.2.7)
- hybrid integer coding aka `DecodeHybridVarLenUint` (C.2.3 + C.2.6)
|
|
2022-06-03 07:10:27
|
to my knowledge you can't even simplify hybrid integer coding because it can get used by C.2.5
|
|
2022-06-03 07:11:35
|
feel free to check out my jxsml (which is public domain btw) if you feel fine with that: https://gist.github.com/lifthrasiir/137a3bf550f5a8408d4d9450d026101f
|
|
2022-06-03 07:12:23
|
I believe it at least got entropy coding right
|
|
|
Traneptora
|
2022-06-03 07:20:41
|
I can't use your code cause the goal is to implement this in FFmpeg
|
|
2022-06-03 07:20:47
|
which requires me to copyright it and LGPL it
|
|
2022-06-03 07:20:58
|
while *you* may be fine with integrating public domain code, they won't
|
|
2022-06-03 07:21:10
|
since I don't believe I can copyright code I didn't write
|
|
2022-06-03 07:21:32
|
this will take a bunch of time, I see
|
|
|
yurume
|
2022-06-03 07:24:16
|
Well I will almost surely put it in CC0, which I believe is compatible with LGPL without requiring me to do anything else
|
|
|
_wb_
|
2022-06-03 07:40:00
|
You can also just be listed as one of the authors in ffmpeg, and since you wrote the code you can license it lgpl in ffmpeg and cc0 in your own project if you want
|
|
2022-06-03 07:41:04
|
With hindsight, the non-skippable icc profiles and entropy coded TOC permutation was a bit silly, we should have made that stuff parseable without implementing real entropy decode
|
|
2022-06-03 07:41:12
|
Too late now though
|
|
2022-06-03 07:42:26
|
It's one of the collateral damages caused by our urge to have minimalistic headers, so no redundant size fields or missed opportunity to entropy code some header data...
|
|
|
Traneptora
|
2022-06-03 07:44:38
|
is there a maximum number of `num_dist`?
|
|
2022-06-03 07:44:46
|
like 256 or something?
|
|
2022-06-03 07:44:53
|
or is there no cap
|
|
|
_wb_
|
2022-06-03 07:45:13
|
For Exif/XMP I did consider parsing-only applications (like exiftool/exiv2), which is why those can be found by just parsing isobmf (though we did add brob, which complicates stuff)... But for the codestream itself, I hadn't considered the need for no-decode parsing
|
|
2022-06-03 07:45:38
|
Uh, is that the one before or after clustering?
|
|
|
Traneptora
|
2022-06-03 07:45:54
|
`num_dist` is the number of clustered distributions (before clustering)
|
|
2022-06-03 07:46:03
|
I know the number of clusters is capped at 256
|
|
|
_wb_
|
2022-06-03 07:46:17
|
Before clustering isn't really capped
|
|
|
Traneptora
|
2022-06-03 07:46:36
|
is there a max that is actually used though? something low like 4096
|
|
|
_wb_
|
2022-06-03 07:46:42
|
Nothing low
|
|
|
Traneptora
|
|
_wb_
|
2022-06-03 07:47:01
|
AC uses quite a lot of pre-clustering contexts
|
|
|
Traneptora
|
2022-06-03 07:47:01
|
I'm trying to stack allocate an array of clusters without reading num_dist
|
|
2022-06-03 07:47:11
|
but looks like I gotta heap allocate it after reading num_dist
|
|
|
_wb_
|
2022-06-03 07:47:19
|
And so can modular when it has a large enough tree
|
|
|
yurume
|
2022-06-03 07:47:35
|
after clustering it *is* capped to 256, this is one thing the spec was unclear about
|
|
|
_wb_
|
2022-06-03 07:47:48
|
Yeah, after clustering it is capped
|
|
|
Traneptora
|
2022-06-03 07:47:48
|
the spec is clear to me about that
|
|
|
_wb_
|
2022-06-03 07:48:07
|
But the cluster mapping itself can be large
|
|
|
yurume
|
|
Traneptora
the spec is clear to me about that
|
|
2022-06-03 07:49:01
|
was it? I was confused about that for a while until I found that inverse MTF implicitly caps it to 256 anyway.
|
|
|
Traneptora
|
2022-06-03 07:49:19
|
it states it explicitly, yea
|
|
2022-06-03 07:49:20
|
> All integers in [0, num_clusters) are present in this array, and num_clusters < 256. Position i in this array indicates that context i is merged into the corresponding cluster.
|
|
|
_wb_
|
2022-06-03 07:49:32
|
What version is that?
|
|
|
yurume
|
2022-06-03 07:49:34
|
ah, you are looking at the fixed version ๐
|
|
|
_wb_
|
2022-06-03 07:49:43
|
The clarified version:)
|
|
|
Traneptora
|
2022-06-03 07:49:54
|
yea, an updated draft
|
|
|
yurume
|
|
_wb_
With hindsight, the non-skippable icc profiles and entropy coded TOC permutation was a bit silly, we should have made that stuff parseable without implementing real entropy decode
|
|
2022-06-03 07:50:36
|
maybe we are still able to partially fix it by extending the image header bundle
|
|
2022-06-03 07:51:39
|
say, if the extension bit 0 is set, the next xx bits are the ICC size and it is guaranteed that no TOC permutation is in use
|
|
|
_wb_
|
2022-06-03 07:51:41
|
Well not really unless we invalidate existing bitstreams that don't use that extension but have an icc profile or toc permutation
|
|
|
yurume
|
2022-06-03 07:53:03
|
I think this is mostly for the optimization (that wants to read groups as soon as possible) or optimistic partial decoders (it suffices for them to work for majority of bitstreams, can fall back otherwise)
|
|
|
_wb_
|
2022-06-03 07:53:07
|
One thing we could check though is what parts of entropy coding are actually used for icc compression and toc permutations
|
|
|
Traneptora
|
2022-06-03 07:53:34
|
I asked yurume earlier, apparently basically all of it
|
|
|
_wb_
|
2022-06-03 07:53:44
|
Theoretically, yes
|
|
|
yurume
|
2022-06-03 07:53:53
|
what bits are actually used in libjxl, I assume
|
|
|
Traneptora
|
2022-06-03 07:53:59
|
ah but all JXL files atm are produced by libjxl or fjxl atm
|
|
|
_wb_
|
2022-06-03 07:54:14
|
But maybe some of it doesn't end up getting used in existing bitstreams, and then we could add restrictions in the spec
|
|
2022-06-03 08:00:24
|
Then again if there is a short implementation of the entropy coding stuff given by jxsml, that can be used in parsing-only applications, maybe that's good enough
|
|
|
Traneptora
|
2022-06-03 08:00:33
|
for example, do MTF-clustered distributions end up used?
|
|
|
_wb_
|
2022-06-03 08:00:57
|
Skipping over icc/toc will of course be faster, but it's not like it's typically a large amount of stuff to entropy decode
|
|
2022-06-03 08:01:12
|
Only for CMYK profiles can it get large
|
|
2022-06-03 08:01:48
|
In the more typical case of RGB profiles, it's typically a few kb, and probably less than that to entropy decode
|
|
2022-06-03 08:03:10
|
Same for TOC permutations: typically not used, and if used, it's data proportional to the number of groups only, i.e. at most 16 symbols per megapixel
|
|
|
Traneptora
|
2022-06-03 08:03:50
|
I was under the impression that TOC permutations were going to be very common
|
|
|
_wb_
|
2022-06-03 08:03:58
|
Maybe
|
|
2022-06-03 08:04:08
|
They're not done by default atm
|
|
2022-06-03 08:04:29
|
It's useful though, so that could change
|
|
|
Traneptora
|
2022-06-03 08:04:34
|
interesting
|
|
|
_wb_
|
2022-06-03 08:05:13
|
But a parsing application might want to decode the TOC anyway, including permutation info
|
|
|
Traneptora
|
2022-06-03 08:05:22
|
well you'd have to
|
|
2022-06-03 08:05:28
|
since the actual blocks are after the TOC
|
|
|
_wb_
|
2022-06-03 08:05:41
|
I mean decode, not just parse/skip
|
|
|
Traneptora
|
2022-06-03 08:06:12
|
I care about the TOC because it lets me find the end of the codestream
|
|
2022-06-03 08:06:46
|
by adding the lengths together
|
|
2022-06-03 08:06:53
|
the actual permutation doesn't affect that
|
|
|
_wb_
|
2022-06-03 08:06:55
|
Because you may want to know where different pixel regions are located in the codestream, e.g. for region of interest decode
|
|
|
Traneptora
|
2022-06-03 08:07:00
|
possibly
|
|
2022-06-03 08:07:07
|
but for me that's not what I need, I need something more basic
|
|
|
yurume
|
2022-06-03 08:07:14
|
ah that makes sense
|
|
|
_wb_
|
2022-06-03 08:07:26
|
Yeay to just find the end you only need the toc without the permutation
|
|
|
yurume
|
2022-06-03 08:07:45
|
so you want something like `jxli` but that always works even when it is not present
|
|
|
Traneptora
|
2022-06-03 08:17:44
|
what is the context `ctx` in `DecodeHybridVarLenUint(ctx)`
|
|
|
yurume
|
2022-06-03 08:18:03
|
it is supplied by the caller
|
|
|
Traneptora
|
2022-06-03 08:18:11
|
is it an integer?
|
|
|
yurume
|
2022-06-03 08:18:17
|
yes, non-negative one (and also `< num_dist`)
|
|
|
Traneptora
|
2022-06-03 08:18:44
|
> For each pre-clustered distribution i, the decoder reads an integer as specified in C.2.6.
|
|
2022-06-03 08:19:19
|
Does this mean that for each pre-clustered distribution `i` it populates `clusters[i]` with `DecodeHybridVarLenUint(i)`?
|
|
|
yurume
|
2022-06-03 08:20:04
|
to my knowledge the context is always 0, because it has already read a single clustered distribution (i.e. `num_dist` is 1)
|
|
2022-06-03 08:20:10
|
but yeah that needs to be clarified
|
|
|
Traneptora
|
2022-06-03 08:20:56
|
when did it read that single clustered distribution?
|
|
|
yurume
|
2022-06-03 08:21:17
|
> The decoder then initializes a symbol decoder using a single distribution D, as specified in C.2.1.
|
|
2022-06-03 08:21:43
|
ah wait
|
|
|
Traneptora
|
2022-06-03 08:21:44
|
but I'm implementing section 2.5, which is required by section 2.1
|
|
2022-06-03 08:22:22
|
where do I get the various `configs[clusters[ctx]]` that I need?
|
|
2022-06-03 08:22:28
|
I don't recall it telling me to initiatlize a config
|
|
|
yurume
|
2022-06-03 08:23:24
|
I'm not sure if my implementation works correctly for those nested ones, but if I'm correct 2.5 recursively invokes 2.1
|
|
2022-06-03 08:23:49
|
but the nesting ends there because 2.5 is only used when 2+ pre-clustering distributions are in use
|
|
|
Traneptora
|
2022-06-03 08:24:22
|
This is in section 2.1:
> The decoder then reads the mapping clusters of the `num_dist` distributions into `num_clusters` clusters as specified in C.2.5. It then reads `use_prefix_code` as a Bool(). If `use_prefix_code` is false, the decoder sets `log_alphabet_size` to `5 + u(2)`; otherwise, it sets `log_alphabet_size` to 15. Then for each post-clustered distribution, in increasing order of index i in `[0, num_clusters - 1)`, the decoder reads `configs[i] = HybridUintConfig(log_alphabet_size)` as specified in C.2.7.
|
|
2022-06-03 08:24:29
|
So you're saying C2.5 invokes 2.1 again
|
|
|
yurume
|
2022-06-03 08:24:33
|
correct
|
|
|
Traneptora
|
2022-06-03 08:24:56
|
argh
|
|
2022-06-03 08:24:59
|
this is getting more complicated
|
|
|
yurume
|
2022-06-03 08:25:18
|
yeah, no doubt why the corresponding spec was pretty buggy
|
|
|
Traneptora
|
2022-06-03 08:25:36
|
> If num_dist == 1, then num_clusters = 1 and clusters[0] = 0, and the remainder of this subclause
> is skipped.
Eventually you have to hit this case, or it recurses forever.
|
|
2022-06-03 08:37:28
|
in `DecodeHybridVarLenUint` where is `window` defined?
|
|
2022-06-03 08:38:08
|
likewise where is `copy_pos` defined?
|
|
2022-06-03 08:38:15
|
it appears they aren't
|
|
|
yurume
|
2022-06-03 08:38:17
|
it's not defined, and one of the reasons I proposed a restructuring
|
|
|
Traneptora
|
2022-06-03 08:38:43
|
the first thing you do is a read from that array, not a write
|
|
|
yurume
|
2022-06-03 08:39:12
|
`window` is implicitly a `1<<20`-element array of output symbols, while `copy_pos` is undefined until LZ77 is in active (`num_to_copy > 0`)
|
|
2022-06-03 08:40:03
|
in the actual implementation I guess `window` should be flexible, since not every stream has more than 2^20 symbols
|
|
|
Traneptora
|
2022-06-03 08:40:26
|
is `window` the LZ77 sliding window of backreference buffer?
|
|
|
yurume
|
2022-06-03 08:40:45
|
yes, that's the reason that it can be initially undefined just fine
|
|
|
Traneptora
|
2022-06-03 08:40:55
|
and in this case is it a revolving window?
|
|
2022-06-03 08:41:23
|
it looks like it's circular so the array doesn't need to be shifted
|
|
2022-06-03 08:41:56
|
`0xFFFFF` is `(1 << 20) - 1` as well, right?
|
|
|
yurume
|
2022-06-03 08:42:22
|
sort of, you can implement it with a non-revolving window but implementing it as written in the spec would be simpler
|
|
2022-06-03 08:42:28
|
also yes for `0xFFFFF`
|
|
|
Traneptora
|
2022-06-03 08:42:42
|
I mean the spec looks like it's treating the window as a revolving window
|
|
2022-06-03 08:43:08
|
but I see how this works, okay
|
|
|
yurume
|
2022-06-03 08:43:08
|
(typo from mine)
|
|
|
Traneptora
|
2022-06-03 08:43:27
|
I think I understand how this works now
|
|
2022-06-03 08:44:00
|
initially, `num_to_copy` is zero and if `lz77.enabled && token >= lz77.min_symbol` then it has discovered a length/distance pair
|
|
2022-06-03 08:44:08
|
if it's not greater than that, it's a literal and just copies it as is
|
|
|
yurume
|
|
Traneptora
|
2022-06-03 08:44:50
|
so in the event where lz77.enabled is false, the window part could honestly just be skipped altogether
|
|
|
yurume
|
2022-06-03 08:44:51
|
the reason why it is structured in that way is that the decoding routine is not active but passively called from the caller
|
|
2022-06-03 08:45:12
|
so a single LZ77 sequence may persist across multiple calls to `DecodeHybridVarLenUint`
|
|
|
Traneptora
|
2022-06-03 08:45:33
|
`token = /* Read symbol from entropy coded stream using D[clusters[ctx]] */ ;`
in this case, D is the entire distribution read from section 2.1 initialized earlier, right?
|
|
|
yurume
|
2022-06-03 08:46:38
|
if `use_prefix_code` is false, `D` is indeed an array of frequencies and read according to C.2.3 (you have to reconstruct an alias table before); if `use_prefix_code` is true it doesn't exactly exist but you instead read a number using a previously decoded Huffman tree
|
|
|
Traneptora
|
2022-06-03 08:48:00
|
oh that is not clear at all
|
|
2022-06-03 08:48:40
|
so section 2.3 specifies how to read a symbol from a stream given an already-constructed distribution D
|
|
2022-06-03 08:49:01
|
so to construct that distribution D, you follow the instructions in section 2.1
|
|
2022-06-03 08:49:12
|
part of following those instructions in 2.1 requires you to read the clustering, by invoking 2.5
|
|
2022-06-03 08:49:29
|
and sometimes, invoking 2.5 requires you to read an entire distribution E
|
|
2022-06-03 08:50:22
|
so basically, to do section 2.5, you read an entire distribution called E as described in section 2.1
|
|
2022-06-03 08:50:44
|
then you read symbols from the codestream, using E as the distribution
|
|
2022-06-03 08:50:56
|
that lets you populate the clustering
|
|
2022-06-03 08:51:16
|
and then when 2.5 returns, and then 2.1 returns you end up with a distribution D that is the one you *actually* care about
|
|
2022-06-03 08:51:47
|
I see
|
|
|
yurume
|
2022-06-03 08:52:10
|
yeah, it is quite hard to understand at the first glance
|
|
|
Traneptora
|
2022-06-03 08:53:13
|
so here's what I have so far in pseudo-code
|
|
2022-06-03 09:11:27
|
```c
// assuming here for simplicity that use_prefix_code = false
/* section C.2.1 */
JxlSymbolD *read_distribution() {
read_lz77_params();
clusters = read_dist_clustering();
read_hybrid_uint_confs();
populate_d_array();
}
/* section C.2.2 */
AliasMapping *gen_alias_func(JxlSymbolD *d, int log_alphabet_size);
/* section C.2.3 */
int read_from_distribution(JxlSymbolD *d, AliasMapping *mapping);
/* section C.2.4 */
void populate_d_array();
/* section C.2.5 */
int *read_dist_clustering() {
if (simple) return simple_clustering();
JxlSymbolD *e = read_distribution();
for (int i = 0; i < num_dist; i++)
clusters[i] = read_hybrid_integer(e, 0);
return clusters;
}
/* section C.2.6 */
int read_hybrid_integer(JxlSymbolD *d, HybridUintConf *conf, int ctx) {
token = read_from_distribution(d);
do_lz77_stuff();
return do_hybrid_stuff(token, conf);
}
/* section C.2.7 */
HybridUintConf *read_hybrid_uint_conf(int log_alphabet_size);
```
|
|
2022-06-03 09:11:39
|
it looks like it works roughly like this
|
|
2022-06-03 09:11:41
|
is this correct?
|
|
|
yurume
|
2022-06-03 09:12:52
|
`read_hybrid_integer(e, i)` is missing `ctx == 0`, and you don't read `token` when lz77 sequence is ongoing
|
|
|
Traneptora
|
2022-06-03 09:13:43
|
changed that, so now it's `clusters[i] = read_hybrid_integer(e, 0)`
|
|
2022-06-03 09:13:59
|
also w/r/t the lz77 stuff I just put that all under do_lz77_stuff()
|
|
|
yurume
|
2022-06-03 09:15:19
|
okay
|
|
2022-06-03 09:15:41
|
`read_hybrid_uint_confs` and `populate_d_array` should be repeated `num_clusters` times as well
|
|
|
Traneptora
|
2022-06-03 09:15:58
|
it's populating an array so it's included
|
|
2022-06-03 09:16:02
|
this is psuedo-code
|
|
2022-06-03 09:16:11
|
just trying to get a *general* flowchart here
|
|
|
yurume
|
2022-06-03 09:16:20
|
ah good
|
|
|
Traneptora
|
2022-06-03 09:16:37
|
one thing I'm still confused about is where is `index` defined in section C.2.3
|
|
|
yurume
|
2022-06-03 09:16:38
|
I think `gen_alias_func` should be also called at the very end of `read_distribution`, which is not obvious from the spec
|
|
|
Traneptora
|
2022-06-03 09:16:58
|
oh yea it does appear that way, but you are correct it's not obvious
|
|
2022-06-03 09:17:03
|
should be explicit
|
|
2022-06-03 09:17:35
|
oh wait I lied it's defined
|
|
|
yurume
|
2022-06-03 09:17:43
|
for C.2.3 `index` is a temporary variable I think?
|
|
|
Traneptora
|
2022-06-03 09:18:48
|
hm, so at the end of section 2.1, it says
> The decoder reads integers from entropy coded streams as specified in C.2.6.
|
|
2022-06-03 09:19:07
|
what I'm wondering is, what `ctx` does it pass?
|
|
|
yurume
|
2022-06-03 09:19:12
|
that sentence is quite confusing indeed
|
|
2022-06-03 09:19:46
|
the decoder *will* read integers by calling `DecodeHybridVarLenUint` at the later moment in decoding
|
|
|
Traneptora
|
2022-06-03 09:20:21
|
it says, the decoder will read from this generated D by calling section C.2.6, but where is `ctx` specified
|
|
2022-06-03 09:20:34
|
is it something the decoder must do?
|
|
|
yurume
|
2022-06-03 09:20:59
|
uh, the last paragraph of C.2.1 should be considered a summary of what C.2.6 specifies
|
|
2022-06-03 09:21:40
|
more accurately speaking, that paragraph explains `/* Read symbol from entropy coded stream using D[clusters[ctx]] */` in C.2.6
|
|
2022-06-03 09:22:02
|
I don't like that arrangement and that paragraph should go straight into C.2.6
|
|
|
Traneptora
|
2022-06-03 09:22:26
|
So when, I see this:
> the decoder reads permutation (F.3.2) from a single entropy coded stream with 8 clustered distributions, as specified in C.2, with size equal to the number of TOC entries and skip = 0.
what I do is I call section C.2.1 with `num_dist = 8` to generate `D`, and then when I need a symbol I invoke section C.2.3
|
|
2022-06-03 09:22:52
|
assuming `use_prefix_code = false` for simplicity
|
|
|
yurume
|
2022-06-03 09:23:11
|
you should invoke C.2.6
|
|
|
Traneptora
|
2022-06-03 09:23:17
|
what `ctx` do I pass?
|
|
|
_wb_
|
2022-06-03 09:23:20
|
Please remember these unclear passages so they can get clarified in the spec.
|
|
|
yurume
|
2022-06-03 09:24:09
|
(I'll try to summarize this discussion into an issue, but I already have a backlog of issues I need to write down)
|
|
|
Traneptora
|
2022-06-03 09:24:37
|
what I'm trying to figure out is, when I need a symbol from the stream what ctx do I pass to `DecodeHybridVarLenUint`
|
|
|
yurume
|
|
Traneptora
So when, I see this:
> the decoder reads permutation (F.3.2) from a single entropy coded stream with 8 clustered distributions, as specified in C.2, with size equal to the number of TOC entries and skip = 0.
what I do is I call section C.2.1 with `num_dist = 8` to generate `D`, and then when I need a symbol I invoke section C.2.3
|
|
2022-06-03 09:25:10
|
ah wait, I realized that this is a part of TOC
|
|
|
Traneptora
|
|
yurume
|
2022-06-03 09:26:58
|
hmm, nevermind, I recalled a peculiar bit of TOC (namely, F.3.2 gets reused elsewhere in the spec) but that doesn't make the decoding process any different
|
|
2022-06-03 09:27:42
|
so yeah, you do C.2.1 with `num_dist = 8`, read `D`, then go to F.3.2 and whenever you see `DecodeHybridVarLenUint` calls you invoke C.2.6
|
|
2022-06-03 09:27:52
|
and `ctx` will be given as arguments to `DecodeHybridVarLenUint`
|
|
|
Traneptora
|
2022-06-03 09:28:36
|
Oh I see, `F.3.2` specifies what to pass to that
|
|
2022-06-03 09:32:43
|
> The decoder reads enc_size as U64(). The decoder reads 41 pre-clustered distributions as specified
> in C.2. The decoder then reads enc_size integers as specified in C.2.6 to obtain decompressed bytes
|
|
2022-06-03 09:33:04
|
does that mean in the ICC Profile stream, the alphabet size is `[0, 256)`
|
|
2022-06-03 09:34:17
|
also, I think I finally understand the general structure of this
|
|
2022-06-03 09:34:21
|
but that took some effort
|
|
|
yurume
|
2022-06-03 09:35:14
|
they are meant to be bytes, but it's unclear to me if the spec actually caps the maximum symbol
|
|
2022-06-03 09:35:54
|
hybrid integer decoding can technically output an enormous integer which is very likely to be invalid but not guaranteed
|
|
|
Traneptora
|
2022-06-03 09:37:46
|
I see
|
|
2022-06-03 09:38:01
|
not that it really matters here if you just assign it to a `uint8_t` and truncate it
|
|
|
yurume
|
2022-06-03 09:38:04
|
another item to report!
|
|
|
Traneptora
|
|
yurume
|
2022-06-03 09:38:23
|
I mean, I was filing spec issues for a while
|
|
2022-06-03 09:38:35
|
and my backlog just grew to... uh something like at least 4 issues? ...
|
|
|
Traneptora
|
2022-06-03 09:38:50
|
It looks like the TOC permutation is only 8 pre-clustered distributions, and the ICC Profile is 41. so if I'm only decoding those two I can cap the number of pre-clustered distributions to 42.
|
|
2022-06-03 09:39:12
|
because of the possible lz77 +1
|
|
|
yurume
|
2022-06-03 09:39:29
|
that was my gut feeling, but look at H.4.2 and I.3.1.2
|
|
2022-06-03 09:39:45
|
the number of pre-clustering distributions can be much higher
|
|
|
Traneptora
|
2022-06-03 09:39:50
|
I don't need to decode modular mode
|
|
2022-06-03 09:39:56
|
my goal is to determine the end of the file
|
|
|
yurume
|
2022-06-03 09:40:11
|
ah yeah that's enough for a specialized code path
|
|
|
Traneptora
|
2022-06-03 09:40:11
|
if I can read the TOC then I can add up the block lengths (ignoring the actual contents of the permutation)
|
|
2022-06-03 09:40:27
|
(since that won't change the length)
|
|
|
yurume
|
2022-06-03 09:40:46
|
I think 41 (+1) is the maximum for fixed-sized distributions
|
|
|
Traneptora
|
2022-06-04 12:17:49
|
> The ANS decoder keeps an internal 32-bit unsigned integer state. Upon creation of the decoder (immediately
> before reading the first symbol from a new ANS stream), state is initialized with a u(32) from the codestream.
Does this mean that the first 32 bits of any entropy coded stream are these u(32), or is it after the clustering etc. but before the first symbol
|
|
|
yurume
|
2022-06-04 12:23:14
|
before the first symbol.
|
|
2022-06-04 12:23:53
|
so if the distribution is read but the first symbol is preceded by some other data u(32) will be read after that
|
|
|
Traneptora
|
2022-06-04 12:43:36
|
C.2.3 doesn't actually say what it returns
|
|
2022-06-04 12:43:43
|
does it return the updated value of `state`?
|
|
2022-06-04 12:49:13
|
> After the decoder reads the last symbol in a given stream, state is 0x130000.
is this the *definition* of the last symbol?
|
|
|
yurume
|
|
Traneptora
C.2.3 doesn't actually say what it returns
|
|
2022-06-04 01:12:37
|
indeed, the return value is `symbol` and only `state` is going to persist here
|
|
|
Traneptora
> After the decoder reads the last symbol in a given stream, state is 0x130000.
is this the *definition* of the last symbol?
|
|
2022-06-04 01:13:28
|
this is a way to say that the state *should* be 0x130000 after the last symbol. the spec is written from a perspective of the decoder so such a sentence is actually an assertion.
|
|
|
Traneptora
|
|
yurume
this is a way to say that the state *should* be 0x130000 after the last symbol. the spec is written from a perspective of the decoder so such a sentence is actually an assertion.
|
|
2022-06-04 02:10:56
|
I mean, does the state equalling 0x130000 signify EOF, or is EOF signalled elsewhere
|
|
2022-06-04 02:11:10
|
> `if (logcounts[i] == 13) { rle = U8(); same[i] = rle + 5; i += rle + 3; continue; }`
what is `same[i]`?
|
|
|
yurume
|
|
Traneptora
I mean, does the state equalling 0x130000 signify EOF, or is EOF signalled elsewhere
|
|
2022-06-04 02:11:56
|
there is no separate EOF, but the stream only would read a finite amount of symbols and that last state acts as a checksum
|
|
|
Traneptora
|
2022-06-04 02:13:17
|
I see
|
|
|
yurume
|
2022-06-04 02:15:14
|
that `same[i]` is kinda annoying, so the frequency distribution is read in two passes, the first will read exponents + run lengths if that log is 13, then the second will read mantissas
|
|
2022-06-04 02:16:03
|
since run lengths are read in the first pass (I don't know why), the second pass needs to know where runs occur and thus skip reading mantissas
|
|
2022-06-04 02:16:40
|
that's what `same[i]` does; if non-zero, it means a run starts at `i` and ends at `i + same[i] + some const I can't remember`
|
|
|
Traneptora
|
2022-06-04 02:48:59
|
`symbols.push_back(s); offsets.push_back(bucket_size ร i); cutoffs.push_back(0);`
|
|
2022-06-04 02:49:03
|
what is `push_back`
|
|
|
yurume
|
2022-06-04 02:56:10
|
they are (implicitly) supposed to be a C++ vector
|
|
2022-06-04 02:56:57
|
pseudocodes in the spec rarely define variables and leave them to a common sense (at least for C++ coders)
|
|
|
Traneptora
|
2022-06-04 03:40:59
|
I'm not a C++ programmer, that's why I asked
|
|
2022-06-04 03:41:10
|
it looked like C++ stuff
|
|
2022-06-04 03:43:00
|
alright, looks like I got this thing done! well mostly done. still have to implement brotli trees
|
|
2022-06-04 03:43:01
|
https://github.com/thebombzen/FFmpeg/blob/jpegxl-ans/libavcodec/jpegxl_parser.c
|
|
|
yurume
|
2022-06-04 03:45:28
|
does ffmpeg have its own huffman decoder? (just out of curiosity)
|
|
|
Traneptora
|
2022-06-04 03:45:45
|
it links zlib, I dont' know if zlib can do it
|
|
2022-06-04 03:46:00
|
and it probably does tbh
|
|
|
yurume
does ffmpeg have its own huffman decoder? (just out of curiosity)
|
|
2022-06-04 03:48:56
|
I believe it has its own VLC decoder, now that I think about it
|
|
|
yurume
|
2022-06-04 03:49:29
|
if it has a common infrastructure for optimized huffman decoder (as opposed to having separate decoders for each codec) I guess it's a way to go
|
|
|
Traneptora
|
2022-06-04 03:49:36
|
I believe it does yea
|
|
2022-06-04 03:49:41
|
that way I don't have to implement it
|
|
2022-06-04 03:49:48
|
atm I just abort out and go "oops"
|
|
|
_wb_
|
2022-06-04 05:13:18
|
Well just getting the dimensions is not _that_ hard
|
|
|
|
veluca
|
2022-06-04 09:20:31
|
also definitely not trivial though
|
|
|
_wb_
|
2022-06-04 09:27:08
|
No, but still just bit reading, no ANS alias tables etc
|
|
|
Traneptora
|
2022-06-04 04:05:06
|
does JXL support extended size ISOBMFF boxes?
|
|
2022-06-04 04:05:28
|
that is the `size` field is set to `1` and then it is immediately followed by an 8-byte box-size
|
|
|
|
veluca
|
2022-06-04 04:30:58
|
it should in theory
|
|
2022-06-04 04:31:20
|
I think
|
|
|
_wb_
|
2022-06-04 04:37:06
|
Yes, at least the spec says it does
|
|
2022-06-04 04:37:22
|
I assume libjxl also allows it
|
|
|
|
JendaLinda
|
2022-06-05 04:49:03
|
As JXL supports images in CMYK color space, is there any program that can do that? For example, transcoding CMYK TIFF losslessly to CMYK JXL?
|
|
|
_wb_
|
2022-06-05 05:05:40
|
I wrote a libjxl application that can reversibly encode PSD (including cmyk ones) but I don't have clearance yet to release it
|
|
2022-06-05 05:10:21
|
I wish there was a simple PPM style format for cmyk, then it would be easier to add support for that to cjxl/djxl
|
|
|
|
JendaLinda
|
2022-06-05 05:23:03
|
TIFF seems to be the most widely supported, general purpose format capable of CMYK.
|
|
2022-06-05 05:26:20
|
I tried Imagemagick, but either it's broken or I don't know how to use it. It just takes the CMY channels and encodes them like they were RGB, so the result is completely wrong.
|
|
|
_wb_
|
2022-06-05 05:30:22
|
Yeah that's not going to work if they didn't implement a code path for cmyk encoding specifically
|
|
2022-06-05 05:32:07
|
The annoying thing about tiff is that it is even more complicated than psd, and it's just as proprietary (though it has been standardized for a while)
|
|
2022-06-05 05:33:00
|
Even just baseline tiff is already pretty annoying to implement for a format that's mostly uncompressed
|
|
|
Fraetor
|
2022-06-05 05:50:03
|
Are CMYK JPEGs non-standardised, or are you really looking for a lossless format?
|
|
|
|
JendaLinda
|
2022-06-05 05:56:42
|
cjxl doesn't support CMYK JPEG. These are not standard and they can't be losslessly transcoded to JXL.
|
|
|
_wb_
Even just baseline tiff is already pretty annoying to implement for a format that's mostly uncompressed
|
|
2022-06-05 05:59:03
|
Could libtiff be used for that?
|
|
|
_wb_
|
2022-06-05 06:05:46
|
Yes, libtiff could be used
|
|
|
|
JendaLinda
|
2022-06-05 08:54:51
|
Speaking of PPM format, it looks like cjxl and djxl assume sRGB in PPM.
|
|
|
_wb_
|
2022-06-05 09:31:42
|
Yes but you can change that assumption, the syntax is somewhat arcane though
|
|
|
|
JendaLinda
|
2022-06-05 09:53:02
|
Indeed, Netpbm formats are kinda confusing. The official Netpbm specification requires to use some BT.709 gamma transfer function. I don't know why they thought it's a good idea. Everybody seem to ignore this requirement anyway.
|
|
|
_wb_
|
2022-06-05 09:56:05
|
I suppose that was before sRGB became a thing...
|
|
2022-06-05 09:57:15
|
I wonder why they didn't change it though
|
|
2022-06-05 09:57:36
|
Everyone considers untagged images to be sRGB now
|
|
2022-06-05 09:58:31
|
And ppm has no way to tag a colorspace or attach an icc profile, so interpreting it as sRGB makes the most sense
|
|
2022-06-05 09:59:02
|
That's also what is done with untagged jpegs, pngs, and gifs.
|
|
2022-06-05 10:00:08
|
Who knows how untagged images were rendered before sRGB was a thing. I suppose some would assume 709, some would assume gamma 2.2, and some would assume gamma 1.8
|
|
2022-06-05 10:01:22
|
Which lead to the famous "images look too dark" thing when taking Mac images to PC, etc
|
|
|
|
JendaLinda
|
2022-06-05 10:02:08
|
I'd say, before sRGB, 24bpp on PC was kind of luxury.
|
|
|
_wb_
|
2022-06-05 10:02:47
|
Sure, that too
|
|
2022-06-05 10:03:19
|
And monitors would often have brightness and saturation knobs anyway, so who knows what is right and what is wrong then
|
|
|
|
JendaLinda
|
2022-06-05 10:05:56
|
For the reasons, I'm avoiding netpbm formats and I prefer BMP as a simple image format.
|
|
2022-06-05 10:07:10
|
It has its own quirks and limitations but works quite well.
|
|
|
Traneptora
|
|
_wb_
I assume libjxl also allows it
|
|
2022-06-06 04:48:34
|
it doesn't on the decode side, that's why I ask
|
|
2022-06-06 04:50:12
|
|
|
2022-06-06 04:50:16
|
here's a sample
|
|
|
_wb_
|
2022-06-06 05:20:25
|
Oh. That's a bug then. Is it just djxl or also libjxl?
|
|
|
Traneptora
|
|
_wb_
Oh. That's a bug then. Is it just djxl or also libjxl?
|
|
2022-06-06 06:04:49
|
I tested both and produced the same issue
|
|
2022-06-06 06:05:48
|
libjxl reported JXL_DEC_NEED_MORE_INPUT even after reading the full file
|
|
2022-06-06 06:06:07
|
djxl reported "decoding container format failed"
|
|
2022-06-06 06:07:09
|
as far as i'm aware, the box size by spec is supposed to include the box tag, headers, and payload all at once
|
|
2022-06-06 06:07:13
|
like the *full* box size
|
|
2022-06-06 06:07:40
|
and it's possible that libjxl isn't subtracting the extra 8 bytes in the box size, so it thinks there's supposed to be 8 more bytes than there actually are
|
|
2022-06-06 06:07:57
|
if I'm wrong about that then my sample is corrupt, but I believe that's what's going on here
|
|
|
_wb_
|
2022-06-06 06:20:03
|
https://github.com/libjxl/libjxl/blob/main/lib/jxl/decode.cc#L1768
|
|
2022-06-06 06:20:25
|
That's what the code does
|
|
|
Traneptora
|
|
_wb_
https://github.com/libjxl/libjxl/blob/main/lib/jxl/decode.cc#L1768
|
|
2022-06-06 06:26:08
|
it looks like it's reading the tag before the extended size
|
|
2022-06-06 06:26:48
|
on Line 1766 it populates the `type` field and then it reads the extended size on line 1768
|
|
2022-06-06 06:28:19
|
I believe this is happening in the wrong order
|
|
2022-06-06 06:28:55
|
compare to the code here
|
|
2022-06-06 06:28:55
|
https://github.com/ffmpeg/FFmpeg/blob/master/libavformat/mov.c#L7813
|
|
2022-06-06 06:29:03
|
FFmpeg reads the extended size before it reads the tag
|
|
|
BlueSwordM
|
2022-06-06 06:29:53
|
I see that durandal has an interesting opinion about JXL ๐
|
|
|
Traneptora
|
2022-06-06 06:30:15
|
yea, Paul B. Mahol has lots of opinions on lots of things
|
|
2022-06-06 06:30:22
|
he's a quite rude individual
|
|
|
BlueSwordM
|
2022-06-06 06:32:47
|
Anyway, I don't understand the issue about the parser too much. Can you explain it for me a bit tbombz?
|
|
|
Traneptora
|
2022-06-06 06:33:13
|
a parser has to solve the issue "where does this frame end and the next frame begin"
|
|
2022-06-06 06:33:45
|
which for concatenated raw JXL files, requires you to find the end of the codestream. There's no reliable signature since 0xff0a can easily occur in the middle of a codestream.
|
|
2022-06-06 06:34:30
|
The problem is that JXL's file format does not make this easy. For something like PNG, every chunk contains a chunk_size so you can just, skip over all the chunks and when you find IEND, then you know you've found the end.
|
|
2022-06-06 06:35:37
|
JXL *does* contain a table of contents of blocks in the Frame Header, and in theory you could just add up the block sizes to determine the end of the frame. The *problem* with this is that the table of contents can contain a permutation.
|
|
2022-06-06 06:35:46
|
And the permutation, if present, is encoded with ANS.
|
|
2022-06-06 06:35:59
|
Likewise, if there is an ICC Profile present in the stream, that is also encoded with ANS.
|
|
2022-06-06 06:37:00
|
You don't actually need to *know* the contents of the ICC profile or the permutation table (since the permutation of the blocks won't affect their compressed length), but you do need to skip over these ANS streams. And the only way to determine the length of an ANS stream is to decode it.
|
|
2022-06-06 06:37:26
|
So this means that parsing a JXL codestream requires you to be able to decode ANS, which is ~500 lines of code if done naively, and more if you want to do it efficiently.
|
|
2022-06-06 06:38:32
|
So the issue with a parser is the fact that you need to include hundreds of lines of code to determine file boundaries, and whether or not that's something that you'd actually want to include.
|
|
2022-06-06 06:38:49
|
<@321486891079696385> does that make sense?
|
|
|
yurume
|
2022-06-06 06:42:53
|
I'm finally starting to look at VarDCT, it would take a time to wrap my head around it
|
|
|
Traneptora
|
2022-06-06 06:43:01
|
~~there's also a bunch of bugs in the JPEG XL spec so this is a lot of trial by fire~~
|
|
|
BlueSwordM
|
|
Traneptora
<@321486891079696385> does that make sense?
|
|
2022-06-06 06:43:27
|
Yeah. I perfectly understand now :)
|
|
|
Traneptora
|
2022-06-06 06:43:31
|
bugs in the code of the JXL spec is a problem because it makes it harder to determine if *you* followed the spec incorrectly or if the spec is wrong
|
|
2022-06-06 06:43:55
|
*wrong* in this case means does not match libjxl behavior, since the spec is derived from libjxl code and not the other way around
|
|
|
yurume
|
2022-06-06 06:44:04
|
I learned not to believe the spec at a face value ๐
|
|
|
Traneptora
|
2022-06-06 06:45:09
|
in hindsight, not spending 4 bytes on compressed ICC length was a mistake
|
|
2022-06-06 06:45:15
|
or even three
|
|
|
yurume
|
2022-06-06 06:45:50
|
I would say it's not too late, just put it into extensions
|
|
|
Traneptora
|
2022-06-06 06:46:29
|
yea, you can always include an extension with ID=1 which is "this is the size of the ICC profile" or ID=2 which is "this is the size of all the frames combined"
|
|
2022-06-06 06:46:30
|
or something
|
|
2022-06-06 06:46:41
|
I'd like that to be the case
|
|
2022-06-06 06:46:46
|
that's what extensions are for, tbf.
|
|
|
yurume
|
2022-06-06 06:47:36
|
why do you need frame boundaries by the way, say, seeking?
|
|
|
Traneptora
|
2022-06-06 06:47:40
|
concatenated files
|
|
2022-06-06 06:47:50
|
the end of the file is the end of a frame
|
|
|
yurume
|
2022-06-06 06:48:30
|
uh, there may be multiple frames necessary to produce a single image (even when not animated)
|
|
|
Traneptora
|
2022-06-06 06:48:40
|
Yea, but if you can find the frame boundary you can skip to the next frame
|
|
2022-06-06 06:48:52
|
since they're just concatenated
|
|
|
yurume
|
2022-06-06 06:49:02
|
yeah, so I assumed you needed them for seeking
|
|
|
Traneptora
|
2022-06-06 06:49:07
|
frames are always marked as "is_last" or not
|
|
2022-06-06 06:49:24
|
so you just keep skipping frames until you find one marked as is_last, and then the end of that frame is the end of the codestream
|
|
|
yurume
|
2022-06-06 06:49:45
|
frames can refer to previous frames so the actual seeking process would be complicated but it can be done without reading all the frames, technically
|
|
|
Traneptora
|
2022-06-06 06:50:03
|
well you would just need to read the frame header and then the TOC
|
|