|
_wb_
|
2022-05-26 08:13:31
|
Unless you want to do patches or splines or noise first
|
|
2022-05-26 08:14:13
|
Probably some jxl art should already decode at this point :)
|
|
|
yurume
|
2022-05-26 08:14:54
|
oh that's a good idea
|
|
2022-05-26 08:21:25
|
by the way after implementing MA prediction I was quite worried about the decoding performance, as the tree can naturally wreck optimizations
|
|
2022-05-26 08:21:41
|
I then learned that libjxl has lots of special cases for the MAANS decoder 😱
|
|
|
_wb_
|
2022-05-26 08:23:31
|
Yeah
|
|
2022-05-26 08:23:53
|
We do try to avoid branching
|
|
|
yurume
|
2022-05-26 08:24:55
|
are large MA trees frequent? I have seen one with >3000 nodes, though I can't recall which image had it
|
|
|
_wb_
|
2022-05-26 08:27:46
|
Typically it shouldn't be too large after taking out the "static" properties (stream id and channel id)
|
|
2022-05-26 08:29:47
|
Though I dunno, maybe cjxl -e 9 -E 3 does sometimes make large trees
|
|
|
yurume
|
2022-05-26 08:30:02
|
thinking about it, that 3000-plus-node MA tree might be mistakenly decoded as well
|
|
|
|
veluca
|
2022-05-26 11:52:40
|
that seems likely
|
|
2022-05-26 11:53:03
|
unless the image was *really* large
|
|
|
Fox Wizard
|
2022-05-27 11:59:43
|
Why you shouldn't always trust experts: https://www.funda.nl/koop/eindhoven/huis-42721840-joris-minnestraat-17/#foto-35
|
|
2022-05-28 12:00:00
|
I can see ringing and oversharpening without any zoom <:kekw:910212968405430273>
|
|
2022-05-28 12:00:47
|
Think someone cranked up "texture" and "sharpening" in Lightroom and called it a day...
|
|
2022-05-28 12:01:41
|
Or cranked up "texture" and used "bicubic sharper" for scaling, which tends to look like shit for downscaling. It causes aliasing, and cranking up texture too high can make things look unnatural
|
|
|
yurume
|
2022-05-28 07:09:33
|
today in jxsml: extra channel decoding is done, though upsampling is a whole different beast and I'd rather not touch it right now (I'm not even sure about the relative order of the 4 different upsampling parameters)
|
|
2022-05-28 07:10:36
|
and then I've tried to decode the classic _Tropical Island Sunset with JXL logo overlay_, which I thought was a good way to test different bit depths (this one uses 10bpp, for example)
|
|
2022-05-28 07:11:07
|
and somehow the MA tree decodes correctly but the subsequent Huffman tree doesn't decode, what
|
|
2022-05-28 07:12:20
|
not publicly *announced*, but you can see the current revision from https://gist.github.com/lifthrasiir/137a3bf550f5a8408d4d9450d026101f (which is highly unsafe and untested C code)
|
|
2022-05-28 07:16:16
|
I think you need to remove `inline` from the entire code, which is why I have a (commented) line `#define inline`
|
|
2022-05-28 07:17:07
|
something I should actually fix when I make it into a library form (for now it's just... a mix of library code and state dumper)
|
|
2022-05-28 07:18:02
|
in the actual development I always attach gdb to jxsml, so that I can break right at the moment `JXSML__ERR` is used
|
|
2022-05-28 07:19:06
|
I think it can now handle non-palettized fjxl outputs
|
|
2022-05-28 07:19:40
|
it crucially lacks VarDCT support, which is likely why you've seen a failure
|
|
|
spider-mario
|
2022-05-28 08:42:58
|
2meirl4meirl
|
|
|
yurume
|
2022-05-28 09:14:45
|
fun with uiCA
|
|
2022-05-28 09:15:53
|
to be honest at this stage assembly-level optimization is useless, but I do care about throughput of autovectorized code
|
|
2022-05-28 09:16:47
|
specifically I was comparing `p1 += ((twice_pixel_t) p0 + (twice_pixel_t) p2) >> 1;` vs `p1 += ((p0 ^ p2) >> 1) + (p0 & p2);` (a part of RCT)
|
|
2022-05-28 09:28:05
|
if I interpreted the numbers correctly, the latter achieves at least 3x the throughput on most recent-enough microarchitectures
|
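A minimal sketch of the two forms being compared (the names `avg_wide`/`avg_bits` are mine, not jxsml's; it assumes arithmetic right shift for negative values, as on all mainstream targets). The bitwise form computes the floored average without ever leaving the pixel type, so it autovectorizes without widening:

```c
#include <stdint.h>

/* widened form: needs a type twice as wide as the pixel type */
static int16_t avg_wide(int16_t p0, int16_t p2) {
    return (int16_t)(((int32_t)p0 + (int32_t)p2) >> 1);
}

/* bitwise form: (p0 ^ p2) >> 1 is the carry-less half-sum and
   p0 & p2 supplies the carries; stays entirely in int16_t */
static int16_t avg_bits(int16_t p0, int16_t p2) {
    return (int16_t)(((p0 ^ p2) >> 1) + (p0 & p2));
}
```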
|
|
_wb_
|
2022-05-28 09:39:58
|
If you find improvements there, feel free to also open a PR for libjxl :)
|
|
|
yurume
|
2022-05-28 09:41:15
|
I don't think I can compete with Highway
|
|
2022-05-28 09:42:04
|
I wonder if Highway emits a dedicated average operation (PAVGW etc. in x86-64), which no compiler seems to detect and generate
|
|
2022-05-28 09:42:46
|
or maybe compilers are smart enough to know that PAVGW is rounding *up* and rounded-down average is better done with other operations
|
|
2022-05-28 09:45:19
|
wait, actually... libjxl does all the computation in int16_t (that is, it is equivalent to `p1 += (p0 + p2) >> 1;` and `p0 + p2` can overflow)
|
|
2022-05-28 09:45:50
|
if that's allowed my job is much simpler, but that doesn't seem to match the spec then :p
|
|
2022-05-28 09:47:01
|
(of course I will file an issue about this, but it's time to have dinner)
|
|
2022-05-28 10:07:16
|
Yes, provided that I compiled it with a correct target (I've used `-march=haswell/rocketlake` for quick testing)
|
|
|
_wb_
|
2022-05-28 10:17:21
|
Wait what is it doing in int16? I don't think it's doing anything modular in int16 atm
|
|
|
yurume
|
2022-05-28 10:18:06
|
It was definitely specialized to the pixel type, but I haven't looked at the caller
|
|
|
_wb_
|
2022-05-28 10:18:52
|
Pixel_type is always int32_t in libjxl atm
|
|
|
yurume
|
|
_wb_
|
2022-05-28 10:20:06
|
We should in theory be able to template it and use int16_t in case modular_16bit is true, but probably that will break stuff atm
|
|
|
|
veluca
|
2022-05-28 10:21:12
|
"probably"
|
|
2022-05-28 10:21:22
|
seems optimistic
|
|
|
yurume
specifically I was comparing `p1 += ((twice_pixel_t) p0 + (twice_pixel_t) p2) >> 1;` vs `p1 += ((p0 ^ p2) >> 1) + (p0 & p2);` (a part of RCT)
|
|
2022-05-28 10:22:22
|
wait why are those equivalent?
|
|
|
yurume
|
2022-05-28 10:23:28
|
Can be shown by dissecting the addition into full adders
|
|
2022-05-28 10:24:17
|
So for single bits a and b, a+b is (a xor b) + (a and b) * 2
|
|
2022-05-28 10:25:27
|
Since each bit is independent of the others and this alternative expression is linear...
|
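The full-adder argument can also be checked mechanically; a hedged sketch that exhaustively verifies the identity over all 8-bit pairs:

```c
/* xor is the per-bit sum without carries, and collects the carries,
   which get weight 2 one position to the left; summing the two
   reconstructs ordinary addition (no overflow here since the
   arithmetic is done in int) */
static int identity_holds(void) {
    for (int a = -128; a < 128; a++)
        for (int b = -128; b < 128; b++)
            if (a + b != (a ^ b) + 2 * (a & b))
                return 0;
    return 1;
}
```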
|
|
|
veluca
|
2022-05-28 10:38:35
|
ah somehow I read that last part as p0 & p2 & 1
|
|
2022-05-28 10:38:49
|
then yes xD
|
|
|
yurume
|
2022-05-28 10:39:38
|
Off-topic: if you haven't read https://devblogs.microsoft.com/oldnewthing/20220207-00/?p=106223 it will be a fun read
|
|
|
|
veluca
|
2022-05-28 10:41:10
|
heh nice
|
|
2022-05-28 10:41:28
|
not sure what the spec says that should do tbh wrt overflows
|
|
2022-05-28 10:41:41
|
and in particular whether we specify anything for intermediate quantities
|
|
|
yurume
|
2022-05-28 10:44:16
|
It has a mention that any stored pixel during the transformation fits in 16 or 32 bits while intermediate results can be 64 bits long
|
|
2022-05-28 10:45:05
|
And I interpreted that (p0 + p2) is an intermediate result, not stored
|
|
|
_wb_
|
2022-05-28 10:54:22
|
"Stored" is anything that goes into a buffer in a (naive/generic) implementation that first puts decoded values in a buffer, then undoes each modular transform one by one and stores the results in buffers, until the final output which is also stored in a buffer. Stored pixels fit in int32 in general and in int16 in case of modular_16bit
|
|
2022-05-28 10:55:34
|
The intermediate values you get while computing a predictor or evaluating RCT/palette expressions could require more bits (but not more than int64, I suppose)
|
|
|
yurume
|
|
_wb_
"Stored" is anything that goes into a buffer in a (naive/generic) implementation that first puts decoded values in a buffer, then undoes each modular transform one by one and stores the results in buffers, until the final output which is also stored in a buffer. Stored pixels fit in int32 in general and in int16 in case of modular_16bit
|
|
2022-05-28 11:23:01
|
I think there is an ambiguity in whether results from each squeeze step are stored in (temporary) buffers or not
|
|
2022-05-28 11:23:27
|
anyway, otherwise that was my understanding
|
|
|
_wb_
|
2022-05-28 11:24:03
|
I would say they are
|
|
|
yurume
|
2022-05-28 11:24:54
|
and technically speaking the current implementation of RCT will overflow in very rare cases where modular_16bit_buffers is false and somehow buffers have very large numbers (>2^30) in magnitude
|
|
|
_wb_
|
2022-05-28 11:25:12
|
Not sure if it makes a difference. Output bitdepth of squeeze should be 1 bit lower than input bitdepth
|
|
|
yurume
|
2022-05-28 11:27:11
|
(I made it clear that the previous message was about RCT, not Squeeze)
|
|
|
|
veluca
|
2022-05-28 11:29:18
|
ugh, I guess we should fix that...
|
|
2022-05-28 11:29:43
|
we should have said that nothing above 24 bits is guaranteed to work xD
|
|
|
yurume
|
|
yurume
and technically speaking the current implementation of RCT will overflow in very rare cases where modular_16bit_buffers is false and somehow buffers have very large numbers (>2^30) in magnitude
|
|
2022-05-28 11:30:37
|
I was even thinking about having 4 different implementations, where the first one will be 16bit + no overflow, the second one is 16bit + overflow and so on (my code already uses different code paths for 16bit and 32bit buffers so it's doable)
|
|
|
_wb_
|
2022-05-28 11:43:47
|
The only code path where libjxl currently uses modular with big numbers is when doing lossless float, where I think it avoids RCTs and squeeze just to be sure.
|
|
|
yurume
|
2022-06-01 12:22:30
|
today in jxsml: weighted predictor is hard
|
|
|
_wb_
|
2022-06-01 05:22:16
|
Yeah it's pretty hairy. What is that image?
|
|
|
yurume
|
2022-06-01 10:22:03
|
as noted before, _Tropical Island Sunset with JXL logo overlay_ (the first frame)
|
|
2022-06-01 10:22:48
|
I've tracked it down to the possibly wrong alias table in the ANS decoder, which resulted in a wrong MA tree
|
|
|
_wb_
|
2022-06-01 10:39:53
|
Dumping the MA tree could help
|
|
|
yurume
|
2022-06-01 10:48:19
|
I've dumped all changes to the ANS state during the MA tree decoding and pinpointed the exact moment where it went wrong, but I don't yet know why
|
|
2022-06-01 10:49:58
|
the finest printf debugging
|
|
2022-06-01 12:01:49
|
okay, this was a culprit:
```
correct:
[alias1:(0: 0 5 0)(1: 0 5 64)(2: 0 5 128)(3: 0 5 192)(4: 0 5 256)(5: 0 5 320)(6: 0 5 384)...
wrong:
[alias1:(0: 0 5 64)(1: 0 5 128)(2: 0 5 192)(3: 0 5 256)(4: 0 5 320)(5: 0 5 0)(6: 0 5 384)...
```
|
|
2022-06-01 12:02:01
|
I thought I no longer had to debug the alias table, but I was wrong :S
|
|
2022-06-01 12:11:15
|
aaaand it was entirely my mistake. the distribution has a single symbol at 5 and my faulty code was supposed to special-case it, but it didn't thanks to a missing `+ 1`.
|
|
2022-06-01 12:12:09
|
(it seemed to work because in many cases that single symbol happened to be at 0, in which case the special case and normal case coincide)
|
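A hedged sketch of the special case in question (field layout and table sizes are illustrative, not the exact spec structures): when the distribution puts its whole mass on one symbol s, every alias bucket should map to s with offsets simply tiling the state space, regardless of where s sits in the alphabet, which matches the "correct" dump above.

```c
#include <stdint.h>

enum { LOG_ALPHA = 5, LOG_TABLE = 12 };            /* illustrative sizes */
enum { BUCKET_SIZE = 1 << (LOG_TABLE - LOG_ALPHA) };

typedef struct { uint16_t cutoff, symbol, offset; } bucket_t;

/* single-symbol distribution: bucket i decodes to s with offset
   i * BUCKET_SIZE; the bug described above came from only taking
   this path when s happened to be 0 */
static void build_single_symbol(bucket_t t[1 << LOG_ALPHA], uint16_t s) {
    for (int i = 0; i < (1 << LOG_ALPHA); i++) {
        t[i].cutoff = 0;
        t[i].symbol = s;
        t[i].offset = (uint16_t)(i * BUCKET_SIZE);
    }
}

/* quick self-check helper for the table above */
static int demo_offsets_ok(void) {
    bucket_t t[1 << LOG_ALPHA];
    build_single_symbol(t, 5);
    return t[0].symbol == 5 && t[0].offset == 0 &&
           t[6].offset == 6 * BUCKET_SIZE;
}
```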
|
2022-06-05 03:55:46
|
today in jxsml: weighted predictor is *way* hard
|
|
2022-06-05 03:56:38
|
there are at least 6 typos in the spec that can corrupt the final output, and it seems there are more
|
|
|
monad
|
2022-06-05 05:44:37
|
Those are some powerful typos.
|
|
|
|
veluca
|
|
yurume
there are at least 6 typos in the spec that can corrupt the final output, and it seems there are more
|
|
2022-06-05 09:43:06
|
not surprised
|
|
2022-06-05 09:43:09
|
fancy though
|
|
|
yurume
|
2022-06-05 01:39:51
|
actually this is much closer to the original (before cropping) than ever before
|
|
2022-06-05 01:40:10
|
the left half is more or less correct except for corrupted color; the right half is incorrect
|
|
|
_wb_
|
2022-06-05 04:13:30
|
Aren't both groups ok here?
|
|
|
yurume
|
2022-06-05 07:57:24
|
I should have made clear that the second image I posted is the reference image, not one jxsml produced
|
|
2022-06-05 07:58:04
|
it seems to do with stream indices where jxsml assigns 4 and 5 but apparently the image expects ~~20 and 21~~ 21 and 22
|
|
|
_wb_
|
2022-06-05 08:03:33
|
Ah
|
|
2022-06-05 08:03:41
|
Yes that numbering is funky
|
|
2022-06-05 08:04:10
|
There are numbers for all raw quant tables regardless of whether they are used or not
|
|
|
yurume
|
2022-06-05 08:04:24
|
ugh, another spec bug then
|
|
|
_wb_
|
2022-06-05 08:04:32
|
Spec probably needs to make that clearer
|
|
2022-06-05 08:07:11
|
Spec says "number of tables" but it should probably just say whatever that number is, since it is a fixed number
|
|
2022-06-05 08:09:02
|
It's the DctSelect index, which has range 0 to 16
|
|
2022-06-05 08:10:24
|
So I think the number should be 17, but then you still have an off by one?
|
|
|
yurume
|
2022-06-05 08:10:56
|
are those stream indices actually used or just reserved? I don't have a full understanding of VarDCT, but I couldn't find anything in the spec that actually refers to those indices.
|
|
2022-06-05 08:12:23
|
if they were actually used, I suppose there would be a reference to `num_lf_groups` somewhere else, since those indices would start from `1 + 3 * num_lf_groups`
|
|
2022-06-05 08:15:47
|
okay, *I* was off by one, the image actually refers to 21 and 22, so the number is fine
|
|
2022-06-05 08:17:03
|
I see no reference to `ModularStreamId::QuantTable` from libjxl, so I guess they are just reserved now
|
|
|
_wb_
|
2022-06-05 08:23:54
|
They are only used when raw quant tables are signalled
|
|
|
yurume
|
2022-06-05 08:24:34
|
ah indeed
|
|
2022-06-05 08:24:53
|
I casually looked for references to the section H but missed one buried in a pseudocode
|
|
|
_wb_
|
2022-06-05 08:27:58
|
So spec might be correct here, but certainly could be clearer
|
|
|
yurume
|
2022-06-05 08:28:21
|
how on earth does `ModularStreamId::QuantTable` actually exist in libjxl while GitHub search failed to find it
|
|
2022-06-05 08:31:54
|
anyway, updated jxsml output:
|
|
2022-06-05 08:32:19
|
it looks like just a faulty YCgCo conversion now?
|
|
2022-06-05 08:38:07
|
okay, YCgCo is okay but the sample had to be clamped (as per kRelative intent)! this is still slightly different from the reference image (bottom right) but gets almost everything right.
|
|
2022-06-05 08:39:21
|
this is a difference map; judging from this it seems that the border condition is still slightly off
|
|
|
_wb_
|
2022-06-05 09:29:12
|
I haven't looked yet at the recent spec issues you opened but I think it's likely that the weighted predictor has a couple of spec bugs/typos in it. It's a rather funky thing that started with Alexander Ratushnyak's code which was an almost unreadable bag of C macros that happened to do something. And then Luca and me made it even more complicated in an effort to make it more generic (params instead of fixed constants; the original version had fixed constants for uint8 and for uint16 but couldn't do any other bitdepth) and to fit it into modular in a way that made sense.
|
|
2022-06-05 09:30:36
|
It's a very powerful predictor for photographic images, but it's also a couple of magnitudes more complicated than any other predictor
|
|
|
yurume
|
2022-06-05 09:32:14
|
I'm most surprised by the fact that weights themselves are not a part of the state
|
|
2022-06-05 09:33:19
|
conceptually it combines 4 subpredictors by weights that are learnt from the past, but weights are calculated from past errors and not retained
|
|
|
_wb_
|
2022-06-05 09:34:15
|
Yes, I think that helps to make it follow local error patterns better
|
|
|
yurume
|
2022-06-05 09:34:23
|
yeah, that was my guess
|
|
|
_wb_
|
2022-06-05 09:34:40
|
But it's still a bit of a black magic thing to me
|
|
|
yurume
|
2022-06-05 09:34:56
|
but it can be also argued that MA trees do the same thing; it can _cover_ the regions where weighted predictors are yet to catch up
|
|
|
_wb_
|
2022-06-05 09:34:58
|
How the error feeds back into the predictors themselves and all
|
|
|
yurume
|
2022-06-05 09:35:30
|
I suppose `max_error` can be used for that purpose as well
|
|
|
_wb_
|
2022-06-05 09:35:40
|
It was designed to be used with a fixed ctx model, something like how we use it at -e 3
|
|
2022-06-05 09:36:02
|
Which is basically just a context based on max error
|
|
2022-06-05 09:39:32
|
So, maybe one more bug to hunt in the weighted predictor, and then there's only palette and squeeze left to have a full modular decoder?
|
|
|
yurume
|
2022-06-05 09:39:55
|
I've implemented palette already, I needed that for the full fjxl decoding
|
|
|
_wb_
|
2022-06-05 09:40:01
|
Ah right
|
|
2022-06-05 09:40:07
|
Also deltas and all?
|
|
|
yurume
|
2022-06-05 09:40:29
|
ah that part is still a bit elusive, but should work after some refactoring
|
|
2022-06-05 09:40:49
|
at the moment my code is structured in a way that makes reusing predictors hard
|
|
2022-06-05 09:41:41
|
I used to think predictors were an integral part of modular image decoding, but due to palette I'll have to refactor them into a separate component with its own data structure
|
|
|
_wb_
|
2022-06-05 09:42:19
|
I'm thinking maybe we just shouldn't have allowed deltas with the weighted predictor, it makes things harder than needed I think
|
|
2022-06-05 09:42:33
|
But oh well
|
|
|
yurume
|
2022-06-05 09:44:01
|
well palettes were not hard to implement after all
|
|
2022-06-05 09:44:17
|
it should be trivial to implement deltas after refactoring
|
|
2022-06-05 09:44:48
|
I think I've found only one or two bugs in the palette pseudocode, both of which can be fixed with common sense
|
|
2022-06-05 09:45:04
|
bugs in weighted predictors are essentially impossible to fix in such ways though
|
|
2022-06-05 10:28:51
|
the (hopefully) final wp bug is fixed, yay
|
|
2022-06-05 10:31:01
|
...and my hope was unjustified
|
|
2022-06-05 11:14:54
|
okay that one was not wp but another predictor, I think I've fixed them all
|
|
2022-06-06 01:39:34
|
updated the wp issue to reflect what I've found so far
|
|
2022-06-06 09:47:19
|
the first benchmarking attempt on jxsml: a particular 24.8MP modular image is decoded (and discarded) in 5.1s, compared to 4.6s for a particular build of djxl with a single thread
|
|
|
|
veluca
|
2022-06-06 09:54:38
|
what kind of modular image?
|
|
|
yurume
|
2022-06-06 09:55:37
|
encoded with `cjxl -d 0 -e 3 -j -m --patches=0`
|
|
2022-06-06 09:56:14
|
checking if 32-bit modular buffers make a difference (so far I was using 16-bit)
|
|
|
|
veluca
|
2022-06-06 09:56:22
|
mhhh yeah, decoding those is pretty hard to optimize
|
|
2022-06-06 09:56:48
|
i.e. I'm not surprised djxl isn't much faster
|
|
|
yurume
|
2022-06-06 09:56:58
|
not just that, but encoding is also very slow in that case (I couldn't use the default `-e 6` as it was too slow to encode)
|
|
|
|
veluca
|
2022-06-06 09:57:24
|
ah wait, that's -e 3, I read -e 9 somehow
|
|
2022-06-06 09:57:43
|
so it uses wp
|
|
|
yurume
|
2022-06-06 09:58:01
|
does `-e 4` use wp?
|
|
|
|
veluca
|
2022-06-06 09:58:08
|
I kinda feel like the gap should be higher then (I assume you didn't write special-case decode paths)
|
|
2022-06-06 09:58:19
|
IIRC it doesn't
|
|
|
yurume
|
2022-06-06 09:58:20
|
or, in other words, which is the lowest level that doesn't use wp
|
|
|
|
veluca
|
2022-06-06 09:58:29
|
-e 2 sure won't
|
|
|
yurume
|
2022-06-06 09:58:32
|
okay I'll try again with `-e 4` then
|
|
2022-06-06 10:01:18
|
so that does make a big difference, jxsml: 11.2s, single-threaded djxl: 8.2s
|
|
2022-06-06 10:02:04
|
time to implement a poor man's template in C
|
|
|
|
veluca
|
2022-06-06 10:02:11
|
heh
|
|
2022-06-06 10:02:30
|
you could also try `-e 2`
|
|
2022-06-06 10:02:38
|
difference should be more significant still
|
|
|
yurume
time to implement a poor man's template in C
|
|
2022-06-06 10:03:16
|
I assume for switching 16- and 32- bit buffers?
|
|
|
yurume
|
2022-06-06 10:03:53
|
```c
#ifndef JXSML__RECURSING
// non-templated code here
// template definition
#endif
#if JXSML__RECURSING+0 == 123
// templated code here, using JXSML__P
#undef JXSML__P
#endif
#ifndef JXSML__RECURSING
// template invocation
#define JXSML__RECURSING 123
#define JXSML__P 16
#include __FILE__
#define JXSML__P 32
#include __FILE__
#undef JXSML__RECURSING
// non-templated code here
#endif
```
|
|
2022-06-06 10:04:03
|
that kind of thing
|
|
2022-06-06 10:04:52
|
I already have a condition for `modular_16bit_buffers`, just that my current code only handles the 16-bit case
|
|
|
|
veluca
|
2022-06-06 10:04:53
|
yup, nice preprocessor magic
|
|
|
_wb_
|
2022-06-06 10:49:36
|
e2 uses only Gradient, e3 only WP
|
|
2022-06-06 10:52:19
|
And they both use a tree that only uses one property so it can be done as a lookup table
|
|
|
yurume
|
2022-06-06 10:52:37
|
yeah, I don't yet filter the tree for that kind of analysis
|
|
2022-06-06 11:55:08
|
okay, I made jxsml compute `use_wp` and only enable wp when necessary, and that brought jxsml on par with single-threaded djxl (both 1.7s) for `-e 1` images
|
|
2022-06-06 11:55:56
|
I wondered if I should template that as well, but in hindsight we are already far from vectorization there, so a boolean flag check doesn't cost jxsml much
|
|
2022-06-06 11:58:59
|
for `-e 2` jxsml is still faster than before but it took 3.2s while djxl took 1.9s, so that's probably where a LUT makes a difference?
|
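A hedged sketch of what such a single-property LUT could look like (node layout and names are illustrative, not libjxl's): pre-walk the tree once per possible property value, and per-pixel context selection becomes a single array load instead of a tree walk.

```c
typedef struct {
    int is_leaf;
    int ctx;              /* leaf: context id */
    int value;            /* inner: go left if property > value */
    int left, right;      /* inner: child indices */
} node_t;

static int walk(const node_t *tree, int prop) {
    int n = 0;
    while (!tree[n].is_leaf)
        n = (prop > tree[n].value) ? tree[n].left : tree[n].right;
    return tree[n].ctx;
}

/* pre-evaluate the tree for every property value in [lo, hi] */
static void build_lut(const node_t *tree, int *lut, int lo, int hi) {
    for (int p = lo; p <= hi; p++)
        lut[p - lo] = walk(tree, p);
}

/* tiny example: one test on a single property, two contexts */
static const node_t example[] = {
    {0, 0, 4, 1, 2},      /* root: prop > 4 ? node 1 : node 2 */
    {1, 7, 0, 0, 0},      /* leaf: context 7 */
    {1, 3, 0, 0, 0},      /* leaf: context 3 */
};

static int lut_at(int p) {
    int lut[11];
    build_lut(example, lut, 0, 10);
    return lut[p];
}
```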
|
2022-06-07 12:00:13
|
I think this is enough for now; it is not dramatically slower than djxl and that's acceptable to me
|
|
2022-06-07 12:01:35
|
now it's time to actually do the VarDCT stuff; I was specifically reading classic papers like Loeffler's _Practical fast 1-D DCT algorithms with 11 multiplications_
|
|
2022-06-07 12:04:13
|
I have no idea about the fastest known strategy for larger IDCTs; is radix-2 or split-radix 2/4 the way to go? libjxl seems to have automatically generated SIMD code for all supported DCT sizes and I don't know what generated it.
|
|
|
_wb_
|
2022-06-07 05:19:50
|
If you go back to earlier versions of libjxl, it had a recursive dct implementation before, which was also fast but fewer lines of code
|
|
|
|
veluca
|
|
yurume
I have no idea about the fastest known strategy for larger IDCT, is radix-2 or split-radix-2/4 a way to go? libjxl seems to have an automatically generated SIMD code for all supported DCT sizes and I don't know what generated it.
|
|
2022-06-07 08:56:20
|
uhm, I am pretty sure it doesn't have automatically generated code
|
|
|
yurume
|
2022-06-07 08:56:42
|
is it crafted by hand????
|
|
|
|
veluca
|
2022-06-07 08:57:09
|
https://github.com/libjxl/libjxl/blob/main/lib/jxl/dct-inl.h#L40
|
|
2022-06-07 08:57:20
|
if you mean that file, I wrote it
|
|
|
yurume
|
2022-06-07 08:57:58
|
ah, I meant `fast_dct##-inl.h`
|
|
|
|
veluca
|
2022-06-07 08:58:05
|
ah *that*
|
|
2022-06-07 08:58:23
|
you can ignore it, it's just an arm-specific implementation
|
|
2022-06-07 08:58:36
|
not even used anywhere (yet?)
|
|
2022-06-07 08:59:35
|
it's the same algorithm as the code I linked above, modulo some tweaks to make it have slightly better precision in fixpoint... but probably not nearly as good as it could be
|
|
|
yurume
|
2022-06-07 07:19:05
|
today in jxsml: took time to update my msys2 install (which was stale since 2014 apparently), required for a QoL improvement in gdb like this:
|
|
2022-06-07 07:19:54
|
(my old version of gdb was using Python 2.7 for scripting...)
|
|
2022-06-08 12:15:26
|
nope, just that I had never updated my msys2 install since 2014
|
|
2022-06-08 12:16:09
|
since then pacman moved to zstd, which caused a massive pain (I eventually had to remove the previous install and reinstall it)
|
|
2022-06-08 12:16:52
|
that kind of chore was probably why I never updated it since then
|
|
2022-06-08 08:20:36
|
today in jxsml: uh, oops.
```cpp
for (size_t c = 0; c < 3; c++) {
for (size_t i = 0; i < 9; i++) {
JXL_RETURN_IF_ERROR(F16Coder::Read(br, &encoding->afv_weights[c][i]));
}
for (size_t i = 0; i < 6; i++) {
encoding->afv_weights[c][i] *= 64;
}
JXL_RETURN_IF_ERROR(DecodeDctParams(br, &encoding->dct_params));
JXL_RETURN_IF_ERROR(DecodeDctParams(br, &encoding->dct_params_afv_4x4));
// ^ these get decoded 3 times, only retaining the last one, so previous two are duplicates & useless
}
```
|
|
2022-06-08 08:22:44
|
(it's both a spec bug and a libjxl encoder inefficiency, as it currently encodes the same parameters 3 times)
|
|
|
_wb_
|
2022-06-08 08:30:06
|
Oops!
|
|
2022-06-08 08:30:32
|
How many existing bitstreams does that break if we fix it?
|
|
2022-06-08 08:30:42
|
<@179701849576833024>
|
|
|
|
veluca
|
2022-06-08 08:33:01
|
Probably 0?
|
|
2022-06-08 08:33:29
|
Ah, maybe those with dc_level greater than 1
|
|
|
yurume
|
2022-06-08 08:33:58
|
I think you can safely make it reserved, it's not a big deal
|
|
|
_wb_
|
2022-06-08 08:34:53
|
If libjxl encode doesn't signal that stuff, we can just fix it
|
|
|
|
veluca
|
2022-06-08 08:35:58
|
I'm surprised the encoder and the decoder have the same bug tbh
|
|
2022-06-08 08:36:47
|
Anyway, the only way I know of to get custom quantization matrices is to use uniform error mode
|
|
|
yurume
|
2022-06-08 08:36:49
|
surprisingly yes! so if you are super careful you can just fix the encoder to encode placeholder params for the first two.
|
|
|
|
veluca
|
2022-06-08 08:37:46
|
doesn't change much
|
|
2022-06-08 08:39:44
|
ah
|
|
|
yurume
|
2022-06-08 08:39:49
|
saving at most 65 bytes? :p
|
|
|
|
veluca
|
2022-06-08 08:40:09
|
ah right, dct_params have VLE
|
|
|
yurume
|
2022-06-08 08:40:13
|
(= 2 * (4 + 16 * 16) bits)
|
|
|
|
veluca
|
2022-06-08 08:40:31
|
anyway, I checked and even progressive_dc=2 doesn't use custom AFV tables
|
|
|
yurume
|
2022-06-08 08:41:06
|
okay, I'll file an issue
|
|
|
|
veluca
|
2022-06-08 08:41:34
|
in fact, nothing in the entire libjxl source code that I could find uses custom AFV tables
|
|
2022-06-08 08:42:06
|
so I'd say the chance of actually breaking some bitstreams by fixing them is... pretty much 0
|
|
|
yurume
|
2022-06-08 08:52:06
|
https://github.com/libjxl/libjxl/issues/1484
|
|
|
|
veluca
|
2022-06-08 09:05:03
|
Thanks :)
|
|
|
yurume
|
2022-06-09 10:27:02
|
https://jsfiddle.net/0Lmqtd2n/ I visualized natural DCT coefficient order for a visual aid
|
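For reference, the classic JPEG 8x8 zigzag order (which this "natural order" extends) can be generated by walking anti-diagonals and alternating direction; a hedged sketch:

```c
/* generate the 8x8 zigzag scan: walk each anti-diagonal d,
   alternating between up-right (even d) and down-left (odd d) */
static void zigzag8(int order[64]) {
    int idx = 0;
    for (int d = 0; d < 15; d++) {
        if (d % 2 == 0) {
            for (int y = d < 8 ? d : 7; y >= 0 && d - y < 8; y--)
                order[idx++] = y * 8 + (d - y);
        } else {
            for (int x = d < 8 ? d : 7; x >= 0 && d - x < 8; x--)
                order[idx++] = (d - x) * 8 + x;
        }
    }
}

/* helper: i-th entry of the generated order */
static int zigzag_at(int i) {
    int order[64];
    zigzag8(order);
    return order[i];
}
```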
|
|
|
veluca
|
2022-06-09 12:31:45
|
nice! yup, seems about right
|
|
|
_wb_
|
2022-06-09 04:14:47
|
cool
|
|
2022-06-09 04:16:08
|
why do we call that natural order btw? I would call it zigzag
|
|
|
yurume
|
2022-06-09 07:29:17
|
yeah, the code surely looked like an extension to JPEG zigzag order but I wanted to confirm that
|
|
|
|
veluca
|
|
_wb_
why do we call that natural order btw? I would call it zigzag
|
|
2022-06-09 07:35:57
|
no clue
|
|
|
_wb_
|
2022-06-09 08:02:26
|
It's slightly confusing since you could assume that the natural order is scanline order
|
|
2022-06-10 08:23:57
|
https://twitter.com/jonsneyers/status/1535356866301202434?s=20&t=FKuvYMVfyunxZwL4j81_CQ
|
|
2022-06-10 08:24:07
|
sigh I hate it that Twitter has no edit button
|
|
2022-06-10 08:24:29
|
"it is however a critical" wtf
|
|
|
yurume
|
2022-06-11 12:25:08
|
lower deviation (towards top) is better
|
|
2022-06-11 12:26:26
|
as I understand it, each point indicates that for a particular set of encoder & parameters, all images have this mean score and this standard deviation
|
|
2022-06-11 12:27:20
|
thinking about it, lower deviation can (but doesn't necessarily) result in a flatter curve, but anyway
|
|
|
monad
|
2022-06-11 01:35:07
|
Assuming high mean DMOS correlates with low standard deviation, "flatter is better" seems an okay interpretation.
|
|
|
_wb_
|
2022-06-11 06:05:49
|
A setting with a high mean DMOS is likely to have a lower stdev since DMOS is capped at 100 (can't go better than "as good as the original")
|
|
|
yurume
|
2022-06-11 06:08:01
|
today in jxsml: VarDCT pipeline is complex enough that I needed some outline (text version: https://gist.github.com/lifthrasiir/a6058584fde522de092a74b5e5517f73)
|
|
2022-06-11 06:17:36
|
now I have a lot of questions left unanswered
|
|
|
_wb_
|
2022-06-11 06:22:39
|
E.g. mozjpeg q70 is the black point that gets a mean DMOS of 80 and has a stdev of 7, which means (assuming normal distribution, which is not really true but whatever) that 68% of the time, the DMOS is in [73,87] and 95% of the time, it's in [66,94]. At the same mean DMOS of 80, which for jxl is cjxl q70, that 95% interval would be [70,90] or so, while for avif and webp it would be [64,98] or so.
(The actual intervals would not be that symmetric, and be more skewed towards the lower side)
|
|
|
yurume
|
|
yurume
today in jxsml: VarDCT pipeline is complex enough that I needed some outline (text version: https://gist.github.com/lifthrasiir/a6058584fde522de092a74b5e5517f73)
|
|
2022-06-11 06:44:34
|
following this outline, my biggest question right now is how a varblock across group boundaries is handled
|
|
2022-06-11 06:46:28
|
I get that varblocks are in the raster order of their top-left corners, and they should not overlap each other
|
|
2022-06-11 06:48:33
|
but packing should select the next possible location (in the raster order) for each varblock
|
|
2022-06-11 06:49:07
|
and I'm not sure whether a varblock that would cross group boundaries is disallowed or relocated to the next possible position
|
|
|
_wb_
|
2022-06-11 07:20:08
|
It is disallowed to cross group boundaries
|
|
2022-06-11 07:22:18
|
So no relocation, you can just read them and put them in raster order, just skipping over positions that are already covered
|
|
2022-06-11 07:24:25
|
The first not-already-covered 8x8 cell should also be a position where you can fit the (topleft corner of the) varblock and if that makes it cross a group boundary, it's an invalid bitstream
|
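A hedged sketch of the placement rule described above (sizes in 8x8-cell units; the names and the 32x32-cell group size are illustrative): each decoded varblock goes at the first uncovered cell in raster order, and a block that would overlap or stick out of the group makes the bitstream invalid.

```c
#include <string.h>

enum { GW = 32, GH = 32 };           /* one 256x256 group = 32x32 cells */

typedef struct { int w, h; } blk_t;  /* varblock size in 8x8 cells */

/* returns 1 on success, 0 for an invalid bitstream */
static int place_blocks(const blk_t *blks, int n, int pos[][2]) {
    unsigned char covered[GH][GW];
    int cell = 0;
    memset(covered, 0, sizeof covered);
    for (int i = 0; i < n; i++) {
        /* first not-already-covered cell in raster order */
        while (cell < GW * GH && covered[cell / GW][cell % GW]) cell++;
        if (cell == GW * GH) return 0;
        int y = cell / GW, x = cell % GW;
        if (x + blks[i].w > GW || y + blks[i].h > GH) return 0;
        for (int dy = 0; dy < blks[i].h; dy++)
            for (int dx = 0; dx < blks[i].w; dx++) {
                if (covered[y + dy][x + dx]) return 0;  /* overlap */
                covered[y + dy][x + dx] = 1;
            }
        pos[i][0] = x; pos[i][1] = y;
        cell++;
    }
    return 1;
}

/* quick demo: a 2x2 block then a 1x1 block */
static int demo_ok(void) {
    blk_t b[2] = { {2, 2}, {1, 1} };
    int pos[2][2];
    if (!place_blocks(b, 2, pos)) return 0;
    return pos[0][0] == 0 && pos[0][1] == 0 &&
           pos[1][0] == 2 && pos[1][1] == 0;
}
```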
|
|
yurume
|
2022-06-11 07:36:20
|
in other words, each varblock is (and should be) encoded such that there is no such crossing, and no out-of-bounds condition in general
|
|
|
|
veluca
|
|
_wb_
|
2022-06-11 08:43:12
|
yes, i.e. the decoder doesn't need to play tetris
|
|
|
spider-mario
|
|
_wb_
"it is however a critical" wtf
|
|
2022-06-11 01:44:00
|
https://en.wiktionary.org/wiki/critical#Noun
> 1. A critical value, factor, etc.
>
> - **1976**, American Society of Mechanical Engineers, _Journal of engineering for industry_, volume 98, page 508:
> The second undamped system **criticals** show a greater percentage depression than the first.
|
|
2022-06-11 02:02:46
|
seems they don't know much about breakdancing either then
|
|
|
|
JendaLinda
|
2022-06-11 03:24:33
|
I've noticed that in the lossy mode, JXL allows the image colors to bleed into the fully transparent pixels. It looks quite pretty, it's an interesting effect. I guess the codec does this for better compression, as in the lossy mode the contents of the fully transparent pixels are discarded anyway.
|
|
|
_wb_
|
2022-06-11 03:35:12
|
Yes, it helps to avoid artifacts caused by abrupt changes at the transparency boundary, so it's better than e.g. just making all invisible pixels black.
|
|
2022-06-11 03:37:35
|
Preserving the input color at invisible pixels is typically a bad idea since optimized pngs often have weird stuff there that happens to compress well in png, e.g. horizontal lines of the same color as the last non-invisible pixel. That's quite bad for dct compression though...
|
|
|
|
JendaLinda
|
2022-06-11 04:11:07
|
To me, this looks more like some prediction trickery rather than DCT optimization, and the same optimization is done in both VarDCT and Modular.
|
|
|
_wb_
|
2022-06-11 04:32:45
|
Only in lossy modular should it be done iirc
|
|
2022-06-11 04:35:25
|
It basically tries to turn invisible pixels into a blurry mess that should compress well with dct, and that also causes less problems when filters etc bleed invisible colors into visible ones
|
|
|
|
JendaLinda
|
2022-06-11 04:38:53
|
It's interesting that the blurring is going mostly downwards. Yes, I was talking about the lossy modular.
|
|
|
_wb_
|
2022-06-11 05:05:26
|
It's mostly downwards just to keep it a cheap thing to do in one scan of the image
|
|
|
|
falsjds
|
2022-06-12 10:06:43
|
how do i create an animated .jxl | `cjxl` compresses an image or animation to the JPEG XL format. Is there a cli option i have not found?
|
|
|
_wb_
|
2022-06-12 10:12:48
|
Use apng input
|
|
|
fab
|
|
falsjds
how do i create an animated .jxl | `cjxl` compresses an image or animation to the JPEG XL format. Is there a cli option i have not found?
|
|
2022-06-12 03:13:33
|
https://discord.com/channels/794206087879852103/804324493420920833/985536771603460106
|
|
2022-06-12 03:13:40
|
cjxlng is unstable now
|
|
2022-06-12 03:13:53
|
so you will have to wait
|
|
2022-06-12 03:14:39
|
|
|
2022-06-12 03:14:51
|
https://github.com/libjxl/libjxl/milestones
|
|
|
yurume
|
2022-06-14 05:04:46
|
today in jxsml: realized that you need to process LLF before any HF decoding (because the HF context depends on quantized LF values anyway)
|
|
|
|
veluca
|
2022-06-14 08:30:03
|
well, you need to *decode* LLF
|
|
2022-06-14 08:30:12
|
not necessarily *process*
|
|
|
_wb_
|
2022-06-14 08:55:29
|
can't really do much with HF anyway without LF...
|
|
|
yurume
|
2022-06-14 09:28:38
|
I somehow thought they could be independently decoded and later combined
|
|
|
DraX
|
2022-06-14 10:56:39
|
Hi there. I heard about JPEG XL recently and it seemed interesting. I'm going through a phase of trying to convert a lot of my lossless formats into efficiently lossy ones, and soon I'll be tackling images.
So, as someone who doesn't really know much about formats or codecs, naturally I have a few questions I was hoping could be answered here:
-I have a lot of .pdf, .cbr and .cbz files. Would it be possible to convert the images in these files to jxl, losslessly in the case of JPEGs (which are already lossy) and lossily (is that even a word?) in the case of PNGs etc., so they can still be read in a standard PDF/CBR reader like Zathura? Or Emacs?
-Do either ffmpeg or imagemagick have the necessary tools rn to losslessly convert JPEG to JXL? Or do I still have to download and install the libjxl tools on their own?
|
|
|
Cool Doggo
|
2022-06-14 11:44:53
|
lossless itself isn't specific to JPEG; lossless transcoding is
|
|
2022-06-14 11:46:11
|
ffmpeg cannot do lossless transcoding (unless something was updated recently) because it requires the actual bit data of the JPEG to do so; it will still losslessly encode it using the pixel data, assuming you are using -d 0. I believe it's the same for imagemagick too
|
|
|
Nova Aurora
|
2022-06-14 11:56:33
|
Currently Zathura can't do jxl, lossless jpeg transcode enables the reconstruction of the exact bits of the jpeg, meaning no further loss is incurred with it.
|
|
|
Nova Aurora
Currently Zathura can't do jxl, lossless jpeg transcode enables the reconstruction of the exact bits of the jpeg, meaning no further loss is incurred with it.
|
|
2022-06-15 12:00:57
|
Someone could write a plugin for zathura and emacs, but I'm not sure how involved that would be
|
|
2022-06-15 12:14:50
|
~~I guess~~ PDF is much harder, ~~more components have to be patched (such as poppler). As long as JXL is not included in PDF standard (ISO 32000), other~~ PDF readers ~~will probably~~ have issues ~~reading JXL inside PDF~~
|
|
|
|
veluca
|
2022-06-15 05:06:50
|
every time the topic comes around I can't find an explanation for the continued existence of limited/tv range YUV -- does anybody have one?
|
|
|
_wb_
|
2022-06-15 05:13:47
|
JXL will probably at some point be in the PDF standard (at least that's the impression I had), but I expect it to take many, many years before all pdf tools will support that version of pdf
|
|
|
veluca
every time the topic comes around I can't find an explanation for the continued existence of limited/tv range YUV -- does anybody have one?
|
|
2022-06-15 05:15:52
|
My current hypothesis is that tv range still exists just because smaller range produces smaller files so it makes encoders look better when people do benchmarks in ill-designed ways.
|
|
|
|
veluca
|
2022-06-15 05:18:28
|
in this specific situation, this came up because of cICP in PNG apparently allowing for limited range RGB
|
|
2022-06-15 05:18:46
|
which seems... uh... an interesting choice?
|
|
|
_wb_
|
2022-06-15 06:07:39
|
Limited range _RGB_?
|
|
2022-06-15 06:09:20
|
I guess one advantage of not setting black at 0 and white at maxval, is that you can have out-of-gamut colors without clipping
|
|
2022-06-15 06:09:57
|
(then again I think it's also a nice property of uint representations that everything is by definition in gamut)
|
|
|
|
veluca
|
|
_wb_
Limited range _RGB_?
|
|
2022-06-15 07:25:11
|
yeeeah...
|
|
|
_wb_
|
2022-06-15 07:26:28
|
iirc, camera raws also tend to use a reduced range in a way, i.e. they have some non-zero black level and no pixel is ever 2^bitdepth -1
|
|
2022-06-15 07:27:40
|
but I guess that has more to do with how the sensor works than with an intentionally reduced range
|
|
|
|
veluca
|
2022-06-15 07:28:05
|
I can't think of any good reason for limited range rgb to be a thing tbh
|
|
|
_wb_
|
2022-06-15 08:49:04
|
me neither
|
|
2022-06-15 08:50:01
|
if you need room at the ends for editing or whatever, just doing limited range is not going to help, because the extra room you get with that is pretty small
|
|
2022-06-15 08:51:35
|
using float is safer for that: float can be seen as "limited range" if 0..1 is the nominal range, and it's basically using only about a quarter of the available range (half of the floats are negative and about half of the positive floats are > 1)
|
|
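The "about 1/4th" figure above can be checked directly from IEEE-754 bit patterns. A minimal sketch (not from the chat, just arithmetic on float32 encodings):

```python
import struct

def bits(x: float) -> int:
    # IEEE-754 float32 bit pattern of x
    return struct.unpack("<I", struct.pack("<f", x))[0]

# For non-negative float32 values, numeric order matches bit-pattern order,
# so counting representable values is just bit-pattern arithmetic.
in_unit = bits(1.0) + 1                # patterns for +0.0 .. 1.0 inclusive
pos_finite = bits(float("inf")) - 1    # positive finite, nonzero values
total_finite = 2 * pos_finite + 2      # add the negatives and both zeros
frac = in_unit / total_finite
print(f"{frac:.3f}")                   # ~0.249: a quarter of float32 lies in [0, 1]
```

So the nominal 0..1 range really does use only about 25% of the float32 code space, which is the extra headroom being described.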
2022-06-15 08:52:34
|
the only reason I can imagine for limited range rgb is to be able to use full range ycbcr transforms on it to convert it to/from limited range ycbcr?
|
|
2022-06-15 08:54:07
|
(and I guess if the input image is in tv range ycbcr, converting it to full range rgb is slightly awkward because you basically have to spread the values in a way that cannot really be done nicely uniformly)
|
|
|
spider-mario
|
2022-06-15 08:54:09
|
as Jon points out, camera raws are a use case, although it wouldn't really justify having it in PNG specifically
|
|
2022-06-15 08:54:15
|
(who would use PNG for raws)
|
|
|
_wb_
|
2022-06-15 08:56:01
|
in 8-bit, limited range ycbcr is [16,235] for luma and [16,240] for chroma, so I suppose that more or less leads to 220 different values per channel when converting to rgb
|
|
2022-06-15 08:57:17
|
keeping those 220 values compact is probably better for compression than spreading them out over 256 values
|
|
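The 220-values figure follows from the standard BT.601/709 range expansion. A small sketch of that expansion (the `tv_to_full_luma` helper is illustrative, not from any library):

```python
def tv_to_full_luma(y: int) -> int:
    # Expand 8-bit limited-range luma [16, 235] to full range [0, 255],
    # the standard BT.601/709 range expansion; out-of-range codes are clipped.
    y = min(max(y, 16), 235)
    return round((y - 16) * 255 / 219)

codes = list(range(16, 236))                    # 235 - 16 + 1 = 220 luma codes
expanded = sorted({tv_to_full_luma(y) for y in codes})
print(len(codes), len(expanded))                # 220 220: expansion is injective
print(expanded[:4])                             # starts [0, 1, 2, 3], gaps appear later
```

Because the step 255/219 is greater than 1, the 220 limited-range codes map to 220 distinct full-range values spread non-uniformly over 256 slots, which is exactly the "spread out" compression problem described above.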
2022-06-15 08:58:29
|
the CAFE image is like that iirc, it was tv range at some point and now only uses some RGB values
|
|
2022-06-15 09:02:08
|
```
$ ../build/tools/cjxl ClassA_8bit_CAFE_2048x2560_8b_RGB.ppm.png -d 0 -C 0 -c 1 -Y 0
JPEG XL encoder v0.7.0 a444260b [AVX2,SSE4,SSSE3,Scalar]
No output file specified.
Encoding will be performed, but the result will be discarded.
Read 1280x1600 image, 19.5 MP/s
Encoding [Modular, lossless, squirrel], 4 threads.
./lib/jxl/modular/transform/enc_palette.cc:232: Channel 0 uses only 208 colors.
./lib/jxl/modular/transform/enc_palette.cc:232: Channel 2 uses only 208 colors.
./lib/jxl/modular/transform/enc_palette.cc:232: Channel 4 uses only 208 colors.
```
|
|
2022-06-15 09:02:32
|
looks like it's even only 208 values per channel, I dunno how they ended up with that
|
|
2022-06-15 09:03:10
|
maybe just contrast stretching done after the image was already 8-bit
|
|
|
spider-mario
|
2022-06-15 09:16:37
|
I don't like that image
|
|
2022-06-15 09:16:38
|
it's ugly
|
|
2022-06-15 09:16:47
|
oversaturated, oversharpened, noisy and aliased
|
|
2022-06-15 09:17:40
|
lots of local contrast but lacking in global contrast
|
|
2022-06-15 09:17:48
|
(I hope the person who made it is not here)
|
|
|
|
veluca
|
|
spider-mario
oversaturated, oversharpened, noisy and aliased
|
|
2022-06-15 09:21:54
|
anything else?
|
|
2022-06-15 09:22:17
|
but this discussion actually raises a good point: does anybody actually define limited range rgb?
|
|
2022-06-15 09:22:44
|
16-235
|
|
2022-06-15 09:22:46
|
apparently yes
|
|
|
_wb_
|
|
spider-mario
(I hope the person who made it is not here)
|
|
2022-06-15 11:01:30
|
I have no clue who made it, but it must have been a long time ago, and I think it must have been someone from Belgium (at least the photo was taken in Belgium), but it was certainly not me
|
|
2022-06-15 11:02:08
|
And yes it looks way overprocessed
|
|
|
BlueSwordM
|
|
veluca
every time the topic comes around I can't find an explanation for the continued existence of limited/tv range YUV -- does anybody have one?
|
|
2022-06-15 01:21:33
|
The main reasons are just standards and inertia, with a bit of film-making in the latter.
For standard stuff, it is mainly related to the fact that even today, the only consumer video standard that forces full-range is Dolby Vision Profile 5, and even then, chroma subsampling of 4:2:0 is still used.
|
|
|
_wb_
|
2022-06-15 01:24:28
|
I guess it will help if we make tools have more sensible defaults
|
|
2022-06-15 01:24:50
|
like mozjpeg doing progressive by default and 4:4:4 for higher quality settings
|
|
2022-06-15 01:25:24
|
while the normal libjpeg-turbo requires you to explicitly choose progressive and 4:4:4
|
|
2022-06-15 01:25:48
|
I think it would make sense if ffmpeg would use full range by default if it can
|
|
|
Traneptora
|
|
_wb_
I think it would make sense if ffmpeg would use full range by default if it can
|
|
2022-06-15 05:00:54
|
ffmpeg defaults to input range
|
|
2022-06-15 05:01:16
|
if it automatically converted limited range to full range it would break a lot of existing things that expect limited range
|
|
|
_wb_
|
2022-06-15 05:12:16
|
what if input is rgb? does it then default to full range or tv range yuv?
|
|
|
Traneptora
|
2022-06-15 05:12:30
|
it defaults to whatever the input range is
|
|
|
_wb_
|
2022-06-15 05:12:52
|
but e.g. to me it's not intuitive that `-pix_fmt yuv444` implies tv range
|
|
|
Traneptora
|
2022-06-15 05:13:02
|
it doesn't
|
|
|
_wb_
|
2022-06-15 05:13:13
|
you need `yuvj444` if you want full range iirc, no?
|
|
|
Traneptora
|
2022-06-15 05:13:23
|
no, those have been deprecated for over a decade and you should not use them
|
|
2022-06-15 05:13:33
|
you use `yuv444p` for both full and limited range
|
|
2022-06-15 05:13:51
|
`-color_range pc` or `-color_range tv` to tag the output file appropriately
|
|
2022-06-15 05:14:30
|
I believe when converting from RGB -> YUV, the default is full-range for RGB and limited-range for YUV
|
|
2022-06-15 05:14:52
|
but if no colormatrix is used then it defaults to whatever was provided on input
|
|
2022-06-15 05:15:21
|
the problem is virtually nothing supports full-range YUV
|
|
2022-06-15 05:15:36
|
so that's why that default exists
|
|
2022-06-15 05:16:04
|
but if you, say, had full-range yuv420p input and ran -pix_fmt yuv444p, it wouldn't automatically convert it to limited range
|
|
2022-06-15 05:16:06
|
as far as I'm aware
|
|
|
_wb_
|
2022-06-15 05:16:13
|
ah ok
|
|
|
Traneptora
|
2022-06-15 05:16:47
|
the libjxl encoder wrapper prints a warning if you attempt to pass it limited range. (at least it does if they ever merge my patch grumble grumble)
|
|
2022-06-15 05:17:06
|
it also prints a similar warning if you pass it untagged range
|
|
2022-06-15 05:17:55
|
```c
+ /* JPEG XL format itself does not support limited range */
+ if (avctx->color_range == AVCOL_RANGE_MPEG ||
+ avctx->color_range == AVCOL_RANGE_UNSPECIFIED && frame->color_range == AVCOL_RANGE_MPEG)
+ av_log(avctx, AV_LOG_ERROR, "This encoder does not support limited (tv) range, colors will be wrong!\n");
+ else if (avctx->color_range != AVCOL_RANGE_JPEG && frame->color_range != AVCOL_RANGE_JPEG)
+ av_log(avctx, AV_LOG_WARNING, "Unknown color range, assuming full (pc)\n");
```
|
|
|
_wb_
|
2022-06-15 05:18:30
|
my ffmpeg does default to tv range when encoding a bunch of png frames as input and some video codec as output
|
|
|
Traneptora
|
2022-06-15 05:18:50
|
> when converting from RGB -> YUV, the default is full-range for RGB and limited-range for YUV
|
|
2022-06-15 05:19:12
|
it won't automatically shrink it if you keep it in RGB, like if you were to encode it to, say ffv1
|
|
2022-06-15 05:19:19
|
which supports RGB
|
|
|
|
veluca
|
2022-06-15 06:37:39
|
I wish I could say I am surprised by how much confusion can be generated by things like the `colr` box
|
|
2022-06-15 06:37:51
|
which for whatever reason uses 2 bytes to send values 0-255
|
|
2022-06-15 06:38:09
|
but unfortunately I'm not surprised
|
|
2022-06-15 06:38:30
|
(https://github.com/w3c/PNG-spec/issues/129#issuecomment-1156796286)
|
|
|
ziemek.z
|
2022-06-16 09:58:20
|
Been browsing FLIF and FUIF repo recently.
<@794205442175402004> FLIF is still to be completely finished, even if it takes me 10 years in my free time to squash all bugs, but shouldn't FUIF repo be archived since it's *totally* obsolete because of being included into JPEG XL?
|
|
|
_wb_
|
2022-06-16 10:20:54
|
what do you mean "be archived"? it kind of already is, it's just sitting there as an inactive repo
|
|
|
ziemek.z
|
|
_wb_
what do you mean "be archived"? it kind of already is, it's just sitting there as an inactive repo
|
|
2022-06-16 11:15:38
|
https://docs.github.com/en/repositories/archiving-a-github-repository/archiving-repositories
|
|
2022-06-16 01:03:14
|
FUIF, from what I understand, was totally, completely merged into JPEG XL, so IMHO there's no point in contributing to it. The whole codebase has been moved. FLIF is a different thing. A foundation for FUIF, but not its direct predecessor ("parent").
|
|
|
Jyrki Alakuijala
|
2022-06-17 02:59:21
|
I may be mistaken, but my impression is that FLIF has a decoding speed issue -- to be practical it would need a new approach to decoding speed
|
|
2022-06-17 02:59:48
|
that new approach is partially available in JPEG XL -- JPEG XL needs some encoding/decoding refinements for the best speed, too
|
|
2022-06-17 03:00:16
|
but great decoding speed in general is easier in JPEG XL, because entropy clustering is used rather than updating probabilities
|
|
|
Traneptora
|
2022-06-17 09:40:13
|
Yea FLIF is impractically slow
|
|
2022-06-17 09:40:30
|
significantly slower than ffv1, for context
|
|
2022-06-17 09:40:39
|
without significant improvement
|
|
2022-06-17 09:41:12
|
in bpp that is
|
|
2022-06-17 09:41:39
|
That said, lossless JXL is not particularly fast
|
|
2022-06-17 09:41:53
|
it's much slower to decode than PNG
|
|
|
_wb_
|
2022-06-17 10:20:20
|
well png is basically just gunzip
|
|
|
Traneptora
|
|
_wb_
well png is basically just gunzip
|
|
2022-06-18 03:07:14
|
which is weirdly slow to decode tbf
|
|
2022-06-18 03:07:24
|
but maybe that's just due to the quantity of data
|
|
|
BlueSwordM
|
|
Traneptora
but maybe that's just due to the quantity of data
|
|
2022-06-18 03:33:09
|
That is mainly it.
Entropy coders get tested the most at high data rates.
|
|
|
_wb_
|
2022-06-18 05:40:56
|
You can put a truncated jpeg in a zip file
|
|
2022-06-18 05:41:48
|
It's probably a progressive jpeg that has part of its last scan cut off
|
|
2022-06-18 05:42:12
|
The bottom probably looks slightly worse than the top, if you look carefully
|
|
2022-06-18 05:43:45
|
iirc, you can use jpegtran to turn it into a syntactically correct jpeg
|
|
2022-06-18 05:49:38
|
Ah, probably it is. What does `djpeg -verbose -verbose image.jpg >/dev/null` say?
|
|
2022-06-18 05:50:00
|
Could be that it just has its end marker missing or something
|
|
2022-06-18 06:10:56
|
And you're sure the bottom right of the image looks fine?
|
|
2022-06-18 06:11:20
|
Could be the only thing that is missing is the end marker itself
|
|
2022-06-18 06:11:54
|
Does `jpegtran image.jpg > fixed.jpg` work?
|
|
2022-06-18 06:20:53
|
Yes, but not much data missing, it's indeed not a progressive one so it's just that last row of blocks
|
|
2022-06-18 06:21:33
|
You can use jpegtran to losslessly crop away that last row, and then recompress that
|
|
2022-06-18 06:28:17
|
Yeah that command will not crop, just turn it into something jxl can recompress
|
|
|
The_Decryptor
|
2022-06-18 06:33:12
|
You need to supply `-copy all` to keep metadata
|
|
|
yurume
|
2022-06-18 07:01:23
|
today in jxsml: I've finished implementing DCT (both forward and inverse), and then caught up with a series of refactorings
|
|
2022-06-18 07:02:35
|
I think I've revised `jxsml__lf_group_t` (a data structure for parallel HF decoding) at least 10 times
|
|
|
|
veluca
|
|
yurume
|
2022-06-18 07:26:35
|
for example I initially didn't preserve LfQuant and directly converted it into LLF coefficients and threw it away
|
|
2022-06-18 07:26:57
|
but I realized that I need to preserve it for coefficient context modelling
|
|
2022-06-18 07:27:42
|
and then I realized that I don't actually have to preserve LfQuant, only the final LF context index [0, 64) is relevant
|
|
2022-06-18 07:28:46
|
so that part of the code has been completely rewritten twice :p
|
|
2022-06-18 07:29:38
|
at first I wondered why `lf_index` is a separate variable in the pseudocode, now I know why
|
|
2022-06-18 07:29:48
|
since it is much easier to vectorize in that way
|
|
|
_wb_
|
2022-06-18 07:30:08
|
It could be useful to add an implementation note to the spec that says something about it
|
|
|
yurume
|
2022-06-18 07:30:38
|
(I was deeply concerned about the block context computation efficiency, so I looked for a way to optimize that and came to the realization)
|
|
|
_wb_
|
2022-06-18 07:37:22
|
I remember adding lf context to hf. IIRC, I added that to make jpeg recompression more competitive with brunsli โ brunsli has other hf ctx that we can't really do in jxl since it assumes all blocks are 8x8, so I needed something else to make vardct jxl as good as brunsli so we could get rid of brunsli-in-jxl
|
|
2022-06-18 07:38:47
|
It was a tricky balance to do it in a way that doesn't have a decode speed cost but still gives some compression density advantage
|
|
|
yurume
|
2022-06-19 10:32:04
|
today in jxsml: I've finally reached the point where I successfully decoded all LF and HF coefficients (yet to be processed or permuted, though)
|
|
2022-06-19 10:32:31
|
...only after 4500 lines of code
|
|
2022-06-19 10:32:35
|
the gist has been updated: https://gist.github.com/lifthrasiir/137a3bf550f5a8408d4d9450d026101f
|
|
|
|
plantain
|
2022-06-20 05:37:55
|
Hi, I am trying JPEG-XL via GDAL, where the only knobs available to control quality seem to be JXL_EFFORT and JXL_DISTANCE. Is this 'standard' JPEG-XL language? How does it compare to JPEG/WebP with a sliding quality=0-100?
|
|
|
yurume
|
2022-06-20 05:43:15
|
JXL_DISTANCE corresponds to the Butteraugli distance target, which is a perceptual similarity metric and much better defined than a simple "quality" factor
|
|
2022-06-20 05:45:28
|
for example q90 in typical JPEG can result in quite varying degrees of actual quality, but d1.0 in JPEG XL results in a compressed image whose Butteraugli distance is hopefully close to 1.0
|
|
2022-06-20 05:48:24
|
larger JXL_EFFORT generally improves distance accuracy, and in many (but not all) cases also results in a smaller file due to the increased number of knobs being tried
|
|
|
|
plantain
|
2022-06-20 05:51:19
|
and am I correct in understanding distance=1 should be visually lossless?
|
|
|
yurume
|
2022-06-20 05:51:39
|
there is an approximate relation between JPEG quality and Butteraugli distance (e.g. cjxl maps -q30..100 into -d6.4..0.1 linearly) but there's a considerable variation due to those reasons
|
|
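The linear q-to-d mapping yurume mentions can be sketched directly. This is a sketch of the described relation only, not cjxl's actual code (which also handles qualities outside this range differently), and the function name is illustrative:

```python
def quality_to_distance(q: float) -> float:
    # Linear map from JPEG-style quality 30..100 to Butteraugli distance
    # 6.4..0.1, per the relation described in the message above.
    if not 30 <= q <= 100:
        raise ValueError("this sketch only covers q30..100")
    return 6.4 + (q - 30) * (0.1 - 6.4) / (100 - 30)

print(round(quality_to_distance(90), 3))  # 1.0, the usual "visually lossless" target
```

Note how q90 lands exactly on d1.0 under this mapping, which matches cjxl's conventional quality/distance correspondence.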
2022-06-20 05:52:04
|
it especially depends on the image type
|
|
2022-06-20 05:52:17
|
I've heard that 1.5 might be fine
|
|
2022-06-20 05:53:04
|
on the other hand, a while ago (so this might be invalid now) I remember using -d1.0 and finding some visible loss *when zoomed in*
|
|
2022-06-20 05:54:26
|
I think the current encoder assumes a particular (but configurable) viewing distance, which might or might not be okay
|
|
|
|
plantain
|
2022-06-20 05:54:45
|
ok, thanks for the information. so far the results look remarkable on my workload (30000x30000px satellite imagery), with filesizes 50% smaller than the current JPEG -q90 using distance=2 e=9... but I haven't managed to actually build any software to view the images yet to compare
|
|
|
yurume
|
2022-06-20 05:55:42
|
at that size and usage I think you need to check the zoomed-in result, cjxl does have a ton of configurations that might be helpful (including the aforementioned viewing distance) but GDAL doesn't seem to have any
|
|
|
|
plantain
|
2022-06-20 05:56:37
|
it does seem to be much slower to encode, I think single threaded in GDAL as well, which might be a catch
|
|
|
yurume
|
2022-06-20 05:57:27
|
indeed, it doesn't have a JxlEncoderSetParallelRunner call?
|
|
|
|
plantain
|
2022-06-20 05:58:20
|
it does appear to have it in the source... I'll keep digging
|
|
2022-06-20 05:58:55
|
ah, only in the explicit JPEGXL driver, not the GeoTIFF driver
|
|
|
yurume
|
2022-06-20 05:59:08
|
huh weird, what's the difference between https://github.com/OSGeo/gdal/blob/master/frmts/gtiff/tif_jxl.c and https://github.com/OSGeo/gdal/blob/master/frmts/jpegxl/jpegxl.cpp
|
|
|
|
plantain
|
2022-06-20 06:00:03
|
the former is for embedding inside GeoTIFFs
|
|
|
monad
|
|
plantain
and am I correct in understanding distance=1 should be visually lossless?
|
|
2022-06-20 06:46:51
|
d1 targets near-visually-lossless at 1000 pixels viewing distance. At worst this should allow a slight difference only noticeable with a flip test.
|
|
|
|
plantain
|
2022-06-20 07:00:45
|
I don't really understand the concept of viewing distance
|
|
2022-06-20 07:01:21
|
1000 pixels as in the same pixel size as the screen, but in the Z axis towards the viewer?
|
|
2022-06-20 07:01:57
|
so if my DPI is 100px/cm, d1 is near-visually-lossless at 10cm?
|
|
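The arithmetic implied by the question above is just a unit conversion. A tiny sketch (the helper name is illustrative, and it assumes one image pixel maps to one device pixel):

```python
def viewing_distance_cm(distance_px: float, px_per_cm: float) -> float:
    # Convert a viewing distance expressed in "pixel widths" to centimetres,
    # given the display's pixel density.
    return distance_px / px_per_cm

print(viewing_distance_cm(1000, 100))  # 10.0 cm, matching the question above
print(viewing_distance_cm(1000, 40))   # 25.0 cm on a ~100 dpi desktop monitor
```

So "1000 pixels viewing distance" becomes a shorter physical distance the denser the display, which is why the same d value can look different across screens.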
|
yurume
|
2022-06-20 07:27:23
|
I think so, provided that the device pixel corresponds to the image pixel (which often isn't the case, especially on mobile)
|
|
2022-06-20 07:31:49
|
there is also an assumption about the display brightness, which is by default 255 cd/m^2 for non-HDR images (for reference, both my laptop and monitors have about 300--330 cd/m^2 brightness)
|
|
|
The_Decryptor
|
2022-06-20 07:51:14
|
I think the "standard" for sRGB is supposed to be 80 nits, but my monitor is over twice that and it can still appear dim sometimes
|
|
|
novomesk
|
|
plantain
ok, thanks for the information. so far the results look remarkable on my workload (30000x30000px satellite imagery), with filesizes 50% smaller than the current JPEG -q90 using distance=2 e=9... but I haven't managed to actually build any software to view the images yet to compare
|
|
2022-06-20 10:12:26
|
In the majority of my software I have an artificial limit to reject overly large images. It is currently 256 megapixels.
|
|
|
|
plantain
|
2022-06-20 10:14:45
|
I usually hit the real limit of 4GB of RAM long before that
|
|
|
novomesk
|
2022-06-20 11:26:39
|
Viewing of JXL (but also AVIF) files could be at least 2 times faster in gwenview. There is a need to refactor a portion of gwenview's code. It is not a trivial change; it must be done in a way so there won't be a performance penalty for older formats.
|
|
|
diskorduser
|
2022-06-20 11:42:20
|
So, gtk based image viewers open jxl faster than gwenview?
|
|
|
novomesk
|
|
diskorduser
So, gtk based image viewers open jxl faster than gwenview?
|
|
2022-06-20 12:01:01
|
Cannot say yes or no. It depends on each implementation and the loaded image. For example, someone loads an image as an array of FLOAT, consuming 4 times more memory than an 8-bit array. Copying large bitmaps here and there takes some time too. So even if it is not as fast as it could be, there could be slower viewers too.
|
|
2022-06-20 12:20:16
|
https://invent.kde.org/graphics/gwenview/-/issues/3
|
|
|
Traneptora
|
2022-06-20 02:13:19
|
I've had a patch sitting on the ML for two weeks that fixes some bugs, maybe related to that?
|
|
|
_wb_
|
|
plantain
ok, thanks for the information. so far the results look remarkable on my workload (30000x30000px satellite imagery), with filesizes 50% smaller than the current JPEG -q90 using distance=2 e=9... but I haven't managed to actually build any software to view the images yet to compare
|
|
2022-06-20 03:24:20
|
I would try e7 or even e6
|
|
|
fab
|
|
plantain
ok, thanks for the information. so far the results look remarkable on my workload (30000x30000px satellite imagery), with filesizes 50% smaller than the current JPEG -q90 using distance=2 e=9... but I haven't managed to actually build any software to view the images yet to compare
|
|
2022-06-20 03:30:53
|
Jxl average is 37% smaller at s7 s8
|
|
2022-06-20 03:31:10
|
The point is to be transparent on a big variety of images without slowing the decoding
|
|
2022-06-20 03:31:25
|
Not to discard too much of dct
|
|
|
novomesk
|
|
plantain
I usually hit the real limit of 4GB of RAM long before that ๐ฅฒ
|
|
2022-06-20 05:38:02
|
In my Qt JXL plugin I have a limit of 64 megapixels when running on a 32-bit machine. I was afraid that libjxl would run out of memory and abort() the whole application.
|
|
|
yurume
|
2022-06-22 09:29:45
|
today in jxsml: I've finished all the necessary ingredients to render vardct (coefficient order, dequantization matrix, actual coefficients, chroma from luma, hastily bodged inverse XYB transform) and the result is not perfect
|
|
|
|
veluca
|
2022-06-22 09:33:22
|
indeed
|
|
|
yurume
|
2022-06-22 09:34:09
|
to be fair I never tested most components making up vardct until this point
|
|
|
|
veluca
|
2022-06-22 09:35:27
|
yeah I thought so
|
|
|
yurume
|
2022-06-22 09:36:56
|
I could successfully decode a particular vardct image at this point though, so only the postprocessing is problematic (at least for that image)
|
|
|
|
veluca
|
2022-06-22 09:38:01
|
define "postprocessing"
|
|
|
yurume
|
2022-06-22 09:44:10
|
not entirely sure lol
|
|
2022-06-22 09:46:26
|
I believe that's roughly HF dequantization, CfL followed by inverse XYB (I do LF dequantization much earlier)
|
|
|
|
veluca
|
2022-06-22 09:47:11
|
so from the image I unfortunately cannot point you to anything that I can confidently say is the root cause
|
|
|
yurume
|
2022-06-22 09:47:24
|
if my observation is correct, raw XYB samples are actually quite large (on the order of tens of thousands)?
|
|
|
|
veluca
|
2022-06-22 09:47:29
|
(believe me, I messed up vardct decoding so many times that I have generally good guesses :P)
|
|
2022-06-22 09:47:35
|
uhhhhh... not at all?
|
|
2022-06-22 09:47:50
|
Y and B are ~0-1
|
|
2022-06-22 09:48:06
|
X is... super tiny, like -1/32 to 1/32
|
|
|
yurume
|
2022-06-22 09:48:06
|
huh
|
|
2022-06-22 09:48:45
|
I think then I have messed the dequantization up, probably by multiplying instead of dividing
|
|
|
|
veluca
|
2022-06-22 09:48:55
|
have you tried on --speed falcon images?
|
|
2022-06-22 09:49:01
|
those have only 8x8s
|
|
|
yurume
|
2022-06-22 09:50:29
|
good to know that
|
|
2022-06-22 09:54:33
|
not sure it helps
|
|
2022-06-22 09:57:24
|
should dumping XYB as if it's RGB (after some scaling and offsetting) result in something recognizable?
|
|
2022-06-22 10:00:01
|
okay, this is very suspicious (RGB = 128 * XYB + 128, saturated)
|
|
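The false-color dump yurume describes is easy to reproduce per sample. A sketch of that debug mapping (the `false_color` helper is illustrative, not jxsml code):

```python
def false_color(xyb):
    # Map an (X, Y, B) sample to a saturated 8-bit RGB triple using the
    # RGB = 128 * XYB + 128 debug mapping quoted in the message above.
    return tuple(min(255, max(0, round(128 * c + 128))) for c in xyb)

print(false_color((0.0, 0.5, 1.0)))     # (128, 192, 255)
print(false_color((-0.01, 0.0, 30.0)))  # a huge B saturates: a sign of bad scaling
```

With correctly scaled XYB (X near zero, Y and B roughly 0..1), such a dump should sit in the upper half of each channel; wildly saturated output is the "very suspicious" symptom above.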
2022-06-22 10:22:57
|
it turns out that I've computed the *inverse* of natural order as well
|
|
2022-06-22 10:27:52
|
I'm also kinda sure that the DCT doesn't work at all, given the skewed appearance of each 8x8 block and a particular glitch in the middle row
|
|
2022-06-22 10:28:41
|
which is pretty noticeable when the original image is overlaid
|
|
2022-06-22 10:59:00
|
haha, I realized that they are not strictly DCT8x8 but also include DCT2x2 etc., which I intentionally left out for testing
|
|
|
|
veluca
|
2022-06-22 11:46:04
|
uhm no falcon should only be 8x8
|
|
|
yurume
it turns out that I've computed the *inverse* of natural order as well
|
|
2022-06-22 11:46:18
|
yeah that always gets me
|
|
2022-06-22 11:47:08
|
that image looks *rather* suspicious
|
|
|
yurume
|
2022-06-23 08:55:23
|
almost sure that the LF dequantization factor is also actually its reciprocal, the spec seems wrong
|
|
|
|
veluca
|
2022-06-23 11:02:57
|
entirely possible
|
|
|
yurume
|
2022-06-23 04:40:53
|
I've checked the entire code path towards the LF dequant factor, which is according to the spec `m_#_lf_unscaled / (global_scale * quant_lf) * 2^-extra_precision`
|
|
2022-06-23 04:41:04
|
but it actually seems to be something like `(global_scale * quant_dc) / 2^9 / m_#_lf_unscaled * 2^-extra_precision`
|
|
|
_wb_
|
2022-06-23 04:44:20
|
So it's not just the inverse but also 512 times too small?
|
|
|
yurume
|
2022-06-23 04:44:26
|
yeah
|
|
2022-06-23 04:45:26
|
so there are three major differences: `m_#_lf_unscaled` is scaled by 1/128 right after decoding, the global scale is further divided by 2^16, and finally the multiplier is its inverse (before extra precision)
|
|
|
_wb_
|
2022-06-23 05:00:34
|
Oh boy. I wish we could have called the first edition edition 0.1
|
|
|
|
veluca
|
2022-06-23 05:17:50
|
heh not quite right xD
|
|
|
yurume
|
2022-06-23 05:39:30
|
great, I've found yet another missing piece of the puzzle: I forgot to reshuffle inverse-transformed samples back to rectangular grids
|
|
2022-06-23 05:42:28
|
so jxsml stores coefficients in the varblock order, which gets IDCTed to samples but still in the varblock order before actual reshuffling
|
|
2022-06-23 05:43:02
|
I think I was aware of this when I designed the original pipeline but forgot to do the final shuffling due to a large amount of tasks I had to tackle
|
|
2022-06-23 07:21:19
|
this does look like something
|
|
2022-06-23 07:21:41
|
(for reference, it uses one epf iteration so I guess the blocky appearance is actually correct)
|
|
2022-06-23 07:22:49
|
this is still false color because its ranges still seem wildly out of place
|
|
2022-06-23 07:23:56
|
I have a range of `(min 0.000000 -14.421546 -9.860505 max 30.730759 8.456084 18.513645)` for XYB samples
|
|
|
_wb_
|
2022-06-23 07:33:00
|
X is supposed to be signed and close to zero, something like -0.01 to +0.01
|
|
2022-06-23 07:33:18
|
Y is supposed to have a 0..1 range or so
|
|
2022-06-23 07:33:23
|
B too
|
|
|
yurume
|
2022-06-23 07:33:25
|
yeah, there still seems some missing multipliers
|
|
|
_wb_
|
2022-06-23 07:33:55
|
Also the order of XYB could be wrong in the spec
|
|
|
yurume
|
2022-06-23 07:33:55
|
but somehow it does produce a recognizable image, so probably *only* a linear relation is missing now?
|
|
2022-06-23 07:34:17
|
that was REALLY confusing!
|
|
2022-06-23 07:34:20
|
XYB vs. YXB :p
|
|
|
_wb_
|
2022-06-23 07:35:22
|
Yes, we should have just done everything in YXB order all the time, it's confusing also in libjxl imo
|
|
2022-06-23 07:36:52
|
The blockiness makes me think this is mostly dc only, something still wrong with ac I think
|
|
2022-06-23 07:37:20
|
Getting closer to a decoded image though
|
|
|
yurume
|
2022-06-23 07:38:22
|
for reference, this is the actual image encoded to and decoded back from jxl (-d1.0 -e3)
|
|
|
_wb_
|
2022-06-23 07:58:37
|
Are you doing the chroma from luma thing for both lf and hf?
|
|
|
yurume
|
2022-06-23 08:04:31
|
I believe so
|
|
|
_wb_
|
2022-06-23 08:19:56
|
It looks like X and B have too high amplitudes in that image
|
|
|
yurume
|
2022-06-23 09:06:21
|
I'm very suspicious of LF dequant factors again, they have a range of `(min 0.000000 -13.543465 -5.319397 max 30.656525 6.859222 17.358032)` which is not much different from all other samples
|
|
|
_wb_
|
2022-06-23 09:22:56
|
Is that YXB order?
|
|
|
yurume
|
2022-06-23 09:23:54
|
no, XYB
|
|
2022-06-23 09:24:22
|
X minimum might be larger than 0 because I have intentionally capped it to zero for inspection purposes
|
|
2022-06-23 09:25:55
|
the following was my reasoning:
```
how LF gets quantized?
(1) G.1.2: m_#_lf_unscaled are read; defaults are 4096, 512 & 256
(2) I.2.1: per-frame dequantization factors m#DC = m_#_lf_unscaled / (global_scale * quant_lf)
(3) G.2.2: extra_precision is read for each LF group
(4) I.4.2: dequantized coefficients d# = m#DC * (quantized coefficient) / (1 << extra_precision)
in libjxl:
(1) DequantMatrices::DecodeDC:
- reads dc_quant_[0..2] and MULTIPLIES THEM BY 1/128
- inv_dc_quant_[0..2] = 1 / dc_quant_[0..2]
(2) Quantizer::Decode:
- read global_scale_ and quant_dc_
- RecomputeFromGlobalScale:
- global_scale_float_ = global_scale_ / 2^16
- inv_global_scale_ = 2^16 / global_scale_
- inv_quant_dc_ = inv_global_scale_ / quant_dc_ = 2^16 / (global_scale_ * quant_dc_)
- mul_dc_[0..2] = GetDcStep(0..2)
= inv_quant_dc_ * dc_quant_[0..2]
= (2^16 / (global_scale_ * quant_dc_)) * dc_quant_[0..2]
= 2^16 * dc_quant_[0..2] / (global_scale_ * quant_dc_)
- inv_mul_dc_[0..2] = GetInvDcStep(0..2)
= inv_dc_quant_[0..2] * (global_scale_float_ * quant_dc_)
= (1 / dc_quant_[0..2]) * (global_scale_ / 2^16) * quant_dc_
= (global_scale_ * quant_dc_) / 2^16 / dc_quant_[0..2]
(3) ModularFrameDecoder::DecodeVarDCTDC:
- read extra_precision
(4) DequantDC, called from DecodeVarDCTDC:
- dc_factors[0..2] = mul_dc_[0..2]
- mul = 2^-extra_precision
- multiply each row with dc_factors[0..2] * mul
= 2^16 * dc_quant_[0..2] / (global_scale_ * quant_dc_) * 2^-extra_precision
the original factor is supposed to be m_#_lf_unscaled / (global_scale * quant_lf) * 2^-extra_precision
in reality it's (m_#_lf_unscaled / 128) / (global_scale * quant_dc) * 2^(16-extra_precision)
```
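A numeric check of the mismatch as derived at this point (later messages revise the picture): the spec-side and libjxl-side factors, as written in the trace, differ by a constant 2^16 / 128 = 512 per channel. Function names are illustrative.

```python
# The two formulas from the trace above, side by side (illustrative names).
def spec_factor(m_unscaled, global_scale, quant_lf, extra_precision):
    # m_#_lf_unscaled / (global_scale * quant_lf) / 2^extra_precision
    return m_unscaled / (global_scale * quant_lf) / (1 << extra_precision)

def libjxl_factor(m_unscaled, global_scale, quant_dc, extra_precision):
    # (m_#_lf_unscaled / 128) / (global_scale * quant_dc) * 2^(16 - ep)
    return (m_unscaled / 128) / (global_scale * quant_dc) * 2.0 ** (16 - extra_precision)
```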
|
|
2022-06-23 09:26:20
|
I have checked again and couldn't find any missing piece
|
|
2022-06-23 10:03:09
|
okay, the 1/128 multiplier for `dc_quant_` only applies when `all_default` is false; the default value is already multiplied
|
|
2022-06-23 10:16:50
|
DequantDC also receives `mul_dc_` instead of `inv_mul_dc_`, so hopefully it's just a (massive) scaling issue...?
|
|
2022-06-23 10:18:30
|
but somehow I'm still unable to reproduce the resulting parameters from the inputs; my computed values are 2^24 times the actual values. How?
|
|
2022-06-23 10:22:07
|
okay, the default values for `m_#_lf_unscaled` are *not* {4096, 512, 256}, they are {1/4096, 1/512, 1/256}; and if `all_default` is false, three values p[0..2] are read and `m_#_lf_unscaled` should be {p[0]/128, ...}
|
|
2022-06-23 10:22:36
|
so that was a big source of confusion; both the default values and the scaling factors were off, in opposite directions
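Putting both fixes together, a sketch of the corrected LF dequantization factor (illustrative names; `global_scale` here is the raw coded integer, so the 2^16 compensates for `global_scale_float_ = global_scale_ / 2^16`):

```python
# Corrected per this thread: defaults are the reciprocals, coded values
# are divided by 128, and the resulting factor matches libjxl's
# mul_dc_ * 2^-extra_precision.
DEFAULT_M_LF_UNSCALED = (1 / 4096, 1 / 512, 1 / 256)

def read_m_lf_unscaled(coded=None):
    """Coded values p[0..2], if present, must be divided by 128."""
    if coded is None:
        return DEFAULT_M_LF_UNSCALED
    return tuple(p / 128 for p in coded)

def lf_dequant_factor(m_lf, global_scale, quant_lf, extra_precision):
    """Multiplier applied to quantized LF coefficients."""
    return 2 ** 16 * m_lf / (global_scale * quant_lf) / (1 << extra_precision)

# Mistaking 4096 for 1/4096 scales channel 0 by 4096 * 4096 = 2^24,
# which explains the earlier mystery factor.
```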
|
|
|
|
veluca
|
2022-06-23 10:32:17
|
I'm very sorry xD
|
|
|
yurume
|
2022-06-23 10:43:50
|
well it really shows that libjxl was designed when the format itself was not yet finalized 🙂
|
|
2022-06-23 11:17:14
|
looks a bit more reasonable (but no, it's again false color, from reinterpreting XYB as RGB and remapping min..max to 0..255)
|
|
2022-06-25 09:38:10
|
today in jxsml: a tiny difference in the trace output is driving me mad :S
|
|
|
|
veluca
|
2022-06-25 09:40:04
|
looks like DC channels are swapped?
|
|
2022-06-25 09:40:55
|
mh not just that
|
|
|
yurume
|
2022-06-25 09:40:58
|
not only that, but I have a hard time understanding why the apparent CfL happens at that position
|
|
2022-06-25 09:41:39
|
and one DC coefficient seems completely off (0.009588 vs. 0.005327)
|
|