|
_wb_
|
2022-05-26 08:13:31
|
Unless you want to do patches or splines or noise first
|
|
2022-05-26 08:14:13
|
Probably some jxl art should already decode at this point :)
|
|
|
yurume
|
2022-05-26 08:14:54
|
oh that's a good idea
|
|
2022-05-26 08:21:25
|
by the way after implementing MA prediction I was quite worried about the decoding performance, as the tree can naturally wreck optimizations
|
|
2022-05-26 08:21:41
|
I then learned that libjxl has lots of special cases for the MAANS decoder 😱
|
|
|
_wb_
|
2022-05-26 08:23:31
|
Yeah
|
|
2022-05-26 08:23:53
|
We do try to avoid branching
|
|
|
yurume
|
2022-05-26 08:24:55
|
are large MA trees frequent? I have seen one with >3000 nodes, though I can't recall which image had it
|
|
|
_wb_
|
2022-05-26 08:27:46
|
Typically it shouldn't be too large after taking out the "static" properties (stream id and channel id)
|
|
2022-05-26 08:29:47
|
Though I dunno, maybe cjxl -e 9 -E 3 does sometimes make large trees
|
|
|
yurume
|
2022-05-26 08:30:02
|
thinking about it, that 3000-plus-node MA tree might be mistakenly decoded as well
|
|
|
|
veluca
|
2022-05-26 11:52:40
|
that seems likely
|
|
2022-05-26 11:53:03
|
unless the image was *really* large
|
|
|
Fox Wizard
|
2022-05-27 11:59:43
|
Why you shouldn't always trust experts: https://www.funda.nl/koop/eindhoven/huis-42721840-joris-minnestraat-17/#foto-35
|
|
2022-05-28 12:00:00
|
I can see ringing and oversharpening without any zoom <:kekw:910212968405430273>
|
|
2022-05-28 12:00:47
|
Think someone cranked up "texture" and "sharpening" in Lightroom and called it a day...
|
|
2022-05-28 12:01:41
|
Or cranked up "texture" and used "bicubic sharper" for scaling, which tends to look like shit for downscaling. It causes aliasing, and cranking up texture too high can make things look unnatural
|
|
|
yurume
|
2022-05-28 07:09:33
|
today in jxsml: extra channel decoding is done, though upsampling is a whole different beast and I'd rather not touch it right now (I'm not even sure about the relative order of the 4 different upsampling parameters)
|
|
2022-05-28 07:10:36
|
and then I've tried to decode the classic _Tropical Island Sunset with JXL logo overlay_, which I thought was a good way to test different bit depths (this one uses 10bpp, for example)
|
|
2022-05-28 07:11:07
|
and somehow the MA tree decodes correctly but the subsequent Huffman tree doesn't decode, what
|
|
2022-05-28 07:12:20
|
not publicly *announced*, but you can see the current revision from https://gist.github.com/lifthrasiir/137a3bf550f5a8408d4d9450d026101f (which is highly unsafe and untested C code)
|
|
2022-05-28 07:16:16
|
I think you need to remove `inline` from the entire code, which is why I have a (commented) line `#define inline`
|
|
2022-05-28 07:17:07
|
something I should actually fix when I make it into a library form (for now it's just... a mix of library code and state dumper)
|
|
2022-05-28 07:18:02
|
in the actual development I always attach gdb to jxsml, so that I can break right at the moment `JXSML__ERR` is used
|
|
2022-05-28 07:19:06
|
I think it can now handle non-palettized fjxl outputs
|
|
2022-05-28 07:19:40
|
it crucially lacks VarDCT support, which is likely why you've seen a failure
|
|
|
spider-mario
|
2022-05-28 08:42:58
|
2meirl4meirl
|
|
|
yurume
|
2022-05-28 09:14:45
|
fun with uiCA
|
|
2022-05-28 09:15:53
|
to be honest at this stage assembly-level optimization is useless, but I do care about throughput of autovectorized code
|
|
2022-05-28 09:16:47
|
specifically I was comparing `p1 += ((twice_pixel_t) p0 + (twice_pixel_t) p2) >> 1;` vs `p1 += ((p0 ^ p2) >> 1) + (p0 & p2);` (a part of RCT)
|
|
2022-05-28 09:28:05
|
if I interpreted the numbers correctly, the latter achieves at least 3x the throughput on most recent-enough microarchitectures
|
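A minimal sketch of the two forms being compared (the names `avg_wide`/`avg_bits` are mine, not jxsml's; it assumes arithmetic right shift for negative values, as on all mainstream targets). The bitwise form computes the floored average without ever leaving the pixel type, so it autovectorizes without widening:

```c
#include <stdint.h>

/* widened form: needs a type twice as wide as the pixel type */
static int16_t avg_wide(int16_t p0, int16_t p2) {
    return (int16_t)(((int32_t)p0 + (int32_t)p2) >> 1);
}

/* bitwise form: (p0 ^ p2) >> 1 is the carry-less half-sum and
   p0 & p2 supplies the carries; stays entirely in int16_t */
static int16_t avg_bits(int16_t p0, int16_t p2) {
    return (int16_t)(((p0 ^ p2) >> 1) + (p0 & p2));
}
```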
|
|
_wb_
|
2022-05-28 09:39:58
|
If you find improvements there, feel free to also open a PR for libjxl :)
|
|
|
yurume
|
2022-05-28 09:41:15
|
I don't think I can compete with Highway
|
|
2022-05-28 09:42:04
|
I wonder if Highway emits a dedicated average operation (PAVGW etc. in x86-64), which no compiler seems to detect and generate
|
|
2022-05-28 09:42:46
|
or maybe compilers are smart enough to know that PAVGW is rounding *up* and rounded-down average is better done with other operations
|
|
2022-05-28 09:45:19
|
wait, actually... libjxl does all the computation in int16_t (that is, it is equivalent to `p1 += (p0 + p2) >> 1;` and `p0 + p2` can overflow)
|
|
2022-05-28 09:45:50
|
if that's allowed my job is much simpler, but that doesn't seem to match the spec then :p
|
|
2022-05-28 09:47:01
|
(of course I will file an issue about this, but it's time to have dinner)
|
|
2022-05-28 10:07:16
|
Yes, provided that I compiled it with a correct target (I've used `-march=haswell/rocketlake` for quick testing)
|
|
|
_wb_
|
2022-05-28 10:17:21
|
Wait what is it doing in int16? I don't think it's doing anything modular in int16 atm
|
|
|
yurume
|
2022-05-28 10:18:06
|
It was definitely specialized to the pixel type, but I haven't looked at the caller
|
|
|
_wb_
|
2022-05-28 10:18:52
|
Pixel_type is always int32_t in libjxl atm
|
|
|
yurume
|
|
_wb_
|
2022-05-28 10:20:06
|
We should in theory be able to template it and use int16_t in case modular_16bit is true, but probably that will break stuff atm
|
|
|
|
veluca
|
2022-05-28 10:21:12
|
"probably"
|
|
2022-05-28 10:21:22
|
seems optimistic
|
|
|
yurume
specifically I was comparing `p1 += ((twice_pixel_t) p0 + (twice_pixel_t) p2) >> 1;` vs `p1 += ((p0 ^ p2) >> 1) + (p0 & p2);` (a part of RCT)
|
|
2022-05-28 10:22:22
|
wait why are those equivalent?
|
|
|
yurume
|
2022-05-28 10:23:28
|
Can be shown by dissecting the addition into full adders
|
|
2022-05-28 10:24:17
|
So for single bits a and b, a+b is (a xor b) + (a and b) * 2
|
|
2022-05-28 10:25:27
|
Since each bit is independent of the others and this alternative expression is linear...
|
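The full-adder argument can also be checked mechanically; a hedged sketch that exhaustively verifies the identity over all 8-bit pairs:

```c
/* xor is the per-bit sum without carries, and collects the carries,
   which get weight 2 one position to the left; summing the two
   reconstructs ordinary addition (no overflow here since the
   arithmetic is done in int) */
static int identity_holds(void) {
    for (int a = -128; a < 128; a++)
        for (int b = -128; b < 128; b++)
            if (a + b != (a ^ b) + 2 * (a & b))
                return 0;
    return 1;
}
```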
|
|
|
veluca
|
2022-05-28 10:38:35
|
ah somehow I read that last part as p0 & p2 & 1
|
|
2022-05-28 10:38:49
|
then yes xD
|
|
|
yurume
|
2022-05-28 10:39:38
|
Off-topic: if you haven't read https://devblogs.microsoft.com/oldnewthing/20220207-00/?p=106223 it will be a fun read
|
|
|
|
veluca
|
2022-05-28 10:41:10
|
heh nice
|
|
2022-05-28 10:41:28
|
not sure what the spec says that should do tbh wrt overflows
|
|
2022-05-28 10:41:41
|
and in particular whether we specify anything for intermediate quantities
|
|
|
yurume
|
2022-05-28 10:44:16
|
It has a mention that any stored pixel during the transformation fits in 16 or 32 bits while intermediate results can be 64 bits long
|
|
2022-05-28 10:45:05
|
And I interpreted that (p0 + p2) is an intermediate result, not stored
|
|
|
_wb_
|
2022-05-28 10:54:22
|
"Stored" is anything that goes into a buffer in a (naive/generic) implementation that first puts decoded values in a buffer, then undoes each modular transform one by one and stores the results in buffers, until the final output which is also stored in a buffer. Stored pixels fit in int32 in general and in int16 in case of modular_16bit
|
|
2022-05-28 10:55:34
|
The intermediate values you get while computing a predictor or evaluating RCT/palette expressions could require more bits (but not more than int64, I suppose)
|
|
|
yurume
|
|
_wb_
"Stored" is anything that goes into a buffer in a (naive/generic) implementation that first puts decoded values in a buffer, then undoes each modular transform one by one and stores the results in buffers, until the final output which is also stored in a buffer. Stored pixels fit in int32 in general and in int16 in case of modular_16bit
|
|
2022-05-28 11:23:01
|
I think there is an ambiguity in whether results from each squeeze step are stored in (temporary) buffers or not
|
|
2022-05-28 11:23:27
|
anyway, otherwise that was my understanding
|
|
|
_wb_
|
2022-05-28 11:24:03
|
I would say they are
|
|
|
yurume
|
2022-05-28 11:24:54
|
and technically speaking the current implementation of RCT will overflow in very rare cases where modular_16bit_buffers is false and somehow buffers have very large numbers (>2^30) in magnitude
|
|
|
_wb_
|
2022-05-28 11:25:12
|
Not sure if it makes a difference. Output bitdepth of squeeze should be 1 bit lower than input bitdepth
|
|
|
yurume
|
2022-05-28 11:27:11
|
(I made it clear that the previous message was about RCT, not Squeeze)
|
|
|
|
veluca
|
2022-05-28 11:29:18
|
ugh, I guess we should fix that...
|
|
2022-05-28 11:29:43
|
we should have said that nothing above 24 bits is guaranteed to work xD
|
|
|
yurume
|
|
yurume
and technically speaking the current implementation of RCT will overflow in very rare cases where modular_16bit_buffers is false and somehow buffers have very large numbers (>2^30) in magnitude
|
|
2022-05-28 11:30:37
|
I was even thinking about having 4 different implementations, where the first one will be 16bit + no overflow, the second one is 16bit + overflow and so on (my code already uses different code paths for 16bit and 32bit buffers so it's doable)
|
|
|
_wb_
|
2022-05-28 11:43:47
|
The only code path where libjxl currently uses modular with big numbers is when doing lossless float, where I think it avoids RCTs and squeeze just to be sure.
|
|
|
yurume
|
2022-06-01 12:22:30
|
today in jxsml: weighted predictor is hard
|
|
|
_wb_
|
2022-06-01 05:22:16
|
Yeah it's pretty hairy. What is that image?
|
|
|
yurume
|
2022-06-01 10:22:03
|
as noted before, _Tropical Island Sunset with JXL logo overlay_ (the first frame)
|
|
2022-06-01 10:22:48
|
I've tracked it down to the possibly wrong alias table in the ANS decoder, which resulted in a wrong MA tree
|
|
|
_wb_
|
2022-06-01 10:39:53
|
Dumping the MA tree could help
|
|
|
yurume
|
2022-06-01 10:48:19
|
I've dumped all changes to the ANS state during the MA tree decoding and pinpointed the exact moment where it went wrong, but I don't yet know why
|
|
2022-06-01 10:49:58
|
the finest printf debugging
|
|
2022-06-01 12:01:49
|
okay, this was a culprit:
```
correct:
[alias1:(0: 0 5 0)(1: 0 5 64)(2: 0 5 128)(3: 0 5 192)(4: 0 5 256)(5: 0 5 320)(6: 0 5 384)...
wrong:
[alias1:(0: 0 5 64)(1: 0 5 128)(2: 0 5 192)(3: 0 5 256)(4: 0 5 320)(5: 0 5 0)(6: 0 5 384)...
```
|
|
2022-06-01 12:02:01
|
I thought I no longer had to debug the alias table, but I was wrong :S
|
|
2022-06-01 12:11:15
|
aaaand it was entirely my mistake. the distribution has a single symbol at 5 and my faulty code was supposed to special-case it, but it didn't thanks to a missing `+ 1`.
|
|
2022-06-01 12:12:09
|
(it seemed to work because in many cases that single symbol happened to be at 0, in which case the special case and normal case coincide)
|
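A hedged sketch of the special case in question (field layout and table sizes are illustrative, not the exact spec structures): when the distribution puts its whole mass on one symbol s, every alias bucket should map to s with offsets simply tiling the state space, regardless of where s sits in the alphabet, which matches the "correct" dump above.

```c
#include <stdint.h>

enum { LOG_ALPHA = 5, LOG_TABLE = 12 };            /* illustrative sizes */
enum { BUCKET_SIZE = 1 << (LOG_TABLE - LOG_ALPHA) };

typedef struct { uint16_t cutoff, symbol, offset; } bucket_t;

/* single-symbol distribution: bucket i decodes to s with offset
   i * BUCKET_SIZE; the bug described above came from only taking
   this path when s happened to be 0 */
static void build_single_symbol(bucket_t t[1 << LOG_ALPHA], uint16_t s) {
    for (int i = 0; i < (1 << LOG_ALPHA); i++) {
        t[i].cutoff = 0;
        t[i].symbol = s;
        t[i].offset = (uint16_t)(i * BUCKET_SIZE);
    }
}

/* quick self-check helper for the table above */
static int demo_offsets_ok(void) {
    bucket_t t[1 << LOG_ALPHA];
    build_single_symbol(t, 5);
    return t[0].symbol == 5 && t[0].offset == 0 &&
           t[6].offset == 6 * BUCKET_SIZE;
}
```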
|
2022-06-05 03:55:46
|
today in jxsml: weighted predictor is *way* hard
|
|
2022-06-05 03:56:38
|
there are at least 6 typos in the spec that can corrupt the final output, and it seems there are more
|
|
|
monad
|
2022-06-05 05:44:37
|
Those are some powerful typos.
|
|
|
|
veluca
|
|
yurume
there are at least 6 typos in the spec that can corrupt the final output, and it seems there are more
|
|
2022-06-05 09:43:06
|
not surprised
|
|
2022-06-05 09:43:09
|
fancy though
|
|
|
yurume
|
2022-06-05 01:39:51
|
actually this is much closer to the original (before cropping) than ever before
|
|
2022-06-05 01:40:10
|
the left half is more or less correct except for corrupted color; the right half is incorrect
|
|
|
_wb_
|
2022-06-05 04:13:30
|
Aren't both groups ok here?
|
|
|
yurume
|
2022-06-05 07:57:24
|
I should have made clear that the second image I posted is the reference image, not one jxsml produced
|
|
2022-06-05 07:58:04
|
it seems to do with stream indices where jxsml assigns 4 and 5 but apparently the image expects ~~20 and 21~~ 21 and 22
|
|
|
_wb_
|
2022-06-05 08:03:33
|
Ah
|
|
2022-06-05 08:03:41
|
Yes that numbering is funky
|
|
2022-06-05 08:04:10
|
There are numbers for all raw quant tables regardless of whether they are used or not
|
|
|
yurume
|
2022-06-05 08:04:24
|
ugh, another spec bug then
|
|
|
_wb_
|
2022-06-05 08:04:32
|
Spec probably needs to make that clearer
|
|
2022-06-05 08:07:11
|
Spec says "number of tables" but it should probably just say whatever that number is, since it is a fixed number
|
|
2022-06-05 08:09:02
|
It's the DctSelect index, which has range 0 to 16
|
|
2022-06-05 08:10:24
|
So I think the number should be 17, but then you still have an off by one?
|
|
|
yurume
|
2022-06-05 08:10:56
|
are those stream indices actually used or just reserved? I don't have a full understanding of VarDCT, but I couldn't find anything in the spec that actually refers to those indices.
|
|
2022-06-05 08:12:23
|
if they were actually used, I suppose there would be a reference to `num_lf_groups` somewhere else, since those indices would start from `1 + 3 * num_lf_groups`
|
|
2022-06-05 08:15:47
|
okay, *I* was off by one, the image actually refers to 21 and 22, so the number is fine
|
|
2022-06-05 08:17:03
|
I see no reference to `ModularStreamId::QuantTable` from libjxl, so I guess they are just reserved now
|
|
|
_wb_
|
2022-06-05 08:23:54
|
They are only used when raw quant tables are signalled
|
|
|
yurume
|
2022-06-05 08:24:34
|
ah indeed
|
|
2022-06-05 08:24:53
|
I casually looked for references to the section H but missed one buried in a pseudocode
|
|
|
_wb_
|
2022-06-05 08:27:58
|
So spec might be correct here, but certainly could be clearer
|
|
|
yurume
|
2022-06-05 08:28:21
|
how on earth does `ModularStreamId::QuantTable` actually exist in libjxl while GitHub search failed to find it
|
|
2022-06-05 08:31:54
|
anyway, updated jxsml output:
|
|
2022-06-05 08:32:19
|
it looks like just a faulty YCgCo conversion now?
|
|
2022-06-05 08:38:07
|
okay, YCgCo is okay but the sample had to be clamped (as per kRelative intent)! this is still slightly different from the reference image (bottom right) but gets almost everything right.
|
|
2022-06-05 08:39:21
|
this is a difference map; judging from this it seems that the border condition is still slightly off
|
|
|
_wb_
|
2022-06-05 09:29:12
|
I haven't looked yet at the recent spec issues you opened but I think it's likely that the weighted predictor has a couple of spec bugs/typos in it. It's a rather funky thing that started with Alexander Ratushnyak's code which was an almost unreadable bag of C macros that happened to do something. And then Luca and me made it even more complicated in an effort to make it more generic (params instead of fixed constants; the original version had fixed constants for uint8 and for uint16 but couldn't do any other bitdepth) and to fit it into modular in a way that made sense.
|
|
2022-06-05 09:30:36
|
It's a very powerful predictor for photographic images, but it's also a couple of magnitudes more complicated than any other predictor
|
|
|
yurume
|
2022-06-05 09:32:14
|
I'm most surprised by the fact that weights themselves are not a part of the state
|
|
2022-06-05 09:33:19
|
conceptually it combines 4 subpredictors by weights that are learnt from the past, but weights are calculated from past errors and not retained
|
|
|
_wb_
|
2022-06-05 09:34:15
|
Yes, I think that helps to make it follow local error patterns better
|
|
|
yurume
|
2022-06-05 09:34:23
|
yeah, that was my guess
|
|
|
_wb_
|
2022-06-05 09:34:40
|
But it's still a bit of a black magic thing to me
|
|
|
yurume
|
2022-06-05 09:34:56
|
but it can be also argued that MA trees do the same thing; it can _cover_ the regions where weighted predictors are yet to catch up
|
|
|
_wb_
|
2022-06-05 09:34:58
|
How the error feeds back into the predictors themselves and all
|
|
|
yurume
|
2022-06-05 09:35:30
|
I suppose `max_error` can be used for that purpose as well
|
|
|
_wb_
|
2022-06-05 09:35:40
|
It was designed to be used with a fixed ctx model, something like how we use it at -e 3
|
|
2022-06-05 09:36:02
|
Which is basically just a context based on max error
|
|
2022-06-05 09:39:32
|
So, maybe one more bug to hunt in the weighted predictor, and then there's only palette and squeeze left to have a full modular decoder?
|
|
|
yurume
|
2022-06-05 09:39:55
|
I've implemented palette already, I needed that for the full fjxl decoding
|
|
|
_wb_
|
2022-06-05 09:40:01
|
Ah right
|
|
2022-06-05 09:40:07
|
Also deltas and all?
|
|
|
yurume
|
2022-06-05 09:40:29
|
ah that part is still a bit elusive, but should work after some refactoring
|
|
2022-06-05 09:40:49
|
at the moment my code is structured in a way that makes reusing predictors hard
|
|
2022-06-05 09:41:41
|
I used to think predictors were an integral part of modular image decoding, but due to palette I'll have to refactor them into a separate component with its own data structure
|
|
|
_wb_
|
2022-06-05 09:42:19
|
I'm thinking maybe we just shouldn't have allowed deltas with the weighted predictor, it makes things harder than needed I think
|
|
2022-06-05 09:42:33
|
But oh well
|
|
|
yurume
|
2022-06-05 09:44:01
|
well palettes were not hard to implement after all
|
|
2022-06-05 09:44:17
|
it should be trivial to implement deltas after refactoring
|
|
2022-06-05 09:44:48
|
I think I've found only one or two bugs in the palette pseudocode, both of which can be fixed with common sense
|
|
2022-06-05 09:45:04
|
bugs in weighted predictors are essentially impossible to fix in such ways though
|
|
2022-06-05 10:28:51
|
the (hopefully) final wp bug is fixed, yay
|
|
2022-06-05 10:31:01
|
...and my hope was unjustified
|
|
2022-06-05 11:14:54
|
okay that one was not wp but another predictor, I think I've fixed them all
|
|
2022-06-06 01:39:34
|
updated the wp issue to reflect what I've found so far
|
|
2022-06-06 09:47:19
|
the first benchmarking attempt on jxsml: a particular 24.8MP modular image is decoded (and discarded) in 5.1s, compared to 4.6s for a particular build of djxl with a single thread
|
|
|
|
veluca
|
2022-06-06 09:54:38
|
what kind of modular image?
|
|
|
yurume
|
2022-06-06 09:55:37
|
encoded with `cjxl -d 0 -e 3 -j -m --patches=0`
|
|
2022-06-06 09:56:14
|
checking if 32-bit modular buffers make a difference (so far I was using 16-bit)
|
|
|
|
veluca
|
2022-06-06 09:56:22
|
mhhh yeah, decoding those is pretty hard to optimize
|
|
2022-06-06 09:56:48
|
i.e. I'm not surprised djxl isn't much faster
|
|
|
yurume
|
2022-06-06 09:56:58
|
not just that, but encoding is also very slow in that case (I couldn't use the default `-e 6` as it was too slow to encode)
|
|
|
|
veluca
|
2022-06-06 09:57:24
|
ah wait, that's -e 3, I read -e 9 somehow
|
|
2022-06-06 09:57:43
|
so it uses wp
|
|
|
yurume
|
2022-06-06 09:58:01
|
does `-e 4` use wp?
|
|
|
|
veluca
|
2022-06-06 09:58:08
|
I kinda feel like the gap should be higher then (I assume you didn't write special-case decode paths)
|
|
2022-06-06 09:58:19
|
IIRC it doesn't
|
|
|
yurume
|
2022-06-06 09:58:20
|
or, in other words, which is the lowest level that doesn't use wp
|
|
|
|
veluca
|
2022-06-06 09:58:29
|
-e 2 sure won't
|
|
|
yurume
|
2022-06-06 09:58:32
|
okay I'll try again with `-e 4` then
|
|
2022-06-06 10:01:18
|
so that does make a big difference, jxsml: 11.2s, single-threaded djxl: 8.2s
|
|
2022-06-06 10:02:04
|
time to implement a poor man's template in C
|
|
|
|
veluca
|
2022-06-06 10:02:11
|
heh
|
|
2022-06-06 10:02:30
|
you could also try `-e 2`
|
|
2022-06-06 10:02:38
|
difference should be more significant still
|
|
|
yurume
time to implement a poor man's template in C
|
|
2022-06-06 10:03:16
|
I assume for switching 16- and 32- bit buffers?
|
|
|
yurume
|
2022-06-06 10:03:53
|
```c
#ifndef JXSML__RECURSING
// non-templated code here
// template definition
#endif
#if JXSML__RECURSING+0 == 123
// templated code here, using JXSML__P
#undef JXSML__P
#endif
#ifndef JXSML__RECURSING
// template invocation
#define JXSML__RECURSING 123
#define JXSML__P 16
#include __FILE__
#define JXSML__P 32
#include __FILE__
#undef JXSML__RECURSING
// non-templated code here
#endif
```
|
|
2022-06-06 10:04:03
|
that kind of thing
|
|
2022-06-06 10:04:52
|
I already have a condition for `modular_16bit_buffers`, just that my current code only handles the 16-bit case
|
|
|
|
veluca
|
2022-06-06 10:04:53
|
yup, nice preprocessor magic
|
|
|
_wb_
|
2022-06-06 10:49:36
|
e2 uses only Gradient, e3 only WP
|
|
2022-06-06 10:52:19
|
And they both use a tree that only uses one property so it can be done as a lookup table
|
|
|
yurume
|
2022-06-06 10:52:37
|
yeah, I don't yet filter the tree for that kind of analysis
|
|
2022-06-06 11:55:08
|
okay, I made jxsml compute `use_wp` and only enable wp when necessary, and that brought jxsml on par with single-threaded djxl (both 1.7s) for `-e 1` images
|
|
2022-06-06 11:55:56
|
I wondered if I should template that as well, but in hindsight we are already far from vectorization there, so a boolean flag check doesn't cost jxsml much
|
|
2022-06-06 11:58:59
|
for `-e 2` jxsml is still faster than before but it took 3.2s while djxl took 1.9s, so that's probably where a LUT makes a difference?
|
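A hedged sketch of what such a single-property LUT could look like (node layout and names are illustrative, not libjxl's): pre-walk the tree once per possible property value, and per-pixel context selection becomes a single array load instead of a tree walk.

```c
typedef struct {
    int is_leaf;
    int ctx;              /* leaf: context id */
    int value;            /* inner: go left if property > value */
    int left, right;      /* inner: child indices */
} node_t;

static int walk(const node_t *tree, int prop) {
    int n = 0;
    while (!tree[n].is_leaf)
        n = (prop > tree[n].value) ? tree[n].left : tree[n].right;
    return tree[n].ctx;
}

/* pre-evaluate the tree for every property value in [lo, hi] */
static void build_lut(const node_t *tree, int *lut, int lo, int hi) {
    for (int p = lo; p <= hi; p++)
        lut[p - lo] = walk(tree, p);
}

/* tiny example: one test on a single property, two contexts */
static const node_t example[] = {
    {0, 0, 4, 1, 2},      /* root: prop > 4 ? node 1 : node 2 */
    {1, 7, 0, 0, 0},      /* leaf: context 7 */
    {1, 3, 0, 0, 0},      /* leaf: context 3 */
};

static int lut_at(int p) {
    int lut[11];
    build_lut(example, lut, 0, 10);
    return lut[p];
}
```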
|
2022-06-07 12:00:13
|
I think this is enough for now; it is not dramatically slower than djxl and that's acceptable to me
|
|
2022-06-07 12:01:35
|
now it's time to actually do the VarDCT stuff; I was specifically reading classic papers like Loeffler's _Practical fast 1-D DCT algorithms with 11 multiplications_
|
|
2022-06-07 12:04:13
|
I have no idea about the fastest known strategy for larger IDCTs; is radix-2 or split-radix 2/4 the way to go? libjxl seems to have automatically generated SIMD code for all supported DCT sizes and I don't know what generated it.
|
|
|
_wb_
|
2022-06-07 05:19:50
|
If you go back to earlier versions of libjxl, it had a recursive dct implementation before, which was also fast but fewer lines of code
|
|
|
|
veluca
|
|
yurume
I have no idea about the fastest known strategy for larger IDCT, is radix-2 or split-radix-2/4 a way to go? libjxl seems to have an automatically generated SIMD code for all supported DCT sizes and I don't know what generated it.
|
|
2022-06-07 08:56:20
|
uhm, I am pretty sure it doesn't have automatically generated code
|
|
|
yurume
|
2022-06-07 08:56:42
|
is it crafted by hand????
|
|
|
|
veluca
|
2022-06-07 08:57:09
|
https://github.com/libjxl/libjxl/blob/main/lib/jxl/dct-inl.h#L40
|
|
2022-06-07 08:57:20
|
if you mean that file, I wrote it
|
|
|
yurume
|
2022-06-07 08:57:58
|
ah, I meant `fast_dct##-inl.h`
|
|
|
|
veluca
|
2022-06-07 08:58:05
|
ah *that*
|
|
2022-06-07 08:58:23
|
you can ignore it, it's just an arm-specific implementation
|
|
2022-06-07 08:58:36
|
not even used anywhere (yet?)
|
|
2022-06-07 08:59:35
|
it's the same algorithm as the code I linked above, modulo some tweaks to make it have slightly better precision in fixpoint... but probably not nearly as good as it could be
|
|
|
yurume
|
2022-06-07 07:19:05
|
today in jxsml: took time to update my msys2 install (which was stale since 2014 apparently), required for a QoL improvement in gdb like this:
|
|
2022-06-07 07:19:54
|
(my old version of gdb was using Python 2.7 for scripting...)
|
|
2022-06-08 12:15:26
|
nope, just that I had never updated my msys2 install since 2014
|
|
2022-06-08 12:16:09
|
since then pacman moved to zstd, which caused a massive pain (I eventually had to remove the previous install and reinstall it)
|
|
2022-06-08 12:16:52
|
that kind of chore was probably why I never updated it since then
|
|
2022-06-08 08:20:36
|
today in jxsml: uh, oops.
```cpp
for (size_t c = 0; c < 3; c++) {
for (size_t i = 0; i < 9; i++) {
JXL_RETURN_IF_ERROR(F16Coder::Read(br, &encoding->afv_weights[c][i]));
}
for (size_t i = 0; i < 6; i++) {
encoding->afv_weights[c][i] *= 64;
}
JXL_RETURN_IF_ERROR(DecodeDctParams(br, &encoding->dct_params));
JXL_RETURN_IF_ERROR(DecodeDctParams(br, &encoding->dct_params_afv_4x4));
// ^ these get decoded 3 times, only retaining the last one, so previous two are duplicates & useless
}
```
|
|
2022-06-08 08:22:44
|
(it's both a spec bug and a libjxl encoder inefficiency, as it currently encodes the same parameters 3 times)
|
|
|
_wb_
|
2022-06-08 08:30:06
|
Oops!
|
|
2022-06-08 08:30:32
|
How many existing bitstreams does that break if we fix it?
|
|
2022-06-08 08:30:42
|
<@179701849576833024>
|
|
|
|
veluca
|
2022-06-08 08:33:01
|
Probably 0?
|
|
2022-06-08 08:33:29
|
Ah, maybe those with dc_level greater than 1
|
|
|
yurume
|
2022-06-08 08:33:58
|
I think you can safely make it reserved, it's not a big deal
|
|
|
_wb_
|
2022-06-08 08:34:53
|
If libjxl encode doesn't signal that stuff, we can just fix it
|
|
|
|
veluca
|
2022-06-08 08:35:58
|
I'm surprised the encoder and the decoder have the same bug tbh
|
|
2022-06-08 08:36:47
|
Anyway, the only way I know of to get custom quantization matrices is to use uniform error mode
|
|
|
yurume
|
2022-06-08 08:36:49
|
surprisingly yes! so if you are super careful you can just fix the encoder to encode placeholder params for the first two.
|
|
|
|
veluca
|
2022-06-08 08:37:46
|
doesn't change much
|
|
2022-06-08 08:39:44
|
ah
|
|
|
yurume
|
2022-06-08 08:39:49
|
saving at most 65 bytes? :p
|
|
|
|
veluca
|
2022-06-08 08:40:09
|
ah right, dct_params have VLE
|
|
|
yurume
|
2022-06-08 08:40:13
|
(= 2 * (4 + 16 * 16) bits)
|
|
|
|
veluca
|
2022-06-08 08:40:31
|
anyway, I checked and even progressive_dc=2 doesn't use custom AFV tables
|
|
|
yurume
|
2022-06-08 08:41:06
|
okay, I'll file an issue
|
|
|
|
veluca
|
2022-06-08 08:41:34
|
in fact, nothing in the entire libjxl source code that I could find uses custom AFV tables
|
|
2022-06-08 08:42:06
|
so I'd say the chance of actually breaking some bitstreams by fixing them is... pretty much 0
|
|
|
yurume
|
2022-06-08 08:52:06
|
https://github.com/libjxl/libjxl/issues/1484
|
|
|
|
veluca
|
2022-06-08 09:05:03
|
Thanks :)
|
|
|
yurume
|
2022-06-09 10:27:02
|
https://jsfiddle.net/0Lmqtd2n/ I visualized natural DCT coefficient order for a visual aid
|
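For reference, the classic JPEG 8x8 zigzag order (which this "natural order" extends) can be generated by walking anti-diagonals and alternating direction; a hedged sketch:

```c
/* generate the 8x8 zigzag scan: walk each anti-diagonal d,
   alternating between up-right (even d) and down-left (odd d) */
static void zigzag8(int order[64]) {
    int idx = 0;
    for (int d = 0; d < 15; d++) {
        if (d % 2 == 0) {
            for (int y = d < 8 ? d : 7; y >= 0 && d - y < 8; y--)
                order[idx++] = y * 8 + (d - y);
        } else {
            for (int x = d < 8 ? d : 7; x >= 0 && d - x < 8; x--)
                order[idx++] = (d - x) * 8 + x;
        }
    }
}

/* helper: i-th entry of the generated order */
static int zigzag_at(int i) {
    int order[64];
    zigzag8(order);
    return order[i];
}
```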
|
|
|
veluca
|
2022-06-09 12:31:45
|
nice! yup, seems about right
|
|
|
_wb_
|
2022-06-09 04:14:47
|
cool
|
|
2022-06-09 04:16:08
|
why do we call that natural order btw? I would call it zigzag
|
|
|
yurume
|
2022-06-09 07:29:17
|
yeah, the code surely looked like an extension to JPEG zigzag order but I wanted to confirm that
|
|
|
|
veluca
|
|
_wb_
why do we call that natural order btw? I would call it zigzag
|
|
2022-06-09 07:35:57
|
no clue
|
|
|
_wb_
|
2022-06-09 08:02:26
|
It's slightly confusing since you could assume that the natural order is scanline order
|
|
2022-06-10 08:23:57
|
https://twitter.com/jonsneyers/status/1535356866301202434?s=20&t=FKuvYMVfyunxZwL4j81_CQ
|
|
2022-06-10 08:24:07
|
sigh I hate it that Twitter has no edit button
|
|
2022-06-10 08:24:29
|
"it is however a critical" wtf
|
|
|
yurume
|
2022-06-11 12:25:08
|
lower deviation (towards top) is better
|
|
2022-06-11 12:26:26
|
as I understand it, each point indicates that for a particular set of encoder & parameters, all images have this mean score and this standard deviation
|
|
2022-06-11 12:27:20
|
thinking about it, lower deviation can (but doesn't necessarily) result in a flatter curve, but anyway
|
|
|
monad
|
2022-06-11 01:35:07
|
Assuming high mean DMOS correlates with low standard deviation, "flatter is better" seems an okay interpretation.
|
|
|
_wb_
|
2022-06-11 06:05:49
|
A setting with a high mean DMOS is likely to have a lower stdev since DMOS is capped at 100 (can't go better than "as good as the original")
|
|
|
yurume
|
2022-06-11 06:08:01
|
today in jxsml: VarDCT pipeline is complex enough that I needed some outline (text version: https://gist.github.com/lifthrasiir/a6058584fde522de092a74b5e5517f73)
|
|
2022-06-11 06:17:36
|
now I have a lot of questions left unanswered
|
|
|
_wb_
|
2022-06-11 06:22:39
|
E.g. mozjpeg q70 is the black point that gets a mean DMOS of 80 and has a stdev of 7, which means (assuming normal distribution, which is not really true but whatever) that 68% of the time, the DMOS is in [73,87] and 95% of the time, it's in [66,94]. At the same mean DMOS of 80, which for jxl is cjxl q70, that 95% interval would be [70,90] or so, while for avif and webp it would be [64,98] or so.
(The actual intervals would not be that symmetric, and be more skewed towards the lower side)
|
|
|
yurume
|
|
yurume
today in jxsml: VarDCT pipeline is complex enough that I needed some outline (text version: https://gist.github.com/lifthrasiir/a6058584fde522de092a74b5e5517f73)
|
|
2022-06-11 06:44:34
|
following this outline, my biggest question right now is how a varblock across group boundaries is handled
|
|
2022-06-11 06:46:28
|
I get that varblocks are in the raster order of their top-left corners, and they should not overlap each other
|
|
2022-06-11 06:48:33
|
but packing should select the next possible location (in the raster order) for each varblock
|
|
2022-06-11 06:49:07
|
and I'm not sure whether a varblock that would cross group boundaries is disallowed or relocated to the next possible position
|
|
|
_wb_
|
2022-06-11 07:20:08
|
It is disallowed to cross group boundaries
|
|
2022-06-11 07:22:18
|
So no relocation, you can just read them and put them in raster order, just skipping over positions that are already covered
|
|
2022-06-11 07:24:25
|
The first not-already-covered 8x8 cell should also be a position where you can fit the (topleft corner of the) varblock and if that makes it cross a group boundary, it's an invalid bitstream
|
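A hedged sketch of the placement rule described above (sizes in 8x8-cell units; the names and the 32x32-cell group size are illustrative): each decoded varblock goes at the first uncovered cell in raster order, and a block that would overlap or stick out of the group makes the bitstream invalid.

```c
#include <string.h>

enum { GW = 32, GH = 32 };           /* one 256x256 group = 32x32 cells */

typedef struct { int w, h; } blk_t;  /* varblock size in 8x8 cells */

/* returns 1 on success, 0 for an invalid bitstream */
static int place_blocks(const blk_t *blks, int n, int pos[][2]) {
    unsigned char covered[GH][GW];
    int cell = 0;
    memset(covered, 0, sizeof covered);
    for (int i = 0; i < n; i++) {
        /* first not-already-covered cell in raster order */
        while (cell < GW * GH && covered[cell / GW][cell % GW]) cell++;
        if (cell == GW * GH) return 0;
        int y = cell / GW, x = cell % GW;
        if (x + blks[i].w > GW || y + blks[i].h > GH) return 0;
        for (int dy = 0; dy < blks[i].h; dy++)
            for (int dx = 0; dx < blks[i].w; dx++) {
                if (covered[y + dy][x + dx]) return 0;  /* overlap */
                covered[y + dy][x + dx] = 1;
            }
        pos[i][0] = x; pos[i][1] = y;
        cell++;
    }
    return 1;
}

/* quick demo: a 2x2 block then a 1x1 block */
static int demo_ok(void) {
    blk_t b[2] = { {2, 2}, {1, 1} };
    int pos[2][2];
    if (!place_blocks(b, 2, pos)) return 0;
    return pos[0][0] == 0 && pos[0][1] == 0 &&
           pos[1][0] == 2 && pos[1][1] == 0;
}
```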
|
|
yurume
|
2022-06-11 07:36:20
|
in other words, each varblock is (and should be) encoded such that there is no such crossing, and no out-of-bounds condition in general
|
|
|
|
veluca
|
|
_wb_
|
2022-06-11 08:43:12
|
yes, i.e. the decoder doesn't need to play tetris
|
|
|
spider-mario
|
|
_wb_
"it is however a critical" wtf
|
|
2022-06-11 01:44:00
|
https://en.wiktionary.org/wiki/critical#Noun
> 1. A critical value, factor, etc.
>
> - **1976**, American Society of Mechanical Engineers, _Journal of engineering for industry_, volume 98, page 508:
> The second undamped system **criticals** show a greater percentage depression than the first.
|
|
2022-06-11 02:02:46
|
seems they don't know much about breakdancing either then
|
|
|
|
JendaLinda
|
2022-06-11 03:24:33
|
I've noticed that in the lossy mode, JXL allows the image colors to bleed into the fully transparent pixels. It looks quite pretty, it's an interesting effect. I guess the codec does this for better compression, as in the lossy mode the contents of the fully transparent pixels are discarded anyway.
|
|
|
_wb_
|
2022-06-11 03:35:12
|
Yes, it helps to avoid artifacts caused by abrupt changes at the transparency boundary, so it's better than e.g. just making all invisible pixels black.
|
|
2022-06-11 03:37:35
|
Preserving the input color at invisible pixels is typically a bad idea since optimized pngs often have weird stuff there that happens to compress well in png, e.g. horizontal lines of the same color as the last non-invisible pixel. That's quite bad for dct compression though...
|
|
|
|
JendaLinda
|
2022-06-11 04:11:07
|
To me, this looks more like some prediction trickery rather than DCT optimization, and the same optimization is done in both VarDCT and Modular.
|
|
|
_wb_
|
2022-06-11 04:32:45
|
Only in lossy modular should it be done iirc
|
|
2022-06-11 04:35:25
|
It basically tries to turn invisible pixels into a blurry mess that should compress well with dct, and that also causes less problems when filters etc bleed invisible colors into visible ones
|
|
|
|
JendaLinda
|
2022-06-11 04:38:53
|
It's interesting that the blurring is going mostly downwards. Yes, I was talking about the lossy modular.
|
|
|
_wb_
|
2022-06-11 05:05:26
|
It's mostly downwards just to keep it a cheap thing to do in one scan of the image
|
|
|
|
falsjds
|
2022-06-12 10:06:43
|
how do i create an animated .jxl | `cjxl` compresses an image or animation to the JPEG XL format. Is there a cli option i have not found?
|
|
|
_wb_
|
2022-06-12 10:12:48
|
Use apng input
|
|
|
fab
|
|
falsjds
how do i create an animated .jxl | `cjxl` compresses an image or animation to the JPEG XL format. Is there a cli option i have not found?
|
|
2022-06-12 03:13:33
|
https://discord.com/channels/794206087879852103/804324493420920833/985536771603460106
|
|
2022-06-12 03:13:40
|
cjxlng is unstable now
|
|
2022-06-12 03:13:53
|
so you will have to wait
|
|
2022-06-12 03:14:39
|
|
|
2022-06-12 03:14:51
|
https://github.com/libjxl/libjxl/milestones
|
|
|
yurume
|
2022-06-14 05:04:46
|
today in jxsml: realized that you need to process LLF before any HF decoding (because the HF context depends on quantized LF values anyway)
|
|
|
|
veluca
|
2022-06-14 08:30:03
|
well, you need to *decode* LLF
|
|
2022-06-14 08:30:12
|
not necessarily *process*
|
|
|
_wb_
|
2022-06-14 08:55:29
|
can't really do much with HF anyway without LF...
|
|
|
yurume
|
2022-06-14 09:28:38
|
I somehow thought they could be independently decoded and later combined
|
|
|
DraX
|
2022-06-14 10:56:39
|
Hi there. I heard about JPEG XL recently and it seemed interesting. I'm going through a phase of trying to convert a lot of my lossless formats into efficiently lossy ones, and soon I'll be tackling images.
So, as someone who doesn't really know much about formats or codecs, naturally I have a few questions I was hoping could be answered here:
-I have a lot of .pdf, .cbr and .cbz files. Would it be possible to convert the images in these files to jxl, losslessly in the case of JPEGs (which are already lossy) and lossily (is that even a word?) in the case of PNGs etc., so they can still be read in a standard PDF/CBR reader like Zathura? Or Emacs?
-Do either ffmpeg or imagemagick have the necessary tools rn to losslessly convert JPEG to JXL? Or do I still have to download and install the libjxl tools on their own?
|
|
|
Cool Doggo
|
2022-06-14 11:44:53
|
lossless itself isn't specific to JPEG; lossless transcoding is
|
|
2022-06-14 11:46:11
|
ffmpeg cannot do lossless transcoding (unless something was updated recently) because it requires the actual bit data of the JPEG to do so; it will still losslessly encode it using the pixel data, assuming you are using -d 0. I believe it's the same for imagemagick too
|
|
|
Nova Aurora
|
2022-06-14 11:56:33
|
Currently Zathura can't do jxl, lossless jpeg transcode enables the reconstruction of the exact bits of the jpeg, meaning no further loss is incurred with it.
|
|
|
Nova Aurora
Currently Zathura can't do jxl, lossless jpeg transcode enables the reconstruction of the exact bits of the jpeg, meaning no further loss is incurred with it.
|
|
2022-06-15 12:00:57
|
Someone could write a plugin for zathura and emacs, but I'm not sure how involved that would be
|
|
2022-06-15 12:14:50
|
~~I guess~~ PDF is much harder, ~~more components have to be patched (such as poppler). As long as JXL is not included in PDF standard (ISO 32000), other~~ PDF readers ~~will probably~~ have issues ~~reading JXL inside PDF~~
|
|
|
|
veluca
|
2022-06-15 05:06:50
|
every time the topic comes around I can't find an explanation for the continued existence of limited/tv range YUV -- does anybody have one?
|
|
|
_wb_
|
2022-06-15 05:13:47
|
JXL will probably at some point be in the PDF standard (at least that's the impression I had), but I expect it to take many, many years before all pdf tools will support that version of pdf
|
|
|
veluca
every time the topic comes around I can't find an explanation for the continued existence of limited/tv range YUV -- does anybody have one?
|
|
2022-06-15 05:15:52
|
My current hypothesis is that tv range still exists just because smaller range produces smaller files so it makes encoders look better when people do benchmarks in ill-designed ways.
|
|
|
|
veluca
|
2022-06-15 05:18:28
|
in this specific situation, this came up because of cICP in PNG apparently allowing for limited range RGB
|
|
2022-06-15 05:18:46
|
which seems... uh... an interesting choice?
|
|
|
_wb_
|
2022-06-15 06:07:39
|
Limited range _RGB_?
|
|
2022-06-15 06:09:20
|
I guess one advantage of not setting black at 0 and white at maxval, is that you can have out-of-gamut colors without clipping
|
|
2022-06-15 06:09:57
|
(then again I think it's also a nice property of uint representations that everything is by definition in gamut)
|
|
|
|
veluca
|
|
_wb_
Limited range _RGB_?
|
|
2022-06-15 07:25:11
|
yeeeah...
|
|
|
_wb_
|
2022-06-15 07:26:28
|
iirc, camera raws also tend to use a reduced range in a way, i.e. they have some non-zero black level and no pixel is ever 2^bitdepth -1
|
|
2022-06-15 07:27:40
|
but I guess that has more to do with how the sensor works than with an intentionally reduced range
|
|
|
|
veluca
|
2022-06-15 07:28:05
|
I can't think of any good reason for limited range rgb to be a thing tbh
|
|
|
_wb_
|
2022-06-15 08:49:04
|
me neither
|
|
2022-06-15 08:50:01
|
if you need room at the ends for editing or whatever, just doing limited range is not going to help, because the extra room you get with that is pretty small
|
|
2022-06-15 08:51:35
|
using float is safer for that: float can be seen as "limited range" if 0..1 is the nominal range, and it's basically using only about a quarter of the available range (half of the floats are negative and about half of the positive floats are > 1)
|
|
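The "about 1/4th" figure above can be checked directly from IEEE-754 bit patterns. A minimal sketch (not from the chat, just arithmetic on float32 encodings):

```python
import struct

def bits(x: float) -> int:
    # IEEE-754 float32 bit pattern of x
    return struct.unpack("<I", struct.pack("<f", x))[0]

# For non-negative float32 values, numeric order matches bit-pattern order,
# so counting representable values is just bit-pattern arithmetic.
in_unit = bits(1.0) + 1                # patterns for +0.0 .. 1.0 inclusive
pos_finite = bits(float("inf")) - 1    # positive finite, nonzero values
total_finite = 2 * pos_finite + 2      # add the negatives and both zeros
frac = in_unit / total_finite
print(f"{frac:.3f}")                   # ~0.249: a quarter of float32 lies in [0, 1]
```

So the nominal 0..1 range really does use only about 25% of the float32 code space, which is the extra headroom being described.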
2022-06-15 08:52:34
|
the only reason I can imagine for limited range rgb is to be able to use full range ycbcr transforms on it to convert it to/from limited range ycbcr?
|
|
2022-06-15 08:54:07
|
(and I guess if the input image is in tv range ycbcr, converting it to full range rgb is slightly awkward because you basically have to spread the values in a way that cannot really be done nicely uniformly)
|
|
|
spider-mario
|
2022-06-15 08:54:09
|
as Jon points out, camera raws are a use case, although it wouldn't really justify having it in PNG specifically
|
|
2022-06-15 08:54:15
|
(who would use PNG for raws)
|
|
|
_wb_
|
2022-06-15 08:56:01
|
in 8-bit, limited range ycbcr is [16,235] for luma and [16,240] for chroma, so I suppose that more or less leads to 220 different values per channel when converting to rgb
|
|
2022-06-15 08:57:17
|
keeping those 220 values compact is probably better for compression than spreading them out over 256 values
|
|
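The 220-values figure follows from the standard BT.601/709 range expansion. A small sketch of that expansion (the `tv_to_full_luma` helper is illustrative, not from any library):

```python
def tv_to_full_luma(y: int) -> int:
    # Expand 8-bit limited-range luma [16, 235] to full range [0, 255],
    # the standard BT.601/709 range expansion; out-of-range codes are clipped.
    y = min(max(y, 16), 235)
    return round((y - 16) * 255 / 219)

codes = list(range(16, 236))                    # 235 - 16 + 1 = 220 luma codes
expanded = sorted({tv_to_full_luma(y) for y in codes})
print(len(codes), len(expanded))                # 220 220: expansion is injective
print(expanded[:4])                             # starts [0, 1, 2, 3], gaps appear later
```

Because the step 255/219 is greater than 1, the 220 limited-range codes map to 220 distinct full-range values spread non-uniformly over 256 slots, which is exactly the "spread out" compression problem described above.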
2022-06-15 08:58:29
|
the CAFE image is like that iirc, it was tv range at some point and now only uses some RGB values
|
|
2022-06-15 09:02:08
|
```
$ ../build/tools/cjxl ClassA_8bit_CAFE_2048x2560_8b_RGB.ppm.png -d 0 -C 0 -c 1 -Y 0
JPEG XL encoder v0.7.0 a444260b [AVX2,SSE4,SSSE3,Scalar]
No output file specified.
Encoding will be performed, but the result will be discarded.
Read 1280x1600 image, 19.5 MP/s
Encoding [Modular, lossless, squirrel], 4 threads.
./lib/jxl/modular/transform/enc_palette.cc:232: Channel 0 uses only 208 colors.
./lib/jxl/modular/transform/enc_palette.cc:232: Channel 2 uses only 208 colors.
./lib/jxl/modular/transform/enc_palette.cc:232: Channel 4 uses only 208 colors.
```
|
|
2022-06-15 09:02:32
|
looks like it's even only 208 values per channel, I dunno how they ended up with that
|
|
2022-06-15 09:03:10
|
maybe just contrast stretching done after the image was already 8-bit
|
|
|
spider-mario
|
2022-06-15 09:16:37
|
I don't like that image
|
|
2022-06-15 09:16:38
|
it's ugly
|
|
2022-06-15 09:16:47
|
oversaturated, oversharpened, noisy and aliased
|
|
2022-06-15 09:17:40
|
lots of local contrast but lacking in global contrast
|
|
2022-06-15 09:17:48
|
(I hope the person who made it is not here)
|
|
|
|
veluca
|
|
spider-mario
oversaturated, oversharpened, noisy and aliased
|
|
2022-06-15 09:21:54
|
anything else?
|
|
2022-06-15 09:22:17
|
but this discussion actually raises a good point: does anybody actually define limited range rgb?
|
|
2022-06-15 09:22:44
|
16-235
|
|
2022-06-15 09:22:46
|
apparently yes
|
|
|
_wb_
|
|
spider-mario
(I hope the person who made it is not here)
|
|
2022-06-15 11:01:30
|
I have no clue who made it, but it must have been a long time ago, and I think it must have been someone from Belgium (at least the photo was taken in Belgium), but it was certainly not me
|
|
2022-06-15 11:02:08
|
And yes it looks way overprocessed
|
|
|
BlueSwordM
|
|
veluca
every time the topic comes around I can't find an explanation for the continued existence of limited/tv range YUV -- does anybody have one?
|
|
2022-06-15 01:21:33
|
The main reasons are just standards and inertia, with a bit of film-making in the latter.
For standard stuff, it is mainly related to the fact that even today, the only consumer video standard that forces full-range is Dolby Vision Profile 5, and even then, chroma subsampling of 4:2:0 is still used.
|
|
|
_wb_
|
2022-06-15 01:24:28
|
I guess it will help if we make tools have more sensible defaults
|
|
2022-06-15 01:24:50
|
like mozjpeg doing progressive by default and 4:4:4 for higher quality settings
|
|
2022-06-15 01:25:24
|
while the normal libjpeg-turbo requires you to explicitly choose progressive and 4:4:4
|
|
2022-06-15 01:25:48
|
I think it would make sense if ffmpeg would use full range by default if it can
|
|
|
Traneptora
|
|
_wb_
I think it would make sense if ffmpeg would use full range by default if it can
|
|
2022-06-15 05:00:54
|
ffmpeg defaults to input range
|
|
2022-06-15 05:01:16
|
if it automatically converted limited range to full range it would break a lot of existing things that expect limited range
|
|
|
_wb_
|
2022-06-15 05:12:16
|
what if input is rgb? does it then default to full range or tv range yuv?
|
|
|
Traneptora
|
2022-06-15 05:12:30
|
it defaults to whatever the input range is
|
|
|
_wb_
|
2022-06-15 05:12:52
|
but e.g. to me it's not intuitive that `-pix_fmt yuv444` implies tv range
|
|
|
Traneptora
|
2022-06-15 05:13:02
|
it doesn't
|
|
|
_wb_
|
2022-06-15 05:13:13
|
you need `yuvj444` if you want full range iirc, no?
|
|
|
Traneptora
|
2022-06-15 05:13:23
|
no, those have been deprecated for over a decade and you should not use them
|
|
2022-06-15 05:13:33
|
you use `yuv444p` for both full and limited range
|
|
2022-06-15 05:13:51
|
`-color_range pc` or `-color_range tv` to tag the output file appropriately
|
|
2022-06-15 05:14:30
|
I believe when converting from RGB -> YUV, the default is full-range for RGB and limited-range for YUV
|
|
2022-06-15 05:14:52
|
but if no colormatrix is used then it defaults to whatever was provided on input
|
|
2022-06-15 05:15:21
|
the problem is virtually nothing supports full-range YUV
|
|
2022-06-15 05:15:36
|
so that's why that default exists
|
|
2022-06-15 05:16:04
|
but if you, say, had full-range yuv420p input and ran -pix_fmt yuv444p, it wouldn't automatically convert it to limited range
|
|
2022-06-15 05:16:06
|
as far as I'm aware
|
|
|
_wb_
|
2022-06-15 05:16:13
|
ah ok
|
|
|
Traneptora
|
2022-06-15 05:16:47
|
the libjxl encoder wrapper prints a warning if you attempt to pass it limited range. (at least it does if they ever merge my patch grumble grumble)
|
|
2022-06-15 05:17:06
|
it also prints a similar warning if you pass it untagged range
|
|
2022-06-15 05:17:55
|
```c
+ /* JPEG XL format itself does not support limited range */
+ if (avctx->color_range == AVCOL_RANGE_MPEG ||
+ avctx->color_range == AVCOL_RANGE_UNSPECIFIED && frame->color_range == AVCOL_RANGE_MPEG)
+ av_log(avctx, AV_LOG_ERROR, "This encoder does not support limited (tv) range, colors will be wrong!\n");
+ else if (avctx->color_range != AVCOL_RANGE_JPEG && frame->color_range != AVCOL_RANGE_JPEG)
+ av_log(avctx, AV_LOG_WARNING, "Unknown color range, assuming full (pc)\n");
```
|
|
|
_wb_
|
2022-06-15 05:18:30
|
my ffmpeg does default to tv range when encoding a bunch of png frames as input and some video codec as output
|
|
|
Traneptora
|
2022-06-15 05:18:50
|
> when converting from RGB -> YUV, the default is full-range for RGB and limited-range for YUV
|
|
2022-06-15 05:19:12
|
it won't automatically shrink it if you keep it in RGB, like if you were to encode it to, say ffv1
|
|
2022-06-15 05:19:19
|
which supports RGB
|
|
|
|
veluca
|
2022-06-15 06:37:39
|
I wish I could say I am surprised by how much confusion can be generated by things like the `colr` box
|
|
2022-06-15 06:37:51
|
which for whatever reason uses 2 bytes to send values 0-255
|
|
2022-06-15 06:38:09
|
but unfortunately I'm not surprised
|
|
2022-06-15 06:38:30
|
(https://github.com/w3c/PNG-spec/issues/129#issuecomment-1156796286)
|
|
|
ziemek.z
|
2022-06-16 09:58:20
|
Been browsing FLIF and FUIF repo recently.
<@794205442175402004> FLIF is still to be completely finished, even if it takes me 10 years in my free time to squash all bugs, but shouldn't FUIF repo be archived since it's *totally* obsolete because of being included into JPEG XL?
|
|
|
_wb_
|
2022-06-16 10:20:54
|
what do you mean "be archived"? it kind of already is, it's just sitting there as an inactive repo
|
|
|
ziemek.z
|
|
_wb_
what do you mean "be archived"? it kind of already is, it's just sitting there as an inactive repo
|
|
2022-06-16 11:15:38
|
https://docs.github.com/en/repositories/archiving-a-github-repository/archiving-repositories
|
|
2022-06-16 01:03:14
|
FUIF, from what I understand, was totally, completely merged into JPEG XL, so IMHO there's no point in contributing to it. The whole codebase has been moved. FLIF is a different thing. A foundation for FUIF, but not its direct predecessor ("parent").
|
|
|
Jyrki Alakuijala
|
2022-06-17 02:59:21
|
I may be mistaken, but my impression is that FLIF has a decoding speed issue -- to be practical it would need a new approach to decoding speed
|
|
2022-06-17 02:59:48
|
that new approach is partially available in JPEG XL -- JPEG XL needs some encoding/decoding refinements for the best speed, too
|
|
2022-06-17 03:00:16
|
but great decoding speed in general is easier in JPEG XL, because entropy clustering is used rather than updating probabilities
|
|
|
Traneptora
|
2022-06-17 09:40:13
|
Yea FLIF is impractically slow
|
|
2022-06-17 09:40:30
|
significantly slower than ffv1, for context
|
|
2022-06-17 09:40:39
|
without significant improvement
|
|
2022-06-17 09:41:12
|
in bpp that is
|
|
2022-06-17 09:41:39
|
That said, lossless JXL is not particularly fast
|
|
2022-06-17 09:41:53
|
it's much slower to decode than PNG
|
|
|
_wb_
|
2022-06-17 10:20:20
|
well png is basically just gunzip
|
|
|
Traneptora
|
|
_wb_
well png is basically just gunzip
|
|
2022-06-18 03:07:14
|
which is weirdly slow to decode tbf
|
|
2022-06-18 03:07:24
|
but maybe that's just due to the quantity of data
|
|
|
BlueSwordM
|
|
Traneptora
but maybe that's just due to the quantity of data
|
|
2022-06-18 03:33:09
|
That is mainly it.
Entropy coders get tested the most at high data rates.
|
|
|
_wb_
|
2022-06-18 05:40:56
|
You can put a truncated jpeg in a zip file
|
|
2022-06-18 05:41:48
|
It's probably a progressive jpeg that has part of its last scan cut off
|
|
2022-06-18 05:42:12
|
The bottom probably looks slightly worse than the top, if you look carefully
|
|
2022-06-18 05:43:45
|
iirc, you can use jpegtran to turn it into a syntactically correct jpeg
|
|
2022-06-18 05:49:38
|
Ah, probably it is. What does `djpeg -verbose -verbose image.jpg >/dev/null` say?
|
|
2022-06-18 05:50:00
|
Could be that it just has its end marker missing or something
|
|
2022-06-18 06:10:56
|
And you're sure the bottom right of the image looks fine?
|
|
2022-06-18 06:11:20
|
Could be the only thing that is missing is the end marker itself
|
|
2022-06-18 06:11:54
|
Does `jpegtran image.jpg > fixed.jpg` work?
|
|
2022-06-18 06:20:53
|
Yes, but not much data missing, it's indeed not a progressive one so it's just that last row of blocks
|
|
2022-06-18 06:21:33
|
You can use jpegtran to losslessly crop away that last row, and then recompress that
|
|
2022-06-18 06:28:17
|
Yeah that command will not crop, just turn it into something jxl can recompress
|
|
|
The_Decryptor
|
2022-06-18 06:33:12
|
You need to supply `-copy all` to keep metadata
|
|
|
yurume
|
2022-06-18 07:01:23
|
today in jxsml: I've finished implementing DCT (both forward and inverse), and then caught up with a series of refactorings
|
|
2022-06-18 07:02:35
|
I think I've revised `jxsml__lf_group_t` (a data structure for parallel HF decoding) at least 10 times
|
|
|
|
veluca
|
|
yurume
|
2022-06-18 07:26:35
|
for example I initially didn't preserve LfQuant and directly converted it into LLF coefficients and threw it away
|
|
2022-06-18 07:26:57
|
but I realized that I need to preserve it for coefficient context modelling
|
|
2022-06-18 07:27:42
|
and then I realized that I don't actually have to preserve LfQuant, only the final LF context index [0, 64) is relevant
|
|
2022-06-18 07:28:46
|
so that part of the code has been completely rewritten twice :p
|
|
2022-06-18 07:29:38
|
at first I wondered why `lf_index` is a separate variable in the pseudocode, now I know why
|
|
2022-06-18 07:29:48
|
since it is much easier to vectorize in that way
|
|
|
_wb_
|
2022-06-18 07:30:08
|
It could be useful to add an implementation note to the spec that says something about it
|
|
|
yurume
|
2022-06-18 07:30:38
|
(I was deeply concerned about the block context computation efficiency, so I looked for a way to optimize that and came to the realization)
|
|
|
_wb_
|
2022-06-18 07:37:22
|
I remember adding lf context to hf. IIRC, I added that to make jpeg recompression more competitive with brunsli โ brunsli has other hf ctx that we can't really do in jxl since it assumes all blocks are 8x8, so I needed something else to make vardct jxl as good as brunsli so we could get rid of brunsli-in-jxl
|
|
2022-06-18 07:38:47
|
It was a tricky balance to do it in a way that doesn't have a decode speed cost but still gives some compression density advantage
|
|
|
yurume
|
2022-06-19 10:32:04
|
today in jxsml: I've finally reached the point where I successfully decoded all LF and HF coefficients (yet to be processed or permuted, though)
|
|
2022-06-19 10:32:31
|
...only after 4500 lines of code
|
|
2022-06-19 10:32:35
|
the gist has been updated: https://gist.github.com/lifthrasiir/137a3bf550f5a8408d4d9450d026101f
|
|
|
|
plantain
|
2022-06-20 05:37:55
|
Hi, I am trying JPEG-XL via GDAL, where the only knobs available to control quality seem to be JXL_EFFORT and JXL_DISTANCE. Is this 'standard' JPEG-XL language? How does it compare to JPEG/WebP with a sliding quality=0-100?
|
|
|
yurume
|
2022-06-20 05:43:15
|
JXL_DISTANCE corresponds to the Butteraugli distance target, which is a perceptual similarity metric and much better defined than a simple "quality" factor
|
|
2022-06-20 05:45:28
|
for example q90 in typical JPEG can result in quite varying degrees of actual quality, but d1.0 in JPEG XL results in a compressed image whose Butteraugli distance is hopefully close to 1.0
|
|
2022-06-20 05:48:24
|
larger JXL_EFFORT generally improves distance accuracy, and in many (but not all) cases also results in a smaller file due to the increased number of knobs being tried
|
|
|
|
plantain
|
2022-06-20 05:51:19
|
and am I correct in understanding distance=1 should be visually lossless?
|
|
|
yurume
|
2022-06-20 05:51:39
|
there is an approximate relation between JPEG quality and Butteraugli distance (e.g. cjxl maps -q30..100 into -d6.4..0.1 linearly) but there's a considerable variation due to those reasons
|
|
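The linear q-to-d mapping yurume mentions can be sketched directly. This is a sketch of the described relation only, not cjxl's actual code (which also handles qualities outside this range differently), and the function name is illustrative:

```python
def quality_to_distance(q: float) -> float:
    # Linear map from JPEG-style quality 30..100 to Butteraugli distance
    # 6.4..0.1, per the relation described in the message above.
    if not 30 <= q <= 100:
        raise ValueError("this sketch only covers q30..100")
    return 6.4 + (q - 30) * (0.1 - 6.4) / (100 - 30)

print(round(quality_to_distance(90), 3))  # 1.0, the usual "visually lossless" target
```

Note how q90 lands exactly on d1.0 under this mapping, which matches cjxl's conventional quality/distance correspondence.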
2022-06-20 05:52:04
|
it especially depends on the image type
|
|
2022-06-20 05:52:17
|
I've heard that 1.5 might be fine
|
|
2022-06-20 05:53:04
|
on the other hand, a while ago (so this might be invalid now) I remember using -d1.0 and finding some visible loss *when zoomed in*
|
|
2022-06-20 05:54:26
|
I think the current encoder assumes a particular (but configurable) viewing distance, which might or might not be okay
|
|
|
|
plantain
|
2022-06-20 05:54:45
|
ok, thanks for the information. so far the results look remarkable on my workload (30000x30000px satellite imagery), with filesizes 50% smaller than the current JPEG -q90 using distance=2 e=9... but I haven't managed to actually build any software to view the images yet to compare
|
|
|
yurume
|
2022-06-20 05:55:42
|
at that size and usage I think you need to check the zoomed-in result, cjxl does have a ton of configurations that might be helpful (including the aforementioned viewing distance) but GDAL doesn't seem to have any
|
|
|
|
plantain
|
2022-06-20 05:56:37
|
it does seem to be much slower to encode, I think single threaded in GDAL as well, which might be a catch
|
|
|
yurume
|
2022-06-20 05:57:27
|
indeed, it doesn't have a JxlEncoderSetParallelRunner call?
|
|
|
|
plantain
|
2022-06-20 05:58:20
|
it does appear to have it in the source... I'll keep digging
|
|
2022-06-20 05:58:55
|
ah, only in the explicit JPEGXL driver, not the GeoTIFF driver
|
|
|
yurume
|
2022-06-20 05:59:08
|
huh weird, what's the difference between https://github.com/OSGeo/gdal/blob/master/frmts/gtiff/tif_jxl.c and https://github.com/OSGeo/gdal/blob/master/frmts/jpegxl/jpegxl.cpp
|
|
|
|
plantain
|
2022-06-20 06:00:03
|
the former is for embedding inside GeoTIFFs
|
|
|
monad
|
|
plantain
and am I correct in understanding distance=1 should be visually lossless?
|
|
2022-06-20 06:46:51
|
d1 targets near-visually-lossless at 1000 pixels viewing distance. At worst this should allow a slight difference only noticeable with a flip test.
|
|
|
|
plantain
|
2022-06-20 07:00:45
|
I don't really understand the concept of viewing distance
|
|
2022-06-20 07:01:21
|
1000 pixels as in the same pixel size as the screen, but in the Z axis towards the viewer?
|
|
2022-06-20 07:01:57
|
so if my DPI is 100px/cm, d1 is near-visually-lossless at 10cm?
|
|
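The arithmetic implied by the question above is just a unit conversion. A tiny sketch (the helper name is illustrative, and it assumes one image pixel maps to one device pixel):

```python
def viewing_distance_cm(distance_px: float, px_per_cm: float) -> float:
    # Convert a viewing distance expressed in "pixel widths" to centimetres,
    # given the display's pixel density.
    return distance_px / px_per_cm

print(viewing_distance_cm(1000, 100))  # 10.0 cm, matching the question above
print(viewing_distance_cm(1000, 40))   # 25.0 cm on a ~100 dpi desktop monitor
```

So "1000 pixels viewing distance" becomes a shorter physical distance the denser the display, which is why the same d value can look different across screens.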
|
yurume
|
2022-06-20 07:27:23
|
I think so, provided that the device pixel corresponds to the image pixel (which often isn't the case, especially on mobile)
|
|
2022-06-20 07:31:49
|
there is also an assumption about the display brightness, which is by default 255 cd/m^2 for non-HDR images (for reference, both my laptop and monitors have about 300--330 cd/m^2 brightness)
|
|
|
The_Decryptor
|
2022-06-20 07:51:14
|
I think the "standard" for sRGB is supposed to be 80 nits, but my monitor is over twice that and it can still appear dim sometimes
|
|
|
novomesk
|
|
plantain
ok, thanks for the information. so far the results look remarkable on my workload (30000x30000px satellite imagery), with filesizes 50% smaller than the current JPEG -q90 using distance=2 e=9... but I haven't managed to actually build any software to view the images yet to compare
|
|
2022-06-20 10:12:26
|
In the majority of my software I have an artificial limit to reject overly large images. It is currently 256 megapixels.
|
|
|
|
plantain
|
2022-06-20 10:14:45
|
I usually hit the real limit of 4GB of RAM long before that
|
|
|
novomesk
|
2022-06-20 11:26:39
|
Viewing of JXL (but also AVIF) files could be at least 2 times faster in gwenview. There is a need to refactor a portion of gwenview's code. It is not a trivial change; it must be done in a way so there won't be a performance penalty for older formats.
|
|
|
diskorduser
|
2022-06-20 11:42:20
|
So, gtk based image viewers open jxl faster than gwenview?
|
|
|
novomesk
|
|
diskorduser
So, gtk based image viewers open jxl faster than gwenview?
|
|
2022-06-20 12:01:01
|
Cannot say yes or no. It depends on each implementation and the loaded image. For example, someone loads an image as an array of FLOAT, consuming 4 times more memory than an 8-bit array. Copying large bitmaps here and there takes some time too. So even if it is not as fast as it could be, there could be slower viewers too.
|
|
2022-06-20 12:20:16
|
https://invent.kde.org/graphics/gwenview/-/issues/3
|
|
|
Traneptora
|
2022-06-20 02:13:19
|
I've had a patch sitting on the ML for two weeks that fixes some bugs, maybe related to that?
|
|
|
_wb_
|
|
plantain
ok, thanks for the information. so far the results look remarkable on my workload (30000x30000px satellite imagery), with filesizes 50% smaller than the current JPEG -q90 using distance=2 e=9... but I haven't managed to actually build any software to view the images yet to compare
|
|
2022-06-20 03:24:20
|
I would try e7 or even e6
|
|
|
fab
|
|
plantain
ok, thanks for the information. so far the results look remarkable on my workload (30000x30000px satellite imagery), with filesizes 50% smaller than the current JPEG -q90 using distance=2 e=9... but I haven't managed to actually build any software to view the images yet to compare
|
|
2022-06-20 03:30:53
|
Jxl average is 37% smaller at s7 s8
|
|
2022-06-20 03:31:10
|
The point is to be transparent on a big variety of images without slowing the decoding
|
|
2022-06-20 03:31:25
|
Not to discard too much of dct
|
|
|
novomesk
|
|
plantain
I usually hit the real limit of 4GB of RAM long before that ๐ฅฒ
|
|
2022-06-20 05:38:02
|
In my Qt JXL plugin I have a limit of 64 megapixels when running on a 32-bit machine. I was afraid that libjxl would run out of memory and abort() the whole application.
|
|
|
yurume
|
2022-06-22 09:29:45
|
today in jxsml: I've finished all the necessary ingredients to render vardct (coefficient order, dequantization matrix, actual coefficients, chroma from luma, hastily bodged inverse XYB transform) and the result is not perfect
|
|
|
|
veluca
|
2022-06-22 09:33:22
|
indeed
|
|
|
yurume
|
2022-06-22 09:34:09
|
to be fair I never tested most components making up vardct until this point
|
|
|
|
veluca
|
2022-06-22 09:35:27
|
yeah I thought so
|
|
|
yurume
|
2022-06-22 09:36:56
|
I could successfully decode a particular vardct image at this point though, so only the postprocessing is problematic (at least for that image)
|
|
|
|
veluca
|
2022-06-22 09:38:01
|
define "postprocessing"
|
|
|
yurume
|
2022-06-22 09:44:10
|
not entirely sure lol
|
|
2022-06-22 09:46:26
|
I believe that's roughly HF dequantization, CfL followed by inverse XYB (I do LF dequantization much earlier)
|
|
|
|
veluca
|
2022-06-22 09:47:11
|
so from the image I unfortunately cannot point you to anything that I can confidently say is the root cause
|
|
|
yurume
|
2022-06-22 09:47:24
|
if my observation is correct, raw XYB samples are actually quite large (on the order of tens of thousands)?
|
|
|
|
veluca
|
2022-06-22 09:47:29
|
(believe me, I messed up vardct decoding so many times that I have generally good guesses :P)
|
|
2022-06-22 09:47:35
|
uhhhhh... not at all?
|
|
2022-06-22 09:47:50
|
Y and B are ~0-1
|
|
2022-06-22 09:48:06
|
X is... super tiny, like -1/32 to 1/32
|
|
|
yurume
|
2022-06-22 09:48:06
|
huh
|
|
2022-06-22 09:48:45
|
I think then I have messed the dequantization up, probably by multiplying instead of dividing
|
|
|
|
veluca
|
2022-06-22 09:48:55
|
have you tried on --speed falcon images?
|
|
2022-06-22 09:49:01
|
those have only 8x8s
|
|
|
yurume
|
2022-06-22 09:50:29
|
good to know that
|
|
2022-06-22 09:54:33
|
not sure it helps
|
|
2022-06-22 09:57:24
|
should dumping XYB as if it's RGB (after some scaling and offsetting) result in something recognizable?
|
|
2022-06-22 10:00:01
|
okay, this is very suspicious (RGB = 128 * XYB + 128, saturated)
|
|
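The false-color dump yurume describes is easy to reproduce per sample. A sketch of that debug mapping (the `false_color` helper is illustrative, not jxsml code):

```python
def false_color(xyb):
    # Map an (X, Y, B) sample to a saturated 8-bit RGB triple using the
    # RGB = 128 * XYB + 128 debug mapping quoted in the message above.
    return tuple(min(255, max(0, round(128 * c + 128))) for c in xyb)

print(false_color((0.0, 0.5, 1.0)))     # (128, 192, 255)
print(false_color((-0.01, 0.0, 30.0)))  # a huge B saturates: a sign of bad scaling
```

With correctly scaled XYB (X near zero, Y and B roughly 0..1), such a dump should sit in the upper half of each channel; wildly saturated output is the "very suspicious" symptom above.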
2022-06-22 10:22:57
|
it turns out that I've computed the *inverse* of natural order as well
|
|
2022-06-22 10:27:52
|
I'm also kinda sure that the DCT doesn't work at all, given the skewed appearance of each 8x8 block and a particular glitch in the middle row
|
|
2022-06-22 10:28:41
|
which is pretty noticeable when the original image is overlaid
|
|
2022-06-22 10:59:00
|
haha, I realized that they are not strictly DCT8x8 but also include DCT2x2 etc., which I intentionally left out for testing
|
|
|
|
veluca
|
2022-06-22 11:46:04
|
uhm no falcon should only be 8x8
|
|
|
yurume
it turns out that I've computed the *inverse* of natural order as well
|
|
2022-06-22 11:46:18
|
yeah that always gets me
|
|
2022-06-22 11:47:08
|
that image looks *rather* suspicious
|
|
|
yurume
|
2022-06-23 08:55:23
|
almost sure that the LF dequantization factor is also actually its reciprocal, the spec seems wrong
|
|
|
|
veluca
|
2022-06-23 11:02:57
|
entirely possible
|
|
|
yurume
|
2022-06-23 04:40:53
|
I've checked the entire code path towards the LF dequant factor, which is according to the spec `m_#_lf_unscaled / (global_scale * quant_lf) * 2^-extra_precision`
|
|
2022-06-23 04:41:04
|
but it actually seems to be something like `(global_scale * quant_dc) / 2^9 / m_#_lf_unscaled * 2^-extra_precision`
|
|
|
_wb_
|
2022-06-23 04:44:20
|
So it's not just the inverse but also 512 times too small?
|
|
|
yurume
|
2022-06-23 04:44:26
|
yeah
|
|
2022-06-23 04:45:26
|
so there are three major differences: `m_#_lf_unscaled` is scaled by 1/128 right after decoding, the global scale is further divided by 2^16, and finally the multiplier is its inverse (before extra precision)
|
|
|
_wb_
|
2022-06-23 05:00:34
|
Oh boy. I wish we could have called the first edition edition 0.1
|
|
|
|
veluca
|
2022-06-23 05:17:50
|
heh not quite right xD
|
|
|
yurume
|
2022-06-23 05:39:30
|
great, I've found yet another missing piece of the puzzle: I forgot to reshuffle inverse-transformed samples back to rectangular grids
|
|
2022-06-23 05:42:28
|
so jxsml stores coefficients in the varblock order, which gets IDCTed to samples but still in the varblock order before actual reshuffling
|
|
2022-06-23 05:43:02
|
I think I was aware of this when I designed the original pipeline but forgot to do the final shuffling due to a large amount of tasks I had to tackle
|
|
2022-06-23 07:21:19
|
this does look like something
|
|
2022-06-23 07:21:41
|
(for reference, it uses one epf iteration so I guess the blocky appearance is actually correct)
|
|
2022-06-23 07:22:49
|
this is still false color because its ranges still seem wildly out of place
|
|
2022-06-23 07:23:56
|
I have a range of `(min 0.000000 -14.421546 -9.860505 max 30.730759 8.456084 18.513645)` for XYB samples
|
|
|
_wb_
|
2022-06-23 07:33:00
|
X is supposed to be signed and close to zero, something like -0.01 to +0.01
|
|
2022-06-23 07:33:18
|
Y is supposed to have a 0..1 range or so
|
|
2022-06-23 07:33:23
|
B too
|
|
|
yurume
|
2022-06-23 07:33:25
|
yeah, there still seems some missing multipliers
|
|
|
_wb_
|
2022-06-23 07:33:55
|
Also the order of XYB could be wrong in the spec
|
|
|
yurume
|
2022-06-23 07:33:55
|
but somehow it does produce a recognizable image, so probably *only* a linear relation is missing now?
|
|
2022-06-23 07:34:17
|
that was REALLY confusing!
|
|
2022-06-23 07:34:20
|
XYB vs. YXB :p
|
|
|
_wb_
|
2022-06-23 07:35:22
|
Yes, we should have just done everything in YXB order all the time, it's confusing also in libjxl imo
|
|
2022-06-23 07:36:52
|
The blockiness makes me think this is mostly dc only, something still wrong with ac I think
|
|
2022-06-23 07:37:20
|
Getting closer to a decoded image though
|
|
|
yurume
|
2022-06-23 07:38:22
|
for reference, this is the actual image encoded to and decoded back from jxl (-d1.0 -e3)
|
|
|
_wb_
|
2022-06-23 07:58:37
|
Are you doing the chroma from luma thing for both lf and hf?
|
|
|
yurume
|
2022-06-23 08:04:31
|
I believe so
|
|
|
_wb_
|
2022-06-23 08:19:56
|
It looks like X and B have too high amplitudes in that image
|
|
|
yurume
|
2022-06-23 09:06:21
|
I'm very suspicious of LF dequant factors again, they have a range of `(min 0.000000 -13.543465 -5.319397 max 30.656525 6.859222 17.358032)` which is not much different from all other samples
|
|
|
_wb_
|
2022-06-23 09:22:56
|
Is that YXB order?
|
|
|
yurume
|
2022-06-23 09:23:54
|
no, XYB
|
|
2022-06-23 09:24:22
|
X minimum might be larger than 0 because I have intentionally capped it to zero for inspection purposes
|
|
2022-06-23 09:25:55
|
the following was my reasoning:
```
how LF gets quantized?
(1) G.1.2: m_#_lf_unscaled are read; defaults are 4096, 512 & 256
(2) I.2.1: per-frame dequantization factors m#DC = m_#_lf_unscaled / (global_scale * quant_lf)
(3) G.2.2: extra_precision is read for each LF group
(4) I.4.2: dequantized coefficients d# = m#DC * (quantized coefficient) / (1 << extra_precision)
in libjxl:
(1) DequantMatrices::DecodeDC:
- reads dc_quant_[0..2] and MULTIPLIES THEM BY 1/128
- inv_dc_quant_[0..2] = 1 / dc_quant_[0..2]
(2) Quantizer::Decode:
- read global_scale_ and quant_dc_
- RecomputeFromGlobalScale:
- global_scale_float_ = global_scale_ / 2^16
- inv_global_scale_ = 2^16 / global_scale_
- inv_quant_dc_ = inv_global_scale_ / quant_dc_ = 2^16 / (global_scale_ * quant_dc_)
- mul_dc_[0..2] = GetDcStep(0..2)
= inv_quant_dc_ * dc_quant_[0..2]
= (2^16 / (global_scale_ * quant_dc_)) * dc_quant_[0..2]
= 2^16 * dc_quant_[0..2] / (global_scale_ * quant_dc_)
- inv_mul_dc_[0..2] = GetInvDcStep(0..2)
= inv_dc_quant_[0..2] * (global_scale_float_ * quant_dc_)
= (1 / dc_quant_[0..2]) * (global_scale_ / 2^16) * quant_dc_
= (global_scale_ * quant_dc_) / 2^16 / dc_quant_[0..2]
(3) ModularFrameDecoder::DecodeVarDCTDC:
- read extra_precision
(4) DequantDC, called from DecodeVarDCTDC:
- dc_factors[0..2] = mul_dc_[0..2]
- mul = 2^-extra_precision
- multiply each row with dc_factors[0..2] * mul
= 2^16 * dc_quant_[0..2] / (global_scale_ * quant_dc_) * 2^-extra_precision
the original factor is supposed to be m_#_lf_unscaled / (global_scale * quant_lf) * 2^-extra_precision
in reality it's (m_#_lf_unscaled / 128) / (global_scale * quant_dc) * 2^(16-extra_precision)
```
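A numeric check of the mismatch as derived at this point (later messages revise the picture): the spec-side and libjxl-side factors, as written in the trace, differ by a constant 2^16 / 128 = 512 per channel. Function names are illustrative.

```python
# The two formulas from the trace above, side by side (illustrative names).
def spec_factor(m_unscaled, global_scale, quant_lf, extra_precision):
    # m_#_lf_unscaled / (global_scale * quant_lf) / 2^extra_precision
    return m_unscaled / (global_scale * quant_lf) / (1 << extra_precision)

def libjxl_factor(m_unscaled, global_scale, quant_dc, extra_precision):
    # (m_#_lf_unscaled / 128) / (global_scale * quant_dc) * 2^(16 - ep)
    return (m_unscaled / 128) / (global_scale * quant_dc) * 2.0 ** (16 - extra_precision)
```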
|
|
2022-06-23 09:26:20
|
I have checked again and couldn't find any missing piece
|
|
2022-06-23 10:03:09
|
okay, the 1/128 multiplier for `dc_quant_` only applies when `all_default` is false; the default value is already multiplied
|
|
2022-06-23 10:16:50
|
DequantDC also receives `mul_dc_` instead of `inv_mul_dc_`, so hopefully it's just a (massive) scaling issue...?
|
|
2022-06-23 10:18:30
|
but somehow I'm still unable to reproduce the resulting parameters from the inputs; my computed values are 2^24 times the actual values. How?
|
|
2022-06-23 10:22:07
|
okay, the default values for `m_#_lf_unscaled` are *not* {4096, 512, 256}, they are {1/4096, 1/512, 1/256}; and if `all_default` is false, three values p[0..2] are read and `m_#_lf_unscaled` should be {p[0]/128, ...}
|
|
2022-06-23 10:22:36
|
so that was a big source of confusion; both the default values and the scaling factors were off, in opposite directions
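Putting both fixes together, a sketch of the corrected LF dequantization factor (illustrative names; `global_scale` here is the raw coded integer, so the 2^16 compensates for `global_scale_float_ = global_scale_ / 2^16`):

```python
# Corrected per this thread: defaults are the reciprocals, coded values
# are divided by 128, and the resulting factor matches libjxl's
# mul_dc_ * 2^-extra_precision.
DEFAULT_M_LF_UNSCALED = (1 / 4096, 1 / 512, 1 / 256)

def read_m_lf_unscaled(coded=None):
    """Coded values p[0..2], if present, must be divided by 128."""
    if coded is None:
        return DEFAULT_M_LF_UNSCALED
    return tuple(p / 128 for p in coded)

def lf_dequant_factor(m_lf, global_scale, quant_lf, extra_precision):
    """Multiplier applied to quantized LF coefficients."""
    return 2 ** 16 * m_lf / (global_scale * quant_lf) / (1 << extra_precision)

# Mistaking 4096 for 1/4096 scales channel 0 by 4096 * 4096 = 2^24,
# which explains the earlier mystery factor.
```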
|
|
|
|
veluca
|
2022-06-23 10:32:17
|
I'm very sorry xD
|
|
|
yurume
|
2022-06-23 10:43:50
|
well it really shows that libjxl was designed when the format itself was not yet finalized 🙂
|
|
2022-06-23 11:17:14
|
looks a bit more reasonable (but no, it's again false color, from reinterpreting XYB as RGB and remapping min..max to 0..255)
|
|
2022-06-25 09:38:10
|
today in jxsml: a tiny difference in the trace output is driving me mad :S
|
|
|
|
veluca
|
2022-06-25 09:40:04
|
looks like DC channels are swapped?
|
|
2022-06-25 09:40:55
|
mh not just that
|
|
|
yurume
|
2022-06-25 09:40:58
|
not only that, but I have a hard time understanding why the apparent CfL happens at that position
|
|
2022-06-25 09:41:39
|
and one DC coefficient seems completely off (0.009588 vs. 0.005327)
|
|