JPEG XL

libjxl

_wb_
2022-04-24 01:17:58
The offsets are very practical for <#824000991891554375> though :)
Traneptora
_wb_ You mean as a pixelformat for passing buffers in the api?
2022-04-24 02:45:20
yea, lossy internally is 16-bit, but is there a way to feed the library 10-bit pixel data as a buffer and have it upsample it to 16 before lossy encoding, rather than having me do the upsampling myself
2022-04-24 02:45:32
since it already can downsample
2022-04-24 02:46:32
I suppose that would require implementing planar input though, since most of those formats are things like gbrp10le
_wb_
2022-04-24 05:01:34
Planar input is something we should probably add, we do planar internally anyway
2022-04-24 05:03:05
10-bit with maxval 1023 is something we already have for ppm, though that's outside the libjxl api atm
2022-04-24 05:05:45
It's easy enough to implement (just use a different scaling factor in the conversion to/from float), I think we mostly just need to define an appropriate way to extend JxlPixelFormat
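A minimal sketch of that scaling-factor point, assuming a 10-bit sample with maxval 1023; illustrative code, not libjxl API:
```cpp
#include <cstdint>
#include <cstdio>

int main() {
  uint16_t s10 = 700;                       // a 10-bit sample, range 0..1023
  float direct = s10 / 1023.0f;             // scale directly by the 10-bit maxval
  uint16_t s16 = (uint16_t)((s10 << 6) | (s10 >> 4));  // bit-replicate up to 16 bits
  float via16 = s16 / 65535.0f;             // scale by the 16-bit maxval
  printf("%f vs %f\n", direct, via16);      // near-identical float values
  return 0;
}
```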
Petr
2022-04-25 10:46:20
Promoting jxlinfo among installed tools was definitely a good idea: https://github.com/libjxl/libjxl/pull/1346 But for Windows users, this doesn't make a big difference. The reason is that the tool (downloaded from Luca's site) needs brotlienc.dll, brotlidec.dll and brotlicommon.dll. It's possible to find and download the DLLs and to run the tool successfully. But it's not very convenient. Furthermore, users must deal with x86/x64. Would it be possible to simplify this somehow for Windows users? (Is anyone here having the same experience actually?)
Traneptora
2022-04-25 10:56:32
biggest problem with jxlinfo is it doesn't provide enough info
The_Decryptor
2022-04-25 10:56:39
I think if people are using CLI tools they should have the knowledge of if their OS is 64bit or not (And it'll tell them if they try running a 64bit binary on a 32bit OS)
Traneptora
2022-04-25 10:57:10
fwiw there's no 32-bit win11 and win10 32-bit is not sold to consumers iirc
2022-04-25 10:57:25
I'd be very surprised if any `jxlinfo` users have a 32-bit OS
2022-04-25 10:58:50
jxlinfo could use some info like, for example, it could tell you if the frames are VarDCT or Modular
2022-04-25 10:58:55
or if they are lossless or lossy
2022-04-25 10:59:26
even `jxlinfo -v foo.jxl` doesn't tell you this
2022-04-25 11:00:08
It tells you it is `(possibly) lossless` which just means "modular"
Petr
The_Decryptor I think if people are using CLI tools they should have the knowledge of if their OS is 64bit or not (And it'll tell them if they try running a 64bit binary on a 32bit OS)
2022-04-25 11:00:28
Yes, they should have the knowledge of their OS. But when they find a DLL and they find out that it's x86 while they need x64, the thing becomes too complicated IMHO.
Traneptora
2022-04-25 11:00:46
the problem here is that we aren't statically linking the dependencies
2022-04-25 11:00:53
or providing them
2022-04-25 11:01:12
this is Windows, so we can't just say "use system libbrotlidec.dll"
2022-04-25 11:01:19
it's not Linux where that works nicely
The_Decryptor
2022-04-25 11:01:47
I ended up grabbing the dlls from vcpkg, which for whatever reason does default to 32bit binaries
Traneptora
2022-04-25 11:01:49
it shouldn't be the users' responsibility to have to go search "brotli dll install" on google
The_Decryptor
2022-04-25 11:02:03
But I had already encountered that for another app, so it was a pitfall I was already aware of
Traneptora
The_Decryptor I ended up grabbing the dlls from vcpkg, which for whatever reason does default to 32bit binaries
2022-04-25 11:02:38
if vcpkg doesn't have lib32 and lib64 distinctions then I wouldn't trust it since wow64 compat is still a problem nowadays
The_Decryptor
2022-04-25 11:02:49
But yes, it should be statically linked into jxl.dll
Traneptora
2022-04-25 11:03:09
or provided right next to it
2022-04-25 11:03:35
windows checks the same directory, unlike posix
The_Decryptor
2022-04-25 11:03:40
It does, 64bit binaries have a "x64-windows" suffix, you need to call it like `vcpkg.exe install brotli:x64-windows` or else you'll get 32bit binaries
Traneptora
2022-04-25 11:05:29
hm, that's backwards
2022-04-25 11:05:50
it should use :windows-32 or something and default to 64-bit
The_Decryptor
2022-04-25 11:06:13
Yeah, I have no idea why this is the default they picked, at the very least it should take it from the host OS
2022-04-25 11:06:27
Specify it if you're cross compiling, otherwise match the host
Petr
2022-04-25 12:16:30
The same issue (missing Brotli DLLs) affects cjxl_ng.exe and djxl_ng.exe, which increases the need to deal with it. 😜
_wb_
2022-04-25 01:05:53
sigh, windows is annoying
2022-04-25 01:08:42
maybe we should only make static binaries for windows
Vincent Torri
2022-04-25 01:47:11
use MSYS2/mingw-w64
2022-04-25 01:47:20
you can install these packages easily
2022-04-25 01:48:22
https://packages.msys2.org/package/mingw-w64-x86_64-libjxl?repo=mingw64
2022-04-25 01:49:47
i also have a small package installer, which builds from scratch everything needed for a project and can be modified easily to build a lib and its dependencies
2022-04-25 01:51:36
<@794205442175402004> DLLs are good (tm) on Windows (as a gcc dev, Pedro Alves, put it)
_wb_
2022-04-25 02:10:58
it's somewhat annoying that we just abort when allocation fails. I understand that we don't want to rely on c++ exceptions, but how much effort would it be to catch (at least some) allocation fails and return a nice error instead of aborting?
2022-04-25 02:11:42
or at least print something informative to stderr before aborting
Vincent Torri
2022-04-25 02:11:58
abort in a lib is not nice
_wb_
2022-04-25 02:12:11
exactly
2022-04-25 02:12:42
also cjxl just silently crashing on too large input is not nice
Vincent Torri
2022-04-25 02:13:24
so write in C: http://www.rasterman.com/files/c_song.txt ('Let It Be' melody)
improver
2022-04-25 04:31:19
make C wrappers which catch all
2022-04-25 04:31:57
as in make C api functions be written in C++, catch all C++ exceptions & return reasonable error codes
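A minimal sketch of that boundary-catch idea; `MyStatus` and `MyDecodeSomething` are invented names, not libjxl API:
```cpp
#include <new>

extern "C" {
typedef enum { MY_OK, MY_ERR_OOM, MY_ERR_INTERNAL } MyStatus;

MyStatus MyDecodeSomething(void) {
  try {
    // ... C++ implementation that may throw std::bad_alloc ...
    return MY_OK;
  } catch (const std::bad_alloc&) {
    return MY_ERR_OOM;       // allocation failure becomes a clean error code
  } catch (...) {
    return MY_ERR_INTERNAL;  // never let an exception cross the C ABI
  }
}
}  // extern "C"
```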
_wb_
2022-04-25 05:24:11
I think we want to avoid compiling with exceptions for perf reasons...
Traneptora
2022-04-25 06:09:56
if the goal is not to use C++ features like exceptions, why isn't the library just written in C?
_wb_
2022-04-25 06:30:55
We like to have the convenience of C++ but only to the extent that it doesn't come at a speed cost.
improver
2022-04-25 07:01:41
i think that exception case is slow but when no exception gets thrown it shouldn't be that slow
2022-04-25 07:02:09
remember reading about it, that C++'s exception handling is actually very optimized for the no-exception case
2022-04-25 07:02:44
was this measured?
2022-04-25 07:02:50
in case of libjxl
2022-04-25 07:04:37
because i think it would be slow literally only when it'd otherwise panic
_wb_
2022-04-25 07:04:54
If so, it must have been a long time ago because we're using that compile flag for years
improver
2022-04-25 07:05:43
it's kind of popular myth that exceptions make C++ code slow
2022-04-25 07:06:11
C++'s exceptions are slow, yes, but disabling them shouldn't bring much gain in perf
2022-04-25 07:07:31
how does one benchmark libjxl anyway
2022-04-25 07:07:46
could try it to satisfy my curiosity
2022-04-25 07:08:38
<https://github.com/libjxl/libjxl/blob/main/doc/benchmarking.md> hmm, found it I guess
_wb_
2022-04-25 07:17:18
Main way to test speed is `djxl foo.jxl --num_reps=50`
improver
2022-04-25 07:22:30
defaults (without exceptions):
```
% tools/djxl --num_reps=100 ~/Documents/wpedit22.jxl
JPEG XL decoder v0.7.0 7f5a2cd [AVX2,SSE4,SSSE3,Scalar]
Read 2473321 compressed bytes.
No output file specified. Decoding will be performed, but the result will be discarded.
Decoded to pixels.
1463 x 2560, geomean: 70.50 MP/s [51.72, 76.87], 100 reps, 12 threads.
Allocations: 30605 (max bytes in use: 1.389899E+08)
```
with exceptions:
```
% tools/djxl --num_reps=100 ~/Documents/wpedit22.jxl
JPEG XL decoder v0.7.0 7f5a2cd [AVX2,SSE4,SSSE3,Scalar]
Read 2473321 compressed bytes.
No output file specified. Decoding will be performed, but the result will be discarded.
Decoded to pixels.
1463 x 2560, geomean: 76.65 MP/s [61.13, 80.03], 100 reps, 12 threads.
Allocations: 30605 (max bytes in use: 1.389899E+08)
```
2022-04-25 07:23:49
so it's actually faster with exceptions lol
2022-04-25 07:24:52
quite consistently so, running them in whatever order
2022-04-25 07:26:19
this is for modular, i guess i can test lossy
2022-04-25 07:29:05
with vardct i cant see much of a difference, its roughly the same
2022-04-25 07:29:28
as in speed difference decreased a bit but enabling exceptions still didnt make it slower
2022-04-25 07:30:42
would need to run them one after another multiple times and average over the results
2022-04-25 07:31:03
but from current ones there doesnt seem to be significant difference
_wb_
2022-04-25 07:33:49
Interesting...
improver
2022-04-25 07:35:41
it makes sense if you think how it's implemented tbh
2022-04-25 07:36:17
exceptions are slow as hell because, instead of doing checks after every function return or such, they unwind the stack
2022-04-25 07:36:55
which means that some code, possibly in the stdlib, that would normally have to check stuff doesn't have to in the no-error case when exceptions are enabled
2022-04-25 07:37:44
script which runs the with- and without-exceptions binaries in a loop to make stuff fair (it does run them before the loop too)
2022-04-25 07:38:19
wait i messed this up a bit i think lol
2022-04-25 07:38:49
yeh meaning is inverted lol
2022-04-25 07:41:33
but like i cant really see much of a difference either way
2022-04-25 07:41:47
2022-04-25 07:41:57
this one is not messed up
2022-04-25 07:43:48
seems like it depends much more on what else happens in my computer than exceptions
_wb_
2022-04-25 07:44:35
Try with --num_threads=0 and with as little other stuff using cpu as possible
improver
2022-04-25 07:45:06
ughh second one is hard i have a lot of things running & i dont wanna kill them
2022-04-25 07:46:53
maybe if i make negative niceness it'd be okay
_wb_
2022-04-25 07:47:36
Anyway, I don't really see why enabling exceptions would be much slower for us, given that we don't use anything that can throw except for allocations, and those shouldn't be happening often enough to make a difference, I hope
2022-04-25 07:48:22
So if we can just enable exceptions and catch them at the api boundary, that would be good imo
improver
2022-04-25 07:49:35
yeah.
2022-04-25 07:51:00
re-testing modular with 0 threads & -5 niceness r n, it's consistently slower without exceptions still lol
_wb_
2022-04-25 07:51:45
<@179701849576833024> wdyt?
improver
2022-04-25 07:59:48
`-q 100 -e 9` modular decoding:
2022-04-25 07:59:50
2022-04-25 08:00:09
enabling exceptions makes it consistently faster
2022-04-25 08:06:02
`-e 9` (vardct):
2022-04-25 08:06:04
veluca
2022-04-25 08:06:11
Enabling exceptions making things faster is incredibly fishy
improver
2022-04-25 08:06:41
so kinda variable & in some cases little bit slower with exceptions here. interesting
2022-04-25 08:06:53
but it's faster on exceptions with modular
2022-04-25 08:07:04
and quite a bit more measurably
2022-04-25 08:07:34
im too lazy to draw graph with this
veluca
2022-04-25 08:07:45
Anyway, the thing with exceptions is not that they are slow (well, perhaps that too), but that they increase code size significantly, possibly hurt optimization (but probably not here) and make the program a lot harder to think about in some cases
improver
2022-04-25 08:09:15
```
5159368 libjxl-orig/build/tools/djxl
5312976 libjxl/build/tools/djxl
```
there's a bit of an increase indeed
2022-04-25 08:10:10
also
```
4213472 libjxl-orig/build/libjxl.so
4370840 libjxl/build/libjxl.so
```
2022-04-25 08:10:36
that they are slow is kinda irrelevant if the codebase doesn't use them
2022-04-25 08:11:08
could say the same about last one
improver so kinda variable & in some cases little bit slower with exceptions here. interesting
2022-04-25 08:12:32
actually looks like overall it's a little bit faster with exceptions
2022-04-25 08:12:45
would need to average over results
2022-04-25 08:12:49
to confirm
2022-04-25 08:13:27
I do agree that libjxl's codebase shouldn't use exceptions
2022-04-25 08:13:49
but it seems that removing `-fno-exceptions` and the other related flags will bring only good things
2022-04-25 08:16:45
well except for slight but kinda significant size increase
2022-04-25 08:17:15
but i honestly don't see any other way of solving the aborting-on-malloc problem
2022-04-25 08:17:50
other than not using STL containers at all and never doing new(), only malloc
2022-04-25 08:22:26
id actually argue that speed increase alone is worth it, if proved by more serious benchmarking on machine with SMT disabled, clock speed locked & running nothing else
_wb_
2022-04-25 08:50:39
how is the size increase after stripping?
improver
2022-04-25 08:53:37
```
3640752 libjxl-orig/build/libjxl.so
3743296 libjxl/build/libjxl.so
```
still an increase
_wb_
2022-04-25 08:54:01
If size increase is the only real drawback, then we could keep no-exceptions for libjxl_dec (which is for size-critical applications) but remove it for libjxl, maybe?
veluca
2022-04-25 08:54:05
those sizes seem a bit surprising
2022-04-25 08:54:17
ah, right, the encoder is there too
_wb_
2022-04-25 08:54:38
Still a bit large, is this a release build?
veluca
2022-04-25 08:54:58
tbh, we can probably get 95% of the way there by properly failing instead of aborting for image buffers
improver
2022-04-25 08:55:14
```
4547336 libjxl-orig/build/tools/djxl
4645640 libjxl/build/tools/djxl
```
veluca
2022-04-25 08:55:24
it's by far the worst offender for OOMs
_wb_
2022-04-25 08:55:51
Yes, image buffers are the only big allocs so the number of spots that would need to add a check could be limited
2022-04-25 08:56:36
In any case it would be nice if the library doesn't bring down the whole application in case of OOM, but returns error so the application can handle it gracefully
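A sketch of what "properly failing" could look like for the big image-buffer allocations (invented names; `std::nothrow` reports failure as a null pointer, so this works even under `-fno-exceptions`):
```cpp
#include <cstddef>
#include <cstdint>
#include <new>

static uint8_t* TryAllocImageBuffer(size_t xsize, size_t ysize,
                                    size_t bytes_per_pixel) {
  if (xsize == 0 || ysize == 0 || bytes_per_pixel == 0) return nullptr;
  // Reject sizes whose byte count would overflow size_t.
  if (xsize > SIZE_MAX / ysize / bytes_per_pixel) return nullptr;
  // nothrow new: returns nullptr on failure instead of throwing/aborting,
  // so the caller can surface a clean error to the application.
  return new (std::nothrow) uint8_t[xsize * ysize * bytes_per_pixel];
}
```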
veluca
2022-04-25 08:57:25
tbh -fno-exceptions also has quite some problems with ODR and the STL which I'm not sure how to deal with
2022-04-25 08:57:46
(well, and in general using any other C++ lib)
_wb_
2022-04-25 08:58:33
Imagine working on a large image in gimp and then when you try to save, it makes gimp crash
lonjil
2022-04-25 09:02:29
> In any case it would be nice if the library doesn't bring down the whole application in case of OOM, but returns error so the application can handle it gracefully

*cries in Linux*
Hello71
2022-04-25 09:03:16
you can disable overcommit if you close your eyes and pretend that fork doesn't exist
lonjil
2022-04-25 09:04:31
Technically speaking, regardless of your settings or the presence of fork(), Linux will always launch the OOM killer if the kernel needs more memory than is free.
2022-04-25 09:04:56
Disabling overcommit just makes it less likely to happen.
improver
2022-04-25 09:05:51
having proper OOM handling will open path to have user-supplied allocation callback working for all of allocations
2022-04-25 09:06:20
which would be handy for limiting libjxl's memory usage
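libjxl already exposes a `JxlMemoryManager` hook for user-supplied allocators; a rough cap-enforcing sketch on top of it (the 256 MiB limit and the size-header bookkeeping are illustrative, the counter is not thread-safe, and per the point above not every internal allocation necessarily goes through this hook today):
```cpp
#include <jxl/decode.h>
#include <cstddef>
#include <cstdlib>

static size_t g_used = 0;
static const size_t kLimit = 256u * 1024 * 1024;          // arbitrary 256 MiB cap
static const size_t kHeader = alignof(std::max_align_t);  // keep malloc's alignment

static void* LimitedAlloc(void* /*opaque*/, size_t size) {
  if (size > kLimit - g_used) return NULL;  // let libjxl fail cleanly
  unsigned char* p = (unsigned char*)malloc(kHeader + size);
  if (!p) return NULL;
  *(size_t*)p = size;  // remember the size so free() can account for it
  g_used += size;
  return p + kHeader;
}

static void LimitedFree(void* /*opaque*/, void* address) {
  if (!address) return;
  unsigned char* p = (unsigned char*)address - kHeader;
  g_used -= *(size_t*)p;
  free(p);
}

int main() {
  JxlMemoryManager mm = {NULL, LimitedAlloc, LimitedFree};
  JxlDecoder* dec = JxlDecoderCreate(&mm);
  // ... use the decoder as usual; allocations beyond the cap now fail ...
  JxlDecoderDestroy(dec);
}
```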
Hello71
veluca Enabling exceptions making things faster is incredibly fishy
2022-04-25 09:12:04
i think 0.2% difference is totally plausible from minor code alignment changes. linux kernel has a DEBUG_FORCE_FUNCTION_ALIGN_64B option which does what it says on the tin. clicking around on the initial submission (32B) https://lore.kernel.org/lkml/1595475001-90945-1-git-send-email-feng.tang@intel.com/, changes of 2.5%, 5%, even 30% for one microbenchmark were seen. also https://users.encs.concordia.ca/~shang/soen691/2016winter/papers/producing-wrong-data-asplos09.pdf, https://people.cs.umass.edu/~emery/pubs/stabilizer-asplos13.pdf
improver
2022-04-25 09:13:03
but it's change for the better though. someone else cares to benchmark on something else to see if it gets worse?
veluca
2022-04-25 09:13:31
mhhh, that'd be (I assume) because of changing what the branch predictor cache would do, I think, which would indeed explain this as IIRC modular is a lot branchier
Hello71
2022-04-25 09:19:13
i think there are a few factors, branch predictor cache is probably one, but also e.g. intel processors are (allegedly) very sensitive to jump target alignment, especially for tight loops. e.g. the first paper mentions intel loop stream detector, i think if your loop crosses some boundaries then it may get much slower. gcc does some alignment by default but iirc not full alignment (too much and you bloat the cache instead)
improver
2022-04-25 09:23:36
my CPU is AMD
2022-04-25 09:25:58
I don't reject that it may be because of difference between how alignment works, but imo it could also be because of how some failures in stdlib get handled (explicit checks versus leaving it for exceptions)
2022-04-25 09:26:25
im kinda hesitating to accept that hypothesis until a consistent slowdown is shown on some other system/arch
2022-04-25 09:27:43
don't wanna trust "in theory it could be..." arguments because that's how `-fno-exceptions` usually gets into codebases in the first place
Hello71
2022-04-25 09:28:22
amd manual claims that loop alignment isn't necessary on ryzen, but i think gcc still does it
2022-04-25 09:34:33
i doubt libstdc++ does any "explicit checks" when exceptions are disabled. it doesn't use `__EXCEPTIONS`, and for `__cpp_exceptions` it seems to just call `__builtin_abort` instead of `throw`. if anything, it does more checks when `__cpp_exceptions` is defined, not less.
improver
2022-04-25 09:36:35
`throw` doesn't "do checks", that's the thing. it unwinds the stack, that's why it's slow. but the path when exceptions aren't raised is uncluttered
Hello71
2022-04-25 10:11:12
my point is that it specifically doesn't do `#ifdef __cpp_exceptions catch () #else if (rv < 0)`, it does `#ifdef __cpp_exceptions throw #else __builtin_abort()`
improver
2022-04-25 10:43:49
id need to dig into stdc++ because r n i dont really have enough data to argue either way tbh
lonjil
2022-04-26 06:10:57
I would like to note that how things are laid out in memory can have huge consequences on performance. Even having a different length username on Linux can have a noticeable performance impact. Most compiler switches which affect performance also affect layout, so simple comparisons don't actually tell you anything unless the recorded difference is large.
improver
2022-04-26 10:11:58
i just stubbornly wanna know if this makes perf worse for some machines but nobody other than me wants to test >_<
2022-04-26 10:13:10
but like it probably doesn't matter either way lol
2022-04-26 10:13:28
as the difference is quite tiny
lonjil
2022-04-26 10:26:22
It's not even a per machine thing. The true difference between having or not having exceptions on your machine may be opposite to what your benchmark showed, because the number of factors that may have changed between the two versions is huge.
2022-04-26 10:26:36
I can try to test it on my computer later though.
2022-04-26 10:27:09
This talk describes the problem I'm talking about: https://www.youtube.com/watch?v=r-TLSBdHe1A
2022-04-26 10:27:28
Perhaps we could try the tool they developed for benchmarking.
Hello71
2022-04-27 01:46:32
that guy is one of the co-authors on the second paper i linked
2022-04-27 01:47:01
and he is talking about the "Stabilizer" software from the paper
lonjil
2022-04-27 02:35:59
oops, that's what I get for not paying attention!
dds
2022-05-01 11:57:50
Tree leaves again: from observing generated trees, it frequently happens that leaf nodes have identical predictors to their siblings - especially as we always have offset == 0, multiplier == 1.

We can't just merge these pairs of leaves as each leaf has its own distribution. However, after distribution clustering, some such sibling distributions may have been clustered together, meaning they share a context. Hence both can be merged into their parent (i.e. removing the parent's property decision).

If correct: while this idea doesn't help with compression size (save for the tiny benefit of trimming some branches), the intended benefit is at the decompression end, where having a smaller tree may improve decompression speed. Does this sound plausible?
_wb_
2022-05-01 04:13:59
Yes
2022-05-01 04:14:19
Would also save a few bits in signaling the MA tree, not very significant but still
2022-05-01 04:15:02
Could also apply this optimization decode side btw
2022-05-01 04:16:42
Decoder already has some optimizations to avoid branching in MA traversal, but it may not cover some cases like this
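A rough sketch of the bottom-up sibling merge dds describes; the node layout here is invented for illustration and is not libjxl's actual MA-tree representation:
```cpp
#include <vector>

struct Node {
  bool is_leaf;
  int left, right;                    // child indices when !is_leaf
  int context;                        // clustered distribution id (leaf only)
  int predictor, offset, multiplier;  // leaf payload
};

// Collapses decision nodes whose two children are leaves that decode
// identically. Returns true if the subtree at `i` is (now) a leaf.
static bool MergeIdenticalLeaves(std::vector<Node>& tree, int i) {
  Node& n = tree[i];
  if (n.is_leaf) return true;
  bool l = MergeIdenticalLeaves(tree, n.left);
  bool r = MergeIdenticalLeaves(tree, n.right);
  if (!l || !r) return false;
  const Node& a = tree[n.left];
  const Node& b = tree[n.right];
  // Same clustered context and same (predictor, offset, multiplier):
  // the property decision in `n` is then irrelevant.
  if (a.context == b.context && a.predictor == b.predictor &&
      a.offset == b.offset && a.multiplier == b.multiplier) {
    n = a;
    return true;
  }
  return false;
}
```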
Traneptora
2022-05-02 03:36:40
is there a way to get better error messages out of `JxlEncoderAddImageFrame`?
2022-05-02 03:36:44
it's failing and I don't know why
2022-05-02 04:11:37
update, it appears it's expecting 3 channels, received only 1
2022-05-02 04:11:48
however, I set `JxlBasicInfo` with 1 color channel
2022-05-02 04:11:55
as well as the `JxlPixelFormat`
2022-05-02 04:11:59
so I'm not sure what's up there
_wb_
2022-05-02 05:11:18
Could be a bug on our end too
2022-05-02 05:11:56
Ah, what color encoding did you set?
2022-05-02 05:12:11
Maybe you didn't set a grayscale color encoding?
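For reference, explicitly tagging the stream as grayscale would look something like this (a sketch; assumes `enc` is a `JxlEncoder*` whose basic info has 1 color channel):
```cpp
#include <jxl/encode.h>

// Returns nonzero on success.
static int SetGrayscaleSRGB(JxlEncoder* enc) {
  JxlColorEncoding ce;
  JxlColorEncodingSetToSRGB(&ce, /*is_gray=*/JXL_TRUE);
  return JxlEncoderSetColorEncoding(enc, &ce) == JXL_ENC_SUCCESS;
}
```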
spider-mario
lonjil This talk describes the problem I'm talking about: https://www.youtube.com/watch?v=r-TLSBdHe1A
2022-05-02 10:26:59
eek, frequentist statistics (18:55)
Traneptora
_wb_ Ah, what color encoding did you set?
2022-05-02 10:54:28
I didn't set a color encoding
2022-05-02 10:55:23
I figured that setting 1 channel would imply grayscale
2022-05-02 10:59:52
however, `uses_original_profile = false` in this case, so I figured setting a color encoding was unnecessary
2022-05-02 11:00:32
I'm wondering if `uses_original_profile = false` implies xyb internally, which gets confused if there's one channel
2022-05-02 11:00:59
might need an extra branch so it only assumes xyb if `num_channels = 3`
_wb_
2022-05-02 11:57:44
I guess we should make the implicit colorspace be grayscale when basicinfo has only 1 color channel
2022-05-02 11:59:38
It uses xyb for lossy grayscale too, just x and b are zero then
Traneptora
2022-05-02 12:00:19
that seems more sensible than assuming xyb and immediately failing tbh
dds
_wb_ Could also apply this optimization decode side btw
2022-05-02 01:26:21
I did a quick test on the decode side as you suggested; this eliminates between 1% and 4% of the leaves, depending on the image. There's a tiny but detectable speed increase, 2.5% at best.
_wb_
2022-05-02 01:26:55
not bad!
2022-05-02 01:27:28
I mean, it's not a huge optimization of course, but every small bit helps
2022-05-02 01:28:43
so I think we should do this both encode and decode side, it's probably not a lot of code to do recursive bottom-up merging of identical siblings
2022-05-02 01:29:28
feel free to make a PR, I'll gladly review it
dds
2022-05-02 01:30:56
I had originally tried doing it on the encode side but it got messy (e.g. running BuildAndEncodeHistograms twice)
2022-05-02 01:34:02
I'm still focusing on trying to understand the spec so won't be making PRs in the near future (though the code is one tiny fn if you want me to just paste it)
2022-05-02 01:34:45
Is there a good way to submit feedback on the spec (the part1.pdf you posted a while ago) other than discord?
_wb_
2022-05-02 01:43:14
not really
2022-05-02 01:43:16
I mean
2022-05-02 01:43:42
the official way is that you contact your national standardization body so they can submit a defect report to ISO
2022-05-02 01:44:11
which we then receive by the next JPEG meeting if we're lucky
2022-05-02 01:44:57
and then we have to respond to the defect report and propose an Erratum, Corrigendum or Amendment to the spec
2022-05-02 01:45:13
which can then eventually get consolidated in a next edition
2022-05-02 01:47:38
the unofficial way is that you just tell me, I fix it on the spec repo if needed, and either it gets fixed in the current 2nd edition (it's in CD stage now, so still some stages before it's done) or it will be on the stack of pending fixes for an eventual 3rd edition
2022-05-02 01:49:41
this is the actual CD text by the way, which was approved last week. it should be made public at some point by ISO but they can be slow with that - they're only fast when it is to remove stuff and put it behind a paywall, making things public is slower...
dds I had originally tried doing it on the encode side but it got messy (e.g. running BuildAndEncodeHistograms twice)
2022-05-02 01:53:26
a PR just for the decode side is also nice, you're right that doing it encode side might be a bit messier since I guess the clustering gets done after writing the MA tree atm...
dds
2022-05-02 01:55:20
thanks for the pdf
2022-05-02 01:56:14
My main criticism of the spec is that it needs much more of an adversarial perspective. It tells you what jxl files look like, but it's less clear how to decode an arbitrary bitstream as a jxl file (or not).
2022-05-02 02:01:38
For instance the modular_16bit_buffers field: what does it mean for 16-bit integers to be large enough to store and apply inverse transforms to <particular data>? Different implementations may cause the intermediates to grow by different amounts. Is a conforming decoder expected to detect whether or not the field is 'lying'?
_wb_
2022-05-02 02:04:12
these kinds of specs only say how to decode a valid file. What to do with an invalid file is outside the scope of the spec, and is considered a 'quality of implementation' issue โ€” as far as the spec is concerned, the decoder can crash and explode on bad input.
2022-05-02 02:04:45
but you may be right that it's not always clear enough what a valid file is
dds
2022-05-02 02:05:27
But if the spec doesn't tell you the difference between valid file and a not valid file, how do you tell the difference between creative use of the file format and a non-conforming file?
_wb_
2022-05-02 02:06:30
whenever the spec says something like "this value is smaller than 10", it means that if it is not smaller than 10, it's not a valid bitstream and the decoder behavior becomes undefined
2022-05-02 02:09:00
modular_16bit talks about the numbers you get in image buffers (both before and after transforms). We assume that implementations can make sure to do arithmetic (e.g. to compute a predictor) with wide enough integers when needed, but the buffers themselves should fit in int16_t if modular_16bit is true
2022-05-02 02:10:04
and no, a decoder is not expected to detect when the field is lying, it can just produce a glitched image where the pixels look very different (due to overflows) from what you would get when using infinite-precision numbers as in the spec
2022-05-02 02:10:29
because when the field is lying, the bitstream is not valid, so the decoder can do whatever it wants
2022-05-02 02:10:57
you can make a decoder that detects such cases, refuses to decode the image, and gives a nice error message about it
2022-05-02 02:11:10
you can make a decoder that just displays a glitched image
2022-05-02 02:11:32
you can make a decoder that makes your computer explode
2022-05-02 02:11:43
as far as the spec is concerned
2022-05-02 02:14:53
as far as the libjxl implementation is concerned, we of course don't want security bugs, but we don't really care what happens on bad input as long as it's not a security problem: both refusing to decode and decoding a glitched image are OK โ€” and in cases like integer overflow, checking this explicitly would be too much of a slowdown (it would be at least one extra branch per pixel, probably more in case of transforms) so we'll tend to do the latter
2022-05-02 02:15:52
refusing to decode is preferred when the condition can be checked outside hot loops, like header-level stuff.
2022-05-02 02:18:06
files like this are not valid at all: https://discord.com/channels/794206087879852103/824000991891554375/927220411383554121
2022-05-02 02:19:41
https://jxl-art.surma.technology/?zcode=c8osSUktKMlQMDbk4tJVCFcwNQADLgA
2022-05-02 02:21:21
something like that should according to the spec just decode to something that gets brighter and brighter as you go southeast, but it's OK that our decoder doesn't do that (due to int32_t overflow) because the spec says that values have to be representable in int32_t, so this is not a valid file
2022-05-02 02:22:20
(it's also not valid because the level 10 box is missing and it has a bitdepth that is too high for level 5, so it's not properly marked)
2022-05-02 02:24:31
strictly speaking files like that are not jxl files, they are just malformed codestreams that happen to decode because the decoder happens not to check, which is an OK strategy as far as the spec is concerned
2022-05-02 02:25:10
of course nobody should rely on that file decoding to those pixels
dds
2022-05-02 02:38:11
To be clear then: a conforming decoder is not required to detect or reject an invalid jxl file.
2022-05-02 02:38:22
For modular_16bit_buffers: would it be better to say instead "modular_16bit_buffers indicates whether or not all the decoded samples in modular image sub-bitstreams fit in signed 16-bit integers" - then the spec doesn't have to make assumptions of how unknown implementations do their arithmetic and integer carries?
_wb_
dds To be clear then: a conforming decoder is not required to detect or reject an invalid jxl file.
2022-05-02 04:02:48
Correct. A conforming decoder is required to correctly decode every valid jxl file. What it does with invalid files is undefined.
dds For modular_16bit_buffers: would it be better to say instead "modular_16bit_buffers indicates whether or not all the decoded samples in modular image sub-bitstreams fit in signed 16-bit integers" - then the spec doesn't have to make assumptions of how unknown implementations do their arithmetic and integer carries?
2022-05-02 04:08:19
No, because the decoded samples (i.e. after undoing all transforms) could fit in int16_t, but int16_t buffers could still not be enough even disregarding arithmetic. For example, the decoded samples could all be nicely in the -2^15 to 2^15-1 range, but if the RCT or Squeeze transform was used, the values _before_ inverse transforms could be outside that range. These transforms can only be undone after all channels are decoded, so there is no way in that case that an implementation can avoid having to buffer values that are outside the int16 range.
2022-05-02 04:09:37
so it's about the values that come straight out of entropy coding, the values you get after undoing the first transform, the values you get after undoing the second transform, etc. until the final output values. They all have to fit in int16_t.
2022-05-02 04:12:37
Now it is true that implementations might be able to avoid some of these intermediate values, e.g. you could in principle do a 100,000-color palette (so the palette indices are outside int16_t range) without actually storing the palette indices, immediately undoing the palette transform as you decode. But we don't want to assume that implementations will do such optimizations (it's very tricky to do that in general anyway), so this would still violate the modular_16bit_buffers criterion since at least a naive implementation (as well as the current libjxl) would have to store values that exceed the int16_t range.
2022-05-02 04:13:45
(note: libjxl currently never actually uses int16_t buffers, it always uses int32_t buffers, so it doesn't really care about modular_16bit_buffers. But in principle we may want to have a memory-optimized implementation in the future, or a hardware implementation, that does use 16-bit buffers)
2022-05-02 04:18:48
How arithmetic is done, there we do assume that it is done with sufficient precision. E.g. for the AvgW+N predictor, we do assume that the (W+N)/2 computation is done without having an overflow in the W+N (so these computations are done in int64_t in current libjxl). Modular_16bit_buffers is only about the buffer precision, the precision of arithmetic is assumed to be infinite in the spec: everything is defined using mathematical numbers, i.e. "real" numbers (which have a strange name because no real implementation can actually really use those)
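To make the buffer-versus-arithmetic distinction concrete, a tiny sketch (invented function; the widening mirrors what is described above, with rounding details elided):
```cpp
#include <cstdint>

// Buffers may hold int32_t (or int16_t when modular_16bit_buffers is true),
// but W+N must be computed in a wider type so the result matches the
// spec's infinite-precision arithmetic even when w + n would overflow.
static int32_t AvgWNPredictor(int32_t w, int32_t n) {
  return (int32_t)(((int64_t)w + (int64_t)n) / 2);
}
```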
dds
2022-05-02 06:13:30
Ok - so "When modular_16bit_buffers is true, it indicates that when inverse transforms are applied to the decoded samples in modular image sub-bitstreams, the inputs and outputs of each successive transform can all be represented by signed 16-bit integers." which deliberately doesn't assign meaning to modular_16bit_buffers=false.
2022-05-02 06:17:49
This was much lower down my list but since you mention 'infinite precision'... I assume that 0.7071067811865476 in the AFV matrix is meant to be sqrt(2)/2? To a few more decimal places, sqrt(2)/2 is 0.7071067811865475244. Whichever way you do the rounding, you can get 0.7071067811865475 and you can get 0.7071067811865476, but you can't get 0.7071067811865474 (which occurs in the AFV matrix, negated). My guess would be that this is a rounding error on the part of the arithmetic package that generated the constant from cos(pi/4), as opposed to something germane to how AFV works.

The real point here is: why not define the AFV basis by what it really is (i.e. with infinite precision), as you have done with the more widely-known DCT? This saves implementors from having to reverse-engineer it to take advantage of any structure present. If the JXL spec isn't the place to define how AFV works, then adding a reference to the algorithm would be ideal. Similar comments for the magic constants elsewhere in the spec where appropriate; there are quite a few of them.
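The reachable digit strings are easy to check; a small aside in binary64 (the constants may well be binary32 in the spec, but the rounding point is the same):
```cpp
#include <cmath>
#include <cstdio>

int main() {
  // Nearest binary64 to sqrt(2)/2, printed to 16 decimal places:
  // prints 0.7071067811865476 -- one of the two reachable roundings,
  // while 0.7071067811865474 is not reachable from sqrt(2)/2.
  printf("%.16f\n", std::sqrt(2.0) / 2.0);
  return 0;
}
```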
_wb_
2022-05-02 06:47:11
I think most of such magic constants are just taken from the code and given in the spec with enough decimals to be the same binary32 float number
2022-05-02 06:47:39
But yes, I agree that giving them mathematically would have been nicer
2022-05-02 06:47:57
We did that in some places, like in splines iirc
2022-05-02 06:48:38
But in most places it's just magic numbers with a seemingly arbitrary number of digits
2022-05-02 06:50:10
I think in some cases there just are no mathematical numbers, i.e. the constants were the result of some numerical optimization using somewhat arbitrary objective functions (like butteraugli scores on some corpus)
2022-05-02 06:55:34
For AFV, I don't really know the mathematical derivation. It's a dct8x8 with 3 corner pixels ignored for the dct and coded separately, if I understand correctly. <@179701849576833024> / <@532010383041363969> is there a mathematically nicer way to formulate the AFV transforms than with a big bag of magic constants?
veluca
2022-05-02 07:00:38
Yes but I don't remember what way it is 😂
_wb_
2022-05-02 07:09:03
Wdyt about trying to replace magic constants with the mathematically exact version?
2022-05-02 07:10:18
Doesn't really make much difference in practice since we give numbers with quite high precision, but it does look nicer and corresponds more with how we define the normal DCT
veluca
2022-05-03 10:27:35
I can try and ask Thomas
2022-05-03 10:28:58
it could be that they just are "root of 16th degree polynomial near 0.001", which is not incredibly useful as a closed form, though
dds
2022-05-03 11:32:54
AFAICT, AFV starts off with something like a 2x2 DCT of 3 pixels and a 4x4 DCT of 13 pixels with redundant coeffs removed - i.e. what <@794205442175402004> said. Though there's a linalg step afterwards that does some strange things and mixes the two sub-DCTs more than seems necessary. See e.g. columns 1, 5 of the forward AFV - both have 11 identical entries at the end, which tells me that they're trying to be a coeff of the 2x2 DCT but have a multiple of vector 0 (the DC) added. However you only need to add DC to one of them to satisfy the 'single 8x8 DC coeff' requirement - right?
2022-05-03 11:32:58
Did you plug the 16+4 vectors into a linalg package and ask "Give me a normalised basis for these 68 vectors that contains the DC coeff" ? (edited: fixed failure to count)
veluca
2022-05-03 07:10:35
the *real* answer is "this was way too long ago for me to remember"
2022-05-03 07:10:43
but I'll see if we can reconstruct it
dds
2022-05-03 07:31:57
thanks I'd be interested - IMO reducing opacity is good for both the spec and its implementations
_wb_
2022-05-03 07:36:29
It's also good in case someone wants at some point to make a 16x16 AFV for some future codec or something like that.
Traneptora
2022-05-04 12:50:31
just to be clear, `JxlEncoderSetColorEncoding` sets the description of the pixel data passed, right?
2022-05-04 12:53:00
also, I'm getting that libjxl is returning sRGB no matter what
2022-05-04 12:53:08
even though the input is PQ and Rec.2100
2022-05-04 12:55:58
is libjxl synthesizing an ICCP?
_wb_
Traneptora just to be clear, `JxlEncoderSetColorEncoding` sets the description of the pixel data passed, right?
2022-05-04 01:36:51
right
Traneptora is libjxl synthesizing an ICCP?
2022-05-04 01:36:59
yes
Traneptora also, I'm getting that libjxl is returning sRGB no matter what
2022-05-04 01:37:58
even if you set preferred color profile to Rec.2100 PQ?
Traneptora
_wb_ even if you set preferred color profile to Rec.2100 PQ?
2022-05-04 01:41:59
didn't you say that libjxl always outputs sRGB data if an ICCP is present?
_wb_
2022-05-04 01:42:44
for Rec.2100 PQ the encoder doesn't need to store an ICCP, it can use the enum
Traneptora
2022-05-04 01:42:57
but it *is* outputting an ICCP
_wb_
2022-05-04 01:43:13
are you using lossless?
Traneptora
2022-05-04 01:43:18
VarDCT XYB
_wb_
2022-05-04 01:44:00
strange, then it should detect that your input icc profile corresponds to rec2100 PQ and use the enum to represent it without spending bytes on the icc profile
Traneptora
2022-05-04 01:44:02
2022-05-04 01:44:11
jxlinfo properly reports this
_wb_ strange, then it should detect that your input icc profile corresponds to rec2100 PQ and use the enum to represent it without spending bytes on the icc profile
2022-05-04 01:44:46
there is no input ICC Profile
2022-05-04 01:44:48
I set the enums
2022-05-04 01:44:50
on encode
2022-05-04 01:45:00
and then on decode I get back sRGB and an ICC Profile
_wb_
2022-05-04 01:45:05
Color space: RGB, D65, sRGB primaries, 709 transfer function, rendering intent: Perceptual
Traneptora
2022-05-04 01:45:14
??
_wb_
2022-05-04 01:45:37
that's what the enum says in that file
Traneptora
2022-05-04 01:45:45
just to double check that I uploaded the right file `ec57f9e37a41e638f1a69b043b327f95`
2022-05-04 01:45:49
md5sum, correct?
_wb_
2022-05-04 01:46:06
ca2dca6039c4709655ece9c2f2eb2f26
Traneptora
2022-05-04 01:46:10
oh
2022-05-04 01:46:30
_wb_
2022-05-04 01:46:47
oh wait
2022-05-04 01:46:59
I already had a test.jxl so it saved it as test (1).jxl, lol
Traneptora
2022-05-04 01:47:07
```
$ jxlinfo test.jxl
JPEG XL image, 3840x2160, lossy, 16-bit RGB
Color space: RGB, D65, Rec.2100 primaries, PQ transfer function, rendering intent: Relative
frame: full image size
```
_wb_
2022-05-04 01:47:43
Color space: RGB, D65, Rec.2100 primaries, PQ transfer function, rendering intent: Relative
Traneptora
2022-05-04 01:47:47
yup
2022-05-04 01:47:54
it appears libjxl decodes this as sRGB and synthesizes an iCCP
_wb_
2022-05-04 01:47:56
yes so that looks good
Traneptora
2022-05-04 01:48:10
I'm not sure why
_wb_
2022-05-04 01:48:10
are you setting a preferred colorspace when decoding?
2022-05-04 01:48:20
and are you decoding to uint?
Traneptora
2022-05-04 01:48:28
I'm decoding to uint16
_wb_
2022-05-04 01:48:54
right, it would actually make more sense if it would just decode to rec2100 PQ when decoding to uint16
2022-05-04 01:49:19
when decoding to uint8, it's not crazy to default to sRGB, but when decoding to uint16 there is enough bit depth
Traneptora
2022-05-04 01:49:36
and I didn't set a preferred colorspace, I first asked libjxl if it had an ICCP, and it said "yes" so I said "cool" and decoded the pixel data to sRGB and attached that ICCP
2022-05-04 01:49:40
`target_data` btw
_wb_
2022-05-04 01:49:59
I think current behavior, if you don't set a preferred colorspace, is to just always decode to sRGB when decoding to uint and to linear sRGB when decoding to float
Traneptora
2022-05-04 01:50:15
if the bitstream has no ICCP and has enum values, shouldn't it just decode to that
2022-05-04 01:50:24
and set the enum values appropriately
2022-05-04 01:50:45
or at least, do that by default
2022-05-04 01:51:21
the only reason I'm getting an ICCP is cause I asked if there was one and libjxl returned "yes"
_wb_
2022-05-04 01:51:59
yeah, maybe... I dunno, when decoding to uint8 it might be better to clamp away the wide gamut and bright stuff and return it as sRGB instead of putting a rec2100 pq in uint8, which will be a quite banding-prone experience
2022-05-04 01:52:08
how do you ask if there is an ICCP?
Traneptora
2022-05-04 01:53:04
`JxlDecoderGetICCProfileSize`
2022-05-04 01:53:06
check if nonzero
_wb_
2022-05-04 01:53:07
JxlDecoderGetICCProfileSize will always return something non-zero, because libjxl will just synthesize something if needed
Traneptora
2022-05-04 01:53:18
ah.
_wb_
2022-05-04 01:53:34
so that documentation is pretty misleading
Traneptora
2022-05-04 01:53:46
yes it is
_wb_
2022-05-04 01:54:06
the only way it will return zero, is if the image is tagged with the enum to have an `Unknown` colorspace
2022-05-04 01:54:25
but there is no encoder atm that will produce such files
2022-05-04 01:54:59
and it's probably not something you ever want to do except in very niche cases where interoperability is not an issue
Traneptora
2022-05-04 01:56:50
It sounds like I should check `JxlDecoderGetColorAsEncodedProfile` *first*, and if I like the color_encoding returned (i.e. I can work with it), then I should pass it to `JxlDecoderSetPreferredColorEncoding` and then handle the pixel data returned.
2022-05-04 01:57:47
and if I don't like that color encoding, or there's an embedded ICCP, then retrieve the ICCP and use sRGB instead.
_wb_
2022-05-04 01:58:10
yes, except it's annoying that you get JxlDecoderGetColorAsEncodedProfile as an icc profile while JxlDecoderSetPreferredColorEncoding needs an enum space as input
Traneptora
2022-05-04 01:58:35
JxlDecoderGetColorAsEncodedProfile returns an enum space
2022-05-04 01:58:51
or rather it populates a `JxlColorEncoding` struct, more specifically.
_wb_
2022-05-04 01:58:53
ah right
Traneptora
2022-05-04 01:59:22
cause the docs seem to imply that you should always check for ICC Profile first, and then only fall back on EncodedProfile if the ICC Profile fails.
_wb_
2022-05-04 01:59:44
ugh we use color profile, color encoding and color space in such an inconsistent way in those function names
Traneptora
Traneptora It sounds like, I should first check `JxlDecoderGetColorAsEncodedProfile` *first* and if I like the color_encoding returned (i.e. I can work with it), then I should pass it to `JxlDecoderSetPreferredColorEncoding` and then handle the pixel data returned.
2022-05-04 02:00:11
in either case, is this the best plan of action if I support a variety of color spaces?
_wb_
2022-05-04 02:00:18
yeah it's the other way around, there's always an ICC profile but not always an equivalent enum
2022-05-04 02:01:37
the only annoying thing is that in ICC case, you can get clamping when decoding to uint if the ICC profile has a wider gamut than sRGB
Traneptora
2022-05-04 02:04:53
If I do this:
```c
jret = JxlDecoderGetColorAsEncodedProfile(ctx->decoder, NULL,
                                          JXL_COLOR_PROFILE_TARGET_ORIGINAL,
                                          &jxl_encoding);
if (jret == JXL_DEC_SUCCESS) {
    JxlDecoderSetPreferredColorProfile(ctx->decoder, &jxl_encoding);
```
will the result of `JXL_COLOR_PROFILE_TARGET_DATA` match `TARGET_ORIGINAL`?
2022-05-04 02:17:37
or does that still depend on the `jxl_pixfmt`
_wb_
2022-05-04 02:17:37
it should, yes
Traneptora
2022-05-04 02:17:52
I'll run it like this
2022-05-04 02:18:08
```c
jret = JxlDecoderGetColorAsEncodedProfile(ctx->decoder, NULL,
                                          JXL_COLOR_PROFILE_TARGET_ORIGINAL,
                                          &jxl_encoding);
if (jret == JXL_DEC_SUCCESS)
    jret = JxlDecoderSetPreferredColorProfile(ctx->decoder, &jxl_encoding);
if (jret == JXL_DEC_SUCCESS)
    jret = JxlDecoderGetColorAsEncodedProfile(ctx->decoder, &ctx->jxl_pixfmt,
                                              JXL_COLOR_PROFILE_TARGET_DATA,
                                              &jxl_encoding);
```
dds
2022-05-05 11:07:53
I said that I thought the spec needed more of an adversarial perspective.
2022-05-05 11:08:09
Re: Annex M ("profiles and levels"), here's an example of a jxl file that I _think_ is a valid Level 5:
_wb_
2022-05-05 11:56:22
yes, the code actually has a safety check to limit the maximum spline area (amount of pixels being painted), which is a better limitation than just limiting the number of splines (since a spline can cover just a few pixels but it can also cover basically the entire image)
2022-05-05 11:59:06
question is though where to draw the line (no pun intended), setting the limit too low can result in an annoying constraint for future encoders or things like svg2jxl, setting the limit too high results in images that take a disproportional amount of time for their size (since pixels could end up needing to be painted a _lot_ of times)
2022-05-05 12:00:11
so for now we didn't put a strict limit in the profile yet, but I think we should
dds
2022-05-05 12:31:52
I'm also wondering if "Maximum total number of pixels in consecutive zero-duration frames" lets you squeeze in arbitrarily many reference frames. The default frame duration is 0 (suggesting that you can't) but the spec also says that a reference frame "is not itself part of the decoded sequence of frames", suggesting that you can. To me it's ambiguous.

I think Annex M disallows certain uses of modular mode that are creative but still result in fast decodes. For instance I was pondering a while ago using a reference frame with a very large number of channels - but with each channel only a fraction of the size of the output image. This wouldn't be allowed under the current Level 5, even though the total pixel area is roughly the same as a 'normal' image. On the other side, I think it's possible to make an image that is accepted by Level 5 but still has a crazy slow decode time - e.g. by using pathologically annoying MA trees, lots of splines and blended patches.

FWIW the way I'd do Annex M would be to choose constants A and B, set a budget of width * height * A + B and assign a rough cost to each step of the decoding process that's used. The point being that upon receiving an image, you can read the header to examine its width and height. A reasonable user will accept a decoding time that's linear in the image size, so you can then decide whether or not to proceed. The advantage of this approach is that it allows almost free use of modular mode as long as it's fast, i.e. any image that decodes quickly should be accepted by Level 5.

A possibly heretical suggestion for Level 5: insist that all parent nodes of a weighted property / predictor node are 'channel' or 'group' properties, i.e. the weighted predictor must be in a tree of essentially size 1.
_wb_
2022-05-05 12:47:52
max total number of pixels in consecutive zero duration frames does include frames of any type, including kReferenceOnly and kSkipProgressive. We should probably clarify that. It's about how much effort the decoder has to do.
2022-05-05 12:50:48
the way we use weighted predictor in cjxl -e 3 is with a fixed tree that uses the weighted error as context. it wouldn't satisfy that criterion, but it's still a tree that can be evaluated without any branching, it's just a lookup
2022-05-05 12:56:33
I like the idea of limiting things based on something like total number of samples to process (so extra channels at 1:8 resolution are not that much of a problem), where "process" means entropy decode, undoing a transform, painting a spline, blending a patch, etc.
2022-05-05 12:59:50
it's somewhat tricky to formulate things in a way that is 1) easy to check as early as possible in advance, 2) easy to write down in the spec, 3) doesn't assume too much about how things are implemented, 4) robust to adversarial examples
dds
2022-05-05 01:03:40
yes it is
2022-05-05 01:04:07
I think some things could be checked as you go along - e.g. for the MA trees you could have a limit on tree size but also a limit on the number of traversals below 'c' and 'g' properties at the root (e.g. an average of 4 per pixel). Then a decoder keeps count as it goes along and MAY abort the decode if the cost gets too high.
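A sketch of that budget idea; the constants and cost units are invented:
```cpp
#include <cstdint>

struct DecodeBudget {
  uint64_t remaining;
  // width * height * A + B, computable from the header before decoding starts.
  DecodeBudget(uint64_t width, uint64_t height)
      : remaining(width * height * 64 + (1u << 20)) {}  // hypothetical A, B
  // Charge `cost` units (entropy-decoded symbols, transform samples,
  // spline pixels painted, patch pixels blended, ...). When this returns
  // false the budget is blown and the decoder MAY abort.
  bool Charge(uint64_t cost) {
    if (cost > remaining) return false;
    remaining -= cost;
    return true;
  }
};
```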
_wb_
2022-05-05 01:04:43
e.g. having 64 extra channels at 1:8 resolution is only really one extra channel in terms of samples, so we could allow that... but then if those extra channels end up getting used in a later frame as an alpha channel for patch blending, they do in fact need to get upsampled by the decoder to image resolution, and while an advanced implementation can do that on-demand and in a memory-limited way, a more naive implementation could end up using 64 full-sized buffers...
perk
2022-05-07 09:08:27
cjxl fails to encode this gif
2022-05-07 09:08:30
only the first frame renders
2022-05-07 09:14:25
cjxl v0.7.0 ddbf206 [AVX2,SSE4,SSSE3,Scalar]
cjxl -e 9 -E 3 -I 1
_wb_
2022-05-07 09:32:30
What does jxlinfo say?
perk
2022-05-07 10:07:43
it says that the frames are there, but they're 0 ms
_wb_
2022-05-07 10:08:55
oh
2022-05-07 10:09:40
maybe the gif actually has 0 ms frames too, but you don't notice because most viewers enforce a minimum frame duration on gifs
2022-05-07 10:18:07
yep, that's it
perk
2022-05-07 10:20:27
hmm, interesting, thanks
_wb_
2022-05-07 10:21:03
https://webmasters.stackexchange.com/questions/26994/why-is-this-gifs-animation-speed-different-in-firefox-vs-ie
2022-05-07 10:21:21
technically cjxl just interprets the input literally
2022-05-07 10:21:44
perhaps it would make more sense to make it emulate whatever chrome does, or something
2022-05-07 10:23:18
https://wunkolo.github.io/post/2020/02/buttery-smooth-10fps/
2022-05-07 10:23:26
it's a bit of a mess, it seems
2022-05-07 10:23:48
browsers nowadays allow up to 50fps (0.02s frame duration)
2022-05-07 10:24:22
BUT if you set the duration to something smaller than that (i.e. 0.01s or 0s), they will not play it at 50 fps but at 10 fps
2022-05-07 10:24:46
probably for some kind of legacy reasons
2022-05-07 10:25:28
then some players will just play 0.01s at 100 fps and 0s at whatever the fastest is they can do (using max cpu)
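A sketch of the browser behaviour just described (emulating it would be a policy choice for cjxl, not what it currently does):
```cpp
// Durations below 0.02 s (50 fps) are reportedly played at 0.10 s (10 fps)
// by browsers, for legacy reasons; anything else is honoured as-is.
static double EffectiveGifFrameDuration(double seconds) {
  return seconds < 0.02 ? 0.10 : seconds;
}
```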
perk
2022-05-07 10:48:39
I see. I went and made the gif 60 fps, and the converted jxl is animated and frame limited in browsers as well.
2022-05-07 10:52:44
If high framerate gifs are widespread, it might make sense to have default behavior explicitly lower the framerate, if only to make playback the same on all platforms. But that in itself could be treated as bugged encoding. Dunno
improver
2022-05-07 10:55:49
id say the default should be to emulate browsers
2022-05-07 10:55:57
with switch to take stuff literally
2022-05-07 12:29:36
i wonder how apng encoders/decoders interpret it
SleepyJoe
2022-05-07 02:09:03
Yes
2022-05-07 02:54:34
How can we run a specific test? I've read the building and testing doc, and tried running ./ci.sh test jxl_jni_decoder_wrapper as well as ctest jxl_jni_decoder_wrapper in the build dir, but in both cases all the tests are executed, not only the one I specified
_wb_
2022-05-07 03:03:36
You can run a specific executable in build/lib/tests
BlueSwordM
2022-05-08 03:35:16
Oh boy, this does not seem too good: https://github.com/BtbN/FFmpeg-Builds/commit/22eebe8ea63df58eca455ea18a15c0a6f9fc26f7
2022-05-08 03:35:27
I don't think this is supposed to happen at all.
2022-05-08 03:37:33
https://github.com/BtbN/FFmpeg-Builds/issues/151
2022-05-08 03:37:42
I'll look at the issue report and try to reproduce it.
_wb_
2022-05-08 03:56:07
Probably an msvc bug
BlueSwordM
_wb_ Probably an msvc bug
2022-05-08 04:09:21
It's ffmpeg though, so mingw is being used.
2022-05-08 04:09:37
I'll check with git libjxl and git ffmpeg to see if I can reproduce the issue.
_wb_
2022-05-08 04:46:46
mingw is gcc right?
improver
2022-05-08 04:46:46
<https://github.com/shinchiro/mpv-winbuild-cmake/blob/master/packages/libjxl.cmake#L38-L39> heh
_wb_
2022-05-08 04:47:19
Only scalar is in any case a bit overkill
2022-05-08 04:47:33
SSE3 might be fine
BlueSwordM
_wb_ mingw is gcc right?
2022-05-08 04:51:05
Default is GCC, yes. You can technically build mingw with Clang, but in 95% of cases, mingw = GCC.
2022-05-08 04:53:50
What worries me is that both lossless and lossy work fine on my end in ffmpeg.
veluca
2022-05-08 06:48:55
greeeeat
2022-05-08 06:48:58
what's your CPU?
BlueSwordM
veluca what's your CPU?
2022-05-08 06:56:03
Zen 2 3700X.
_wb_
2022-05-08 06:59:11
Always fun to debug stuff that happens only on certain environments/machines
veluca
2022-05-08 07:02:01
I am currently suspecting that some hwy feature detection misfires in some cases
_wb_
2022-05-08 07:05:59
That would explain things, yes.
Basketball American
2022-05-11 10:51:44
i think it's related to https://github.com/libjxl/libjxl/issues/1282 ? i downloaded d98c707 from https://artifacts.lucaversari.it/libjxl/libjxl/ and ran it: no output. hooked up a debugger, breakpointed on the cpuid instruction, flipped the avx2 bit to 0, and got output. it also works on my older machine in windows, which doesn't support avx2
dds
2022-05-11 12:48:36
<@179701849576833024> Any luck tracking down the derivation of the AFV matrix? I had hoped it would just be 3 rows of a 2x2 DCT and 13 rows of a 4x4 DCT with sensible choices for the missing 1 + 3 coeffs (modulo massaging in a single DC coeff and an orthogonalisation step).
2022-05-11 12:48:41
However after looking at it for a bit, I don't think this is the case.
veluca
2022-05-11 12:49:46
the person I wanted to ask is OOO, remind me again next week? 🙂
dds
2022-05-11 12:49:51
sure
2022-05-11 12:50:19
The first two rows of the AFV matrix M are
2022-05-11 12:50:24
`[0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 ]`
`[0.87690293 0.220651811 -0.101400504 -0.101400504 0.220651811 -0.101400504 -0.101400504 -0.101400504 -0.101400504 -0.101400504 -0.101400504 -0.101400504 -0.101400504 -0.101400504 -0.101400504 -0.101400504 ]`
2022-05-11 12:50:43
This matrix is the output of a QR decomposition (right...?) meaning that LM == X for some lower-triangular matrix L and the original matrix X.
2022-05-11 12:51:15
The first row of the original X is just the DC - fine. The second row of X must be some linear combination of the first two rows of M. It looks like it wants to be a coeff of the 2x2 DCT, i.e. mostly zeroes. So if we use the first row to zero out all the -0.101... coeffs and then scale to get a 1 somewhere then we get `[3.037715891 1. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. ]` For a normal 2x2 DCT all the coeffs are 1 and sqrt(2) (modulo scaling), so the ratio between any two coeffs is at worst a multiple of sqrt(2). However the 3.037715891 doesn't look like it fits into this.
Jyrki Alakuijala
dds <@179701849576833024> Any luck tracking down the derivation of the AFV matrix? I had hoped it would just be 3 rows of a 2x2 DCT and 13 rows of a 4x4 DCT with sensible choices for the missing 1 + 3 coeffs (modulo massaging in a single DC coeff and an orthogonalisation step).
2022-05-11 01:35:54
AFV: The three corner pixels are a simple separate ad hoc transform. The remaining 13 pixels are a normalized orthogonal transform inspired by the DCT -- however, not 4x4 DCT, nor Hadamard.
dds
2022-05-11 02:05:11
<@532010383041363969> thanks, I'm glad I hadn't missed something obvious. Would you mind posting the original matrix?
Traneptora
2022-05-12 12:59:40
is there a `#include <libjxl/common.h>` to pull in things like `JxlPixelFormat` or `JxlColorEncoding`?
2022-05-12 12:59:56
that I figure would be included by the decoder or the encoder
veluca
dds <@532010383041363969> thanks, I'm glad I hadn't missed something obvious. Would you mind posting the original matrix?
2022-05-12 11:04:32
so I have *some* more info: the matrix is obtained by "gluing together" eigenvectors of two 3x3 and 13x13 matrices - which implies that getting a nice formula for most of those coefficients is... extremely unlikely
dds
2022-05-13 07:55:25
Great, do you have details of how the 3x3 and 13x13 matrices are derived? I don't think that neat closed expressions for the coefficients are realistic - normalisation alone poses enough of a barrier to this. The eigenvector construction sounds like a process that could be documented for someone to reproduce if they so wish.
2022-05-13 07:55:30
FWIW for the 13x13 I think I can see butterflies across the diagonal and three 4x4 sub-transforms, but it really should be documented properly.
veluca
dds FWIW for the 13x13 I think I can see butterflies across the diagonal and three 4x4 sub-transforms, but it really should be documented properly.
2022-05-13 08:16:41
it wasn't done that way 😄
2022-05-13 08:16:56
I'm not 100% sure of the details myself, I'll get back to you
dds
2022-05-13 08:20:31
thanks
veluca
2022-05-13 09:01:13
so tldr is: create a 4x4 grid graph, "unlink" a corner (cells (0, 0), (0, 1) and (1, 0)) from the rest, write down the laplacians of the two connected components
2022-05-13 09:01:54
eigenvectors of those two things are how the AFV basis was created
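A sketch of that construction (my assumptions: a 4-connected 4x4 grid with corner cells (0, 0), (0, 1), (1, 0) detached; the helper `laplacian` is hypothetical). The 3x3 component's Laplacian comes out as `[[2, -1, -1], [-1, 1, 0], [-1, 0, 1]]`, the matrix dds uses just below.

```
import numpy as np

corner = {(0, 0), (0, 1), (1, 0)}           # the detached corner cells
cells = [(r, c) for r in range(4) for c in range(4)]

def laplacian(nodes):
    # graph Laplacian of the 4-connected grid restricted to `nodes`
    idx = {v: i for i, v in enumerate(nodes)}
    L = np.zeros((len(nodes), len(nodes)))
    for r, c in nodes:
        for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if nb in idx:                    # edge stays inside the component
                L[idx[(r, c)], idx[(r, c)]] += 1
                L[idx[(r, c)], idx[nb]] -= 1
    return L

L3 = laplacian([v for v in cells if v in corner])       # 3x3 Laplacian
L13 = laplacian([v for v in cells if v not in corner])  # 13x13 Laplacian

# eigh returns eigenvectors as columns, sorted by eigenvalue
evals3, evecs3 = np.linalg.eigh(L3)
evals13, evecs13 = np.linalg.eigh(L13)
```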
dds
2022-05-13 10:00:47
hmm
2022-05-13 10:00:59
```
from numpy import *
from numpy.linalg import eig

# Laplacian of the detached 3-cell corner component
m = matrix([[2, -1, -1], [-1, 1, 0], [-1, 0, 1]])
evals, evecs = eig(m)
print(evecs)
# NB: eig() returns eigenvectors as the *columns* of evecs,
# so evecs[1] and evecs[2] below are rows of that matrix.
# eliminate component 2
x = evecs[1] + evecs[2]
x /= x[0, 1]
# compare with [3.037715891, 1, 0] from AFV
print(x)
```
2022-05-13 10:01:01
is that right?
2022-05-13 10:02:25
and is your description consistent with what <@532010383041363969> said? i'm willing to believe it is but the connection isn't obvious to me
veluca
2022-05-13 11:06:05
well, I guess actually it should be the eigs of the 16x16 matrix (although how they're disambiguated, I don't know)
Jyrki Alakuijala
2022-05-13 03:22:00
From my point of view it was necessary for the 13 other values to be orthonormal and different from the DCT, so that we have an alternative transform rather than more of the same -- usually I would have used optimization to find such values, whatever produces the best results on photographs, but here we happened to have a more theoretical approach handy, so to save time we just used those values
dds
2022-05-13 06:10:39
i can repro it except for the top corner
2022-05-13 06:10:59
was that ratio a 'whatever produces the best results' number?
veluca
2022-05-13 06:31:01
honestly? no clue
2022-05-13 06:31:09
I don't remember
2022-05-13 06:31:46
fortunately it only affects 3 vectors, so not too bad (and one of them is the all-equal basis vector, which is simple enough :P)
vtorri
2022-05-15 05:40:49
hello
2022-05-15 05:41:29
to add a new framework supporting libjxl, should I just provide a PR for README.md ?
2022-05-15 05:45:17
hmm, software_support.md actually
_wb_
2022-05-15 05:48:01
Yes
vtorri
2022-05-15 06:06:15
done
2022-05-15 06:27:32
<@794205442175402004> i guess you don't review on sunday ?
_wb_
2022-05-15 06:28:21
Oops
vtorri
2022-05-15 07:13:35
thank you
Jyrki Alakuijala
dds i can repro it except for the top corner
2022-05-16 08:48:48
The top corner was something Luca and I haggled/hacked/kludged (and I tweaked the quantization coefficients afterwards) -- we wanted to have some independence for defining the 3 values. IIRC, we had very little concern for theoretical principles around the three pixels. The idea was that another object is peeking into the transform and often only covering the corner pixels, but sometimes extending to one or two of its 4-connected neighbours
2022-05-16 08:49:46
because of that idea (IIRC) we had the corner pixel as a separate coefficient
2022-05-16 08:50:08
(I'm writing this from my fantasy on how it should have been -- didn't check the code/spec) 🙂
ziemek.z
2022-05-16 08:51:11
Sorry to interrupt you Jyrki, but is it just me, or are pipelines failing in everyone else's forks, too?
Jyrki Alakuijala
2022-05-16 08:53:28
checking
2022-05-16 08:55:05
what is the command line that doesn't work?
2022-05-16 08:55:12
or something on github is broken?
ziemek.z
2022-05-16 08:56:19
Failing pipelines are: `arm-linux-gnueabihf`, `aarch64-linux-gnu` (both normal and lowprecision), `vcpkg / x64-windows-static`, `vcpkg / x86-windows-static`.
Jyrki Alakuijala
2022-05-16 08:56:26
for me linting with './ci.sh lint' stopped working -- but I manage to paint myself into a corner regularly with git/github
2022-05-16 08:57:02
Eugene will look at the failing pipelines
eustas
2022-05-16 09:05:14
Going to look into it ASAP
veluca
Jyrki Alakuijala The top corner was something I and Luca haggled/hacked/kludged (and I tweaked the quantization coefficients afterwards) -- we wanted to have some independence for defining the 3 values. IIRC, we had very little concern of theoretical principles around the three pixels. The idea was that another object is peeking into the transform and often only covering the corner pixels, but sometimes extending to one or two of its 4-connected neighbours
2022-05-16 09:09:40
I'm pretty sure the top corner is still obtained that way, looking at the code, except rotated differently
2022-05-16 09:09:45
how we did that I don't recall
dds
2022-05-17 09:58:41
<@179701849576833024> when you say 'code' I'm not sure whether you mean libjxl or the original code that generated AFV... but in either case I couldn't manage to get that coeff out of the eigenvectors of the graph (0,1) <-> (0,0) <-> (1,0)
veluca
2022-05-17 09:59:48
no no it shouldn't be eigs of that graph
2022-05-17 10:00:00
but rather eigs of the whole thing put together
2022-05-17 10:00:08
with the removed edges
2022-05-17 10:00:23
... maybe it doesn't actually change anything and I misremember where we got it from
dds
2022-05-17 10:00:47
Btw I was right about the butterflies across the diagonal and the 'sub-transforms' - it's part of the structure of the matrix. The char poly factors over Z into polys of degrees 4, 4, 3, 1, 1; so I was seeing the sub-matrices corresponding to the first three. One immediate consequence is that closed-form expressions for the AFV coeffs aren't too bad, e.g. `4*t^3 - 32*t^2 + 80*t - 60` (modulo normalisation) where t is a root of `x^4 - 14*x^3 + 66*x^2 - 124*x + 78`. Having a 'hidden algorithm' in the spec is decidedly non-good IMO. I contend that it's possible to define AFV concisely and precisely; the only decimal you need is the unknown coeff from the top corner transform, viz. 3.0377158905213184. A benefit of doing it this way is that once readers know that the matrix came from spectral graph theory, they may be inclined to write a much cheaper implementation of AFV that exploits the structure of the matrix, instead of doing a schoolbook matrix multiply like libjxl.
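One hedged way to check the factorisation claim, reusing `L13` from the grid-Laplacian sketch earlier; sympy is my choice here, not something from the thread.

```
import sympy as sp

t = sp.symbols('t')
M = sp.Matrix(L13.astype(int).tolist())
print(sp.factor(M.charpoly(t).as_expr()))
# dds reports factors of degrees 4, 4, 3, 1, 1 over Z; one linear factor
# is t itself (eigenvalue 0, the constant vector on the 13-cell component).
```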
veluca ... maybe it doesn't actually change anything and I misremember where we got it from
2022-05-17 10:09:58
I think so - the graph components are disjoint so the laplacian has the two 'sub-laplacians' along a block diagonal. So you should end up with essentially a union of eigenvectors rather than anything more involved
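An illustration of the block-diagonal point, again reusing `L3`/`L13` from the sketch above: the spectrum of the assembled 16x16 Laplacian is exactly the union of the two components' spectra, and the eigenvalue 0 shared by both blocks is what makes the choice of eigenvectors ambiguous.

```
import numpy as np

# With the corner edges removed, the two component Laplacians sit on the
# block diagonal of the full 16x16 Laplacian.
full = np.block([[L3, np.zeros((3, 13))],
                 [np.zeros((13, 3)), L13]])

union = np.concatenate([np.linalg.eigvalsh(L3), np.linalg.eigvalsh(L13)])
assert np.allclose(np.linalg.eigvalsh(full), np.sort(union))
```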
veluca
2022-05-17 10:13:54
oh, interesting that the poly factors, I didn't expect that
2022-05-17 10:14:31
right, that makes sense
2022-05-17 10:14:49
what did you get for the eigenvectors of the 3-part, and what's actually in libjxl?
2022-05-17 10:15:08
iirc there's a [1, 1, 1], a [0, -1, 1], and then I don't remember (modulo normalization)
dds
2022-05-17 10:15:47
right - the other one is (2, -1, -1)
2022-05-17 10:15:51
*should be
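These three are quick to verify against the 3x3 corner Laplacian (a standalone check, with `L3` as in the earlier sketch):

```
import numpy as np

L3 = np.array([[2, -1, -1], [-1, 1, 0], [-1, 0, 1]])
for v, lam in (([1, 1, 1], 0), ([0, -1, 1], 1), ([2, -1, -1], 3)):
    assert np.allclose(L3 @ np.array(v), lam * np.array(v))
```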
veluca
2022-05-17 10:17:07
and what is it actually? 😄
dds
2022-05-17 10:20:40
it's hard to say because - as you said at the start of this - I think you've done some splicing
2022-05-17 10:21:17
specifically, the first row (the DC) isn't an eigenvector of either
veluca
2022-05-17 10:25:14
well, it is
2022-05-17 10:25:18
of both 😛
dds
2022-05-17 10:28:47
i mean it's not 'raw' - you have to add / butterfly them together
veluca
2022-05-17 10:30:29
eh, they still form an eigenbasis... a bit annoying that the eigenspaces have dimension >1
dds
2022-05-17 10:31:09
this is the matrix before normalisation; look at rows 0 and 5. row 5 is the eigenvector for eigval 0 in the 13x13 and row 0 is the sum of both 'DC's
2022-05-17 10:32:44
in fact if it weren't for row 0, you wouldn't need to normalise at all (evecs of symmetric matrices are automatically orthogonal)
veluca
2022-05-17 10:37:45
(that's just an artefact of the algorithm for finding them 😛)
dds
2022-05-17 10:38:02
my bad, row 1 i mean
veluca
2022-05-17 10:38:10
that row 1 is weird though
2022-05-17 10:38:29
it doesn't *seem* like an eigenvector
2022-05-17 10:40:08
that's the decoder-side matrix, right?
2022-05-17 10:40:38
AFAIU that kinda looks like a "let's make the corner pixel less precise", and I can't imagine why we would do that
dds
veluca it doesn't *seem* like an eigenvector
2022-05-17 10:42:35
putting it another way, since the graph matrix is symmetric but we still have to normalise the above matrix, row 1 *isn't* an eigenvector... right?
veluca
2022-05-17 10:42:57
best I can tell
2022-05-17 10:43:57
let me check the jxl enc and dec sources too
2022-05-17 10:46:07
wait, how did you get that matrix? the one in the spec isn't all-0 on row 1
dds
2022-05-17 10:46:26
it's pre-normalisation
veluca
2022-05-17 10:46:28
(I mean in the 13 part)
2022-05-17 10:46:42
normalization usually doesn't get you 0s though 😛
dds
2022-05-17 10:48:05
i added the 'sensible' multiple of row 0 to get a vector that only uses the corner pixels
2022-05-17 10:49:11
the matrix I posted is the same as AFV, modulo normalisation
veluca
2022-05-17 10:49:49
ah, I see
2022-05-17 10:50:40
but then it makes sense that you wouldn't get an eigenvec anymore, because rows 0 and 1 are not eigenvectors for the same eigenvalue of the 3-part
2022-05-17 10:56:21
then again, the eigvecs for eigval 0 are rows 0 and 5, clearly, so what the heck is the original row 1 anyway?
dds
2022-05-17 10:56:25
I think that's only valid if we're happy that row 1 is an evec to begin with - which I don't believe we are
veluca
2022-05-17 10:57:03
let me do some more checking
2022-05-17 11:00:55
yeah, no, that thing is not an eigenvector, is not particularly close to being an eigenvector, and the direction in which it's not an eigenvector doesn't make a lot of sense to me
dds
2022-05-17 11:01:40
right
veluca
2022-05-17 11:03:58
... just for completeness, is it actually orthogonal?
dds
2022-05-17 11:04:28
no, the lack of orthogonality proves it isn't an eigenvector (eigenvectors of a symmetric matrix belonging to distinct eigenvalues are orthogonal)
2022-05-17 11:04:38
sorry i wasn't clear earlier
veluca
2022-05-17 11:05:47
right, thought so
2022-05-17 11:05:51
because the spec says it is
2022-05-17 11:06:06
so the spec lies
dds
2022-05-17 11:06:14
ah no sorry wait wait
2022-05-17 11:06:51
they *are* orthogonal in the spec (AFV as given really is an orthonormal matrix, modulo questionable floating point rounding)
2022-05-17 11:07:23
i'm talking about the presumed row 1 of the matrix i posted before normalisation
veluca
2022-05-17 11:07:43
ah, ok
2022-05-17 11:08:14
so my best guess now is that row 1 is obtained by orthonormalization of the rest of the matrix
2022-05-17 11:09:16
does that make sense?
dds
2022-05-17 11:10:43
no
2022-05-17 11:10:55
well, it is, but I don't think that explains it
veluca
2022-05-17 11:11:02
why not?
dds
2022-05-17 11:14:03
I believe we have to choose between:
1) The AFV matrix has always consisted of orthogonal vectors; that is, the effect of normalisation was only to scale each row. This makes row 1 a weird choice of basis vector with uncertain provenance, or
2) the AFV matrix did not start out orthogonal, in which case row 1 probably started off as a multiple of [3.037716, 1, 0, 0, 1, ...]. This does at least relate to only the corner pixels, but it's not an eigenvector.
2022-05-17 11:15:27
My money is on 2), especially given <@532010383041363969> 's comments about "The top corner was something I and Luca haggled/hacked/kludged"
veluca
2022-05-17 11:19:28
that doesn't explain why row 5 looks the way it does though
2022-05-17 11:19:40
it's too nice for that 3.something to have been randomly decided
2022-05-17 11:20:31
well, now that 3.x does make sense to me at least
2022-05-17 11:22:15
my money is now on "we computed the eigvecs, then tweaked them until row 1 looked a bit more (0, 0)-heavy, and then stuck with that"
dds
2022-05-17 11:24:56
What's wrong with row 5? You mean `[-0.410538 0.623549 -0.064351 -0.064351 0.623549 -0.064351 -0.064351 -0.064351 -0.064351 -0.064351 -0.064351 -0.064351 -0.064351 -0.064351 -0.064351 -0.064351]` right? I claim this was `[ 0. 0. -0.27735 -0.27735 0. -0.27735 -0.27735 -0.27735 -0.27735 -0.27735 -0.27735 -0.27735 -0.27735 -0.27735 -0.27735 -0.27735 ]` before normalisation, which is just the evec for eigval 0 in the 13x13 (-0.27735 ≈ -1/sqrt(13), i.e. the constant vector on the 13 cells)
veluca
2022-05-17 11:25:47
sorry, row 4