JPEG XL

benchmarks

fab
2024-01-11 06:57:05
2024-01-11 06:57:26
Second set is SD
2024-01-11 06:59:18
Second set MULTI resolution
MSLP
2024-01-11 07:02:26
plz stop it fab
fab
2024-01-11 07:08:10
MSLP
fab This is HD now
2024-01-11 07:10:20
I like the tea-towel tho
fab
2024-01-11 07:12:04
I evaluated 30 TikTok videos and 20 images
2024-01-11 07:23:23
Latest image
2024-01-11 07:23:46
2024-01-11 07:23:52
Original
2024-01-11 07:29:48
Send to everyone: benchmark studies, three resolutions
2024-01-11 07:30:31
Not in WhatsApp, only to compression engineers
eddie.zato
2024-01-18 10:45:14
Generation loss is still fun <:CatSmile:805382488293244929>
_wb_
2024-01-18 11:17:48
what's the intermediate format you're using between generations? 8-bit png, 16-bit png, 32-bit pfm?
eddie.zato
2024-01-18 11:25:40
16-bit png
_wb_
2024-01-18 11:26:18
<@532010383041363969> I think generation loss is a good argument for constraining enc gaborish to be as close as possible to the inverse of dec gaborish, rather than allowing it to apply "anything that works" for a single generation and according to Butteraugli. If there are reasons to deviate from that (e.g. compensating for subsequent quantization errors) then that's fine with me, as long as the goal is still to make the roundtrip as close as possible to the identity and not to introduce artifacts that improve metric scores in one generation but accumulate into something bad over several generations. I.e. the tuning could be done by optimizing for the result of not just one roundtrip but say the end result of 5 or 10 generations (yes, that makes tuning N times slower but I think it's worth it).
2024-01-18 11:29:32
Adding an option to benchmark_xl to do multiple generations of encoding would be useful. A change might fool the metrics in one generation and look good while it actually does introduce artifacts that when amplified through multi-generation encoding become problematic enough to no longer fool the metrics.
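A minimal sketch of that multi-generation measurement, using a toy quantizer in place of a real encode/decode pair (the function names and the RMSE metric here are illustrative, not benchmark_xl's actual API):

```python
import numpy as np

def toy_encode_decode(img, step=16):
    # Toy "lossy codec": uniform quantization. A real harness would call an
    # actual encoder/decoder roundtrip (e.g. cjpegli + djpegli) here.
    return np.round(img / step) * step

def generation_loss(img, codec, generations=10):
    """Return RMSE vs. the original after each generation of re-encoding."""
    errors = []
    current = img.astype(np.float64)
    for _ in range(generations):
        current = codec(current)
        errors.append(float(np.sqrt(np.mean((current - img) ** 2))))
    return errors

rng = np.random.default_rng(0)
img = rng.uniform(0, 255, size=(64, 64))
errs = generation_loss(img, toy_encode_decode, generations=5)
# A pure quantizer is idempotent, so the error stays flat after generation 1;
# a codec whose roundtrip is far from the identity shows errors that grow.
```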
w
2024-01-18 11:35:00
what is the point of gaborish again
_wb_
2024-01-18 11:43:28
It's basically a way to get the advantages of a lapped transform without doing a lapped transform: encoder side it applies a kind of sharpening before doing DCT, decoder side it applies a kind of blurring after doing DCT that undoes the sharpening the encoder did and also reduces blocking and DCT artifacts.
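An illustrative 1-D sketch of that sharpen/blur pairing (the 3-tap kernels and the weight `a` are made up for illustration, not the actual gaborish coefficients): the encoder-side sharpen is a first-order inverse of the decoder-side blur, so their composition is close to the identity.

```python
import numpy as np

def conv3(x, k):
    # 1-D convolution with a 3-tap kernel, clamping at the edges.
    xp = np.pad(x, 1, mode='edge')
    return k[0] * xp[:-2] + k[1] * xp[1:-1] + k[2] * xp[2:]

a = 0.1
sharpen = np.array([-a, 1 + 2 * a, -a])  # encoder side, before the DCT
blur = np.array([a, 1 - 2 * a, a])       # decoder side, after the inverse DCT

x = np.sin(np.linspace(0, 3, 50))
roundtrip = conv3(conv3(x, sharpen), blur)
err = float(np.max(np.abs(roundtrip - x)))
# blur(sharpen(x)) = x - a^2 * L^2 x for the Laplacian L, so err is O(a^2).
```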
afed
very visible blocking in red (enc-dec-jpegli-bd16 compared to mozjpeg) <https://unsplash.com/photos/eLHsbiCipc8> resized from the original size to 1920x2688, jpegli jpeg q80 + png (with jpegli-bd16-dec)
2024-01-18 01:16:25
also jpegli block artifacts are still there in the latest version something similar with generation loss blockiness
_wb_
2024-01-18 01:24:14
is this XYB jpegli or YCbCr? Either way, I wonder where the blockiness is coming from, I would assume this is something that should be avoidable...
afed
2024-01-18 01:24:56
both (but less for xyb)
Jyrki Alakuijala
2024-01-18 03:11:42
we need to improve on this in cjxl and cjpegli
2024-01-18 03:13:01
we will think what to do -- one possibility is to optimize the quantization heuristics exactly for the generation loss (as a mixed objective, partially for image quality, partially for generation loss)
2024-01-18 03:13:33
but I like the approach where we first acknowledge the problem and stay open to it for a while before deciding how to react 🙂
2024-01-18 03:13:45
do we have an open issue for this in the github?
Oleksii Matiash
Jyrki Alakuijala we will think what to do -- one possibility is to optimize the quantization heuristics exactly for the generation loss (as a mixed objective, partially for image quality, partially for generation loss)
2024-01-18 03:53:49
I believe this (optimization for the generation loss) should be top priority given how often and how many times pictures are recompressed on the Internet
damian101
eddie.zato Generation loss is still fun <:CatSmile:805382488293244929>
2024-01-18 05:29:12
why does cjpegli have worse generation loss than mozjpeg here?
Traneptora
2024-01-18 10:20:40
jxl is supposed to have convergent generation loss, iirc
Jyrki Alakuijala
why does cjpegli have worse generation loss than mozjpeg here?
2024-01-19 01:36:37
most likely because of the variable dead zone quantization
_wb_
2024-02-01 04:52:17
https://www.w3.org/Graphics/Color/Workshop/slides/talk/lilley this talk shows some nice plots of how uniform various spaces are w.r.t. DeltaE 2000
2024-02-01 04:52:25
2024-02-01 04:53:01
would be interesting to see what a plot like that looks like for XYB
yoochan
2024-02-01 05:57:05
On a similar subject i stumbled upon this fancy oklch color picker https://oklch.com/#70,0.1,2,90.17
_wb_ would be interesting to see what a plot like that looks like for XYB
2024-02-01 06:11:12
Perhaps there is some code which could be scavenged from https://bottosson.github.io/posts/oklab/ as he displays similar plots
monad
2024-02-05 05:02:29
What is the robust way to aggregate timings of a single command over different images? sum(pixels)/sum(seconds) has a downside when applied to timings below the precision of measurement, but is median(pixels)/median(seconds) better?
2024-02-05 05:04:14
Or rather median([pixels/seconds for each image])
Traneptora
monad What is the robust way to aggregate timings of a single command over different images? sum(pixels)/sum(seconds) has a downside when applied to timings below the precision of measurement, but is median(pixels)/median(seconds) better?
2024-02-05 07:20:55
if you have the data for each one, why not just do pixels/second for each image, and then take the mean of those?
2024-02-05 07:22:50
total pixels / total seconds has the effect of making small images not count very much toward the average
2024-02-05 07:23:25
let's say I have a large image and a small image
2024-02-05 07:23:35
the large image encodes at 1 MP/s, and the small one is much faster, 10 MP/s
2024-02-05 07:23:52
if you total the pixels and divide by total time, that 10 MP/s will barely contribute to the average at all
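That weighting effect can be seen with two hypothetical timings (the numbers are invented to match the example above):

```python
# Hypothetical timings: (megapixels, seconds) per image.
runs = [(16.0, 16.0),   # large image at 1 MP/s
        (0.5, 0.05)]    # small image at 10 MP/s

# Pooled rate: total pixels over total time, dominated by the large image.
pooled = sum(mp for mp, _ in runs) / sum(s for _, s in runs)

# Mean of per-image rates: each image counts equally.
mean_of_rates = sum(mp / s for mp, s in runs) / len(runs)
```

Here `pooled` comes out just above 1 MP/s while `mean_of_rates` is 5.5 MP/s, so the choice of aggregate really does decide how much the small image counts.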
_wb_
2024-02-05 07:43:37
then again that's probably OK, since the measurement for the small image will likely be less accurate and include more stuff that isn't asymptotically important (depending on how you run the benchmark, you might be counting things like loading the binary and initializing some stuff as part of the enc/dec time)
monad
Traneptora if you have the data for each one, why not just do pixels/second for each image, and then take the mean of those?
2024-02-05 10:16:10
Okay, then I have to learn how to incorporate samples of zero seconds.
_wb_
2024-02-05 04:21:36
`--num_reps=100` 🙂
monad
2024-02-05 05:16:49
while elapsed_time < time_unit: _time command_
Traneptora
2024-02-05 05:19:49
num_reps makes more sense
2024-02-05 05:20:14
if your samples are too small then most of the overhead for a single execution is going to be unrelated to JXL
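A sketch of the overhead-amortizing measurement that repetition gives you, with a stand-in workload instead of a real encoder call (`throughput_mp_per_s` is a hypothetical helper, similar in spirit to benchmark_xl's `--num_reps`):

```python
import time

def throughput_mp_per_s(encode, megapixels, num_reps=100):
    # Time many repetitions inside one measurement so fixed per-run
    # overhead (startup, init) is amortized across the repetitions.
    t0 = time.perf_counter()
    for _ in range(num_reps):
        encode()
    elapsed = time.perf_counter() - t0
    return megapixels * num_reps / elapsed

# Stand-in workload in place of an actual encoder invocation:
rate = throughput_mp_per_s(lambda: sum(range(10_000)), megapixels=1.0, num_reps=50)
```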
monad
2024-02-05 06:57:43
But it should be fair across encoders, and anyway the overhead is part of the practical implications of invoking the command. num_reps should be more representative of using the library directly.
_wb_
2024-02-05 07:24:04
In most use cases, you would use the library directly, not call cjxl/djxl on intermediate files...
monad
2024-02-05 07:41:28
I'm looking at transcoding existing files. Maybe it's useless to most people, but that's in line with everything else I make.
Traneptora
2024-02-05 08:22:54
are you producing benchmarks to measure solely how you use it? because the library is how most users will use it
_wb_
2024-02-07 08:46:13
The WebP/AVIF team has added a new benchmark tool to their page: https://storage.googleapis.com/demos.webmproject.org/webp/cmp/index.html
2024-02-07 08:47:45
They test at all qualities from 0 to 99 so results are a bit messy, especially the results they suggest to look at
2024-02-07 08:48:51
But it's a nice visualization tool that can be used to make relevant plots too, e.g. something like https://storage.googleapis.com/demos.webmproject.org/webp/cmp/2024_01_25/index.html?load=all.json#WebP+method+4-quality=90..90&matcher_ssim=off
2024-02-07 08:51:18
If you just aggregate over the entire range from q0 to q99, you end up giving a _lot_ of weight to very low quality settings that no sane person would use.
2024-02-07 08:54:22
I made a little animation to illustrate what happens if you compare at something equivalent to WebP q30 vs WebP q95 (which is not super high quality, due to obligatory 4:2:0 and limited range YCbCr which makes it often worse than q90 JPEG).
jonnyawsom3
2024-02-07 09:05:34
So around q70 JXL seems to be at a better ratio than AVIF, with the gap significantly increasing after 80 and 90
_wb_
2024-02-07 09:14:42
Yes, well it also depends on what speed you look at, even at q30, jxl still gives better bang for buck if you want fast encoding.
2024-02-07 09:19:35
I'm also not sure if the set of images they used for testing is very representative, I should check the full corpus though (haven't found a convenient way to see all source images yet though). If you have few images with natural content like a human face, and many with hard straight diagonal lines (like modern, very 'clean' architecture), then AVIF will tend to look better since it's good at those kind of images.
HCrikki
2024-02-07 09:39:30
Would be helpful illustrating the kind of visual quality some of those numbers map to. I recall seeing in the past a low filesize output for avif that was of unacceptably low quality comparable to bit starved ancient jpeg
190n
_wb_ I made a little animation to illustrate what happens if you compare at something equivalent to WebP q30 vs WebP q95 (which is not super high quality, due to obligatory 4:2:0 and limited range YCbCr which makes it often worse than q90 JPEG).
2024-02-07 11:10:47
"good argument, unfortunately i have depicted AVIF as the soyjak and JXL as the chad"
Traneptora
190n "good argument, unfortunately i have depicted AVIF as the soyjak and JXL as the chad"
2024-02-08 01:44:37
spider-mario
2024-02-08 08:22:56
190n
2024-02-08 08:39:16
ofc once you get _really_ small, JXL starts winning again
yoochan
_wb_ I made a little animation to illustrate what happens if you compare at something equivalent to WebP q30 vs WebP q95 (which is not super high quality, due to obligatory 4:2:0 and limited range YCbCr which makes it often worse than q90 JPEG).
2024-02-08 08:52:36
how do you get the curve at webp q90 ? I struggle to select it. and in all cases avif seems to win... what a shitty presentation
veluca
2024-02-08 09:10:14
that's not the most usable UI I've seen in my life, no
_wb_
yoochan how do you get the curve at webp q90 ? I struggle to select it. and in all cases avif seems to win... what a shitty presentation
2024-02-08 09:24:08
Easiest way imo is changing the url. https://storage.googleapis.com/demos.webmproject.org/webp/cmp/2024_01_25/index.html?load=all.json#WebP+method+4-quality=90..90&matcher_ssim=off
yoochan
2024-02-08 09:24:47
thanks 😄
_wb_
2024-02-08 09:28:29
the "presets" the interface suggests are mostly based on comparisons at qualities like this: https://storage.googleapis.com/demos.webmproject.org/webp/cmp/2024_01_25/visualizer.html?bimg=..%2Fclic_validation_2021_2022_2024%2Fimages%2Fcd272a9d4ae2d9eabbe58474facc4da00f280be40137597bf1e497f459eda284.png&btxt=original&rimg=encoded%2Fcd272a9d4ae2d9eabbe58474facc4da00f280be40137597bf1e497f459eda284.e6q016.avif&rtxt=AVIF+speed+6&limg=encoded%2Fcd272a9d4ae2d9eabbe58474facc4da00f280be40137597bf1e497f459eda284.e7q003.jxl.png&ltxt=JPEG+XL+effort+7
yoochan
2024-02-08 09:30:05
(your last link gives me an error)
_wb_
2024-02-08 09:30:22
that's a 10kb jxl and only a 6kb avif yet they have similar ssim and ssimulacra2 scores, so percentage-wise that image is a big win for avif
yoochan (your last link gives me an error)
2024-02-08 09:30:57
strange, it works for me. what error?
yoochan
2024-02-08 09:31:37
'the image .... cannot be displayed because it contains errors' (under firefox)
2024-02-08 09:31:49
was it a jxl pic ?
2024-02-08 09:32:18
it might be an error of the jxl plugin
_wb_
2024-02-08 09:32:57
https://storage.googleapis.com/demos.webmproject.org/webp/cmp/2024_01_25/encoded/cd272a9d4ae2d9eabbe58474facc4da00f280be40137597bf1e497f459eda284.e6q016.avif
2024-02-08 09:33:05
no they show the jxl as a png
2024-02-08 09:33:16
only the avif is shown as an avif in that interface
2024-02-08 09:33:26
https://storage.googleapis.com/demos.webmproject.org/webp/cmp/clic_validation_2021_2022_2024/images/cd272a9d4ae2d9eabbe58474facc4da00f280be40137597bf1e497f459eda284.png is the original
yoochan
2024-02-08 09:34:38
the avif works 😄
2024-02-08 09:37:16
I'll do some benchmark_xl on this flower to understand better how a similar ssimulacra2 can be reached
_wb_
2024-02-08 09:37:53
anyway, if you compare mostly at such horrible qualities, avif is indeed good — being a video codec, it is designed to do something not too bad looking even at very low bitrates. This is not a quality anyone would want to use for a still image though.
yoochan
2024-02-08 09:40:04
(benchmark_xl with "jxl:d1.0:glacier" returns me a core dump with commit ae50ce4b)
_wb_ anyway, if you compare mostly at such horrible qualities, avif is indeed good โ€” being a video codec, it is designed to do something not too bad looking even at very low bitrates. This is not a quality anyone would want to use for a still image though.
2024-02-08 09:40:55
I agree, at this resolution the image is ugly
veluca
yoochan (benchmark_xl with "jxl:d1.0:glacier" returns me a core dump with commit ae50ce4b)
2024-02-08 09:44:21
file a bug?
yoochan
veluca file a bug?
2024-02-08 09:44:59
I'm checking that I didn't mess something up, but I will
veluca
2024-02-08 09:45:16
thanks 🙂
_wb_
2024-02-08 09:46:02
testing image codecs at such low qualities is like testing how well a car can drive underwater — sure, if you design for it, you can make a car drive underwater, and that's quite impressive and nice, but it still doesn't mean it's a very relevant thing to test for most people's needs
yoochan
2024-02-08 11:46:16
or off-road 😄 i'm looking at you, sellers of urban 4WDs with a snorkel
2024-02-08 11:46:35
```
Encoding            kPixels   Bytes        BPP  E MP/s  D MP/s    Max norm  SSIMULACRA2   PSNR       pnorm       BPP*pnorm  QABPP  Bugs
---------------------------------------------------------------------------------------------------------------------------------------
jxl:d1.0:lightning      393  131272  2.6707357   6.189  20.943  1.34717049  87.27117148  41.28  0.59798734  1.597066130660  3.598     0
```
2024-02-08 11:47:02
among the results returned by benchmark_xl, is one of them a butteraugli score ?
_wb_
2024-02-08 11:59:41
max norm is butteraugli, pnorm is the 3-norm butteraugli
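A sketch of that aggregation, assuming a simple mean-based p-norm over the per-pixel butteraugli distortion map (the exact normalization inside libjxl may differ):

```python
import numpy as np

def pnorm(distmap, p=3.0):
    # p-norm aggregate of a per-pixel distortion map; the "Max norm"
    # column is the limit p -> infinity, i.e. the map's maximum value.
    d = np.asarray(distmap, dtype=np.float64)
    return float(np.mean(np.abs(d) ** p) ** (1.0 / p))

dist = [0.5, 1.0, 2.0, 4.0]
p3 = pnorm(dist, p=3.0)
# p3 sits between the mean and the max: a few bad pixels pull it up,
# but a single outlier dominates less than in the max norm.
```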
yoochan
2024-02-08 12:00:09
thank you 🙂
2024-02-08 12:37:10
another small question in benchmark_xl, how to set the effort of avif ? `--codec=avif:q85:???`
veluca
2024-02-08 12:38:37
s0/.../s9
yoochan
2024-02-08 12:38:54
thx
damian101
_wb_ I made a little animation to illustrate what happens if you compare at something equivalent to WebP q30 vs WebP q95 (which is not super high quality, due to obligatory 4:2:0 and limited range YCbCr which makes it often worse than q90 JPEG).
2024-02-08 12:52:24
what metric?
_wb_
2024-02-08 12:53:24
here the comparison is at similar ssimulacra2 score
damian101
2024-02-08 12:53:30
👍
yoochan
2024-02-08 12:54:29
I'm trying a new way to plot scores which will enable real comparison at similar quality... Hope it will be more readable
2024-02-08 01:13:37
the q parameter seems to have no impact on the bpp with avif... shouldn't it represent quality in 0..100? in `--codec=avif:q85:s0`
veluca
2024-02-08 01:14:30
you need to have avif >= 1.0.3 for it to do anything
2024-02-08 01:14:35
and git libjxl
yoochan
2024-02-08 01:15:08
oki, I pulled libjxl with all the submodules, avif is not included ?
veluca
2024-02-08 01:15:16
(that's when they added a `--quality` flag to avifenc, AFAIU)
yoochan
2024-02-08 01:15:16
it is the one of my system ?
veluca
2024-02-08 01:15:24
yeah I am pretty sure it's the system one
yoochan
2024-02-08 01:15:41
oki, I'll try to derail it to a new version
2024-02-08 03:42:50
is there a way to tell the libjxl cmake to pick the libavif available in the LD_LIBRARY_PATH instead of the one found in /lib/ ... ?
2024-02-08 03:45:13
this trick worked for all projects I compiled up to now... that's strange
2024-02-08 03:46:50
perhaps setting the PKG_CONFIG_PATH will help 😄
veluca
yoochan perhaps setting the PKG_CONFIG_PATH will help 😄
2024-02-08 03:49:10
yeah something like that should work
2024-02-08 03:49:26
(you also need to point it to the correct libavif at *compile time*)
yoochan
2024-02-08 03:50:20
makes sense, but many projects link correctly with just the LD_LIBRARY_PATH, that's why I failed
2024-02-08 04:00:07
success !
_wb_
2024-02-10 03:22:31
The battle of the codecs goes on. I've got news from the Pareto front regarding lossless compression.
yoochan
2024-02-10 03:23:07
do you have a link for better quality ?
_wb_
2024-02-10 03:23:40
For smallish images (smaller than 4 Mpx) not too much changes between libjxl 0.9 and libjxl 0.10
yoochan
2024-02-10 03:25:00
0.10 is out !?
_wb_
yoochan do you have a link for better quality ?
2024-02-10 03:25:01
I'll eventually clean up stuff and share the google sheets link, but the screenshot should be OK (if you open it in browser, not the discord previews)
yoochan
2024-02-10 03:25:26
indeed, my bad 😄 thanks for sharing
_wb_
2024-02-10 03:25:30
0.10 is not yet out but one of the main changes — streaming encoding — has been implemented
2024-02-10 03:26:05
For large images the difference will be very noticeable:
yoochan
2024-02-10 03:26:38
nice
_wb_
2024-02-10 03:28:34
basically libjxl 0.10 beats libjxl 0.9 by a big margin here. On these images, the new e7 is faster than the old e4 and also 0.5 bpp smaller on average.
2024-02-10 03:29:47
For large non-photographic images (this is some set of manga), the same thing is true:
yoochan
2024-02-10 03:29:57
what are the cyan plots very close from the purple ones ?
_wb_
2024-02-10 03:31:29
(zooming in a bit on the left part of that previous plot, since avif is clearly very far from the Pareto front)
2024-02-10 03:33:09
the cyan is if you explicitly tell the encoder to use non-streaming mode — it then behaves more like the 0.9 encoder behaves (still slightly faster). But the default would be the turquoise points.
2024-02-10 03:34:46
so what you can see here for these non-photographic images is that libjxl 0.9 was beating webp but not by a big margin. Now, libjxl 0.10 is beating webp more substantially.
2024-02-10 03:35:30
(for photo, jxl was already beating webp substantially but now it beats it even harder)
yoochan
2024-02-10 03:35:58
I didn't follow the subject, is there a flag to explicitly require non-streaming mode?
w
2024-02-10 04:03:31
is encode speed real time or cpu time
2024-02-10 04:04:15
i always complain about this
spider-mario
2024-02-10 04:05:46
I assume it wouldn't mention "8 threads" on the y axis if it were cpu time
w
2024-02-10 04:06:56
well sometimes it scales differently
_wb_
2024-02-10 04:16:08
It's real time.
2024-02-10 04:18:40
Basically before, lossless encoding was mostly single-threaded and now it properly parallelizes. But it also became more efficient because it uses less memory and uses it more locally so the speedup is actually more than 8x
2024-02-10 04:18:57
(when running with 8 threads)
w
2024-02-10 04:19:23
is it possible to have a max memory option?
afed
_wb_ (zooming in a bit on the left part of that previous plot, since avif is clearly very far from the Pareto front)
2024-02-10 04:19:35
also what about single threaded mode for all codecs? i think that might also be a claim, because it's possible to encode multiple images in parallel
2024-02-10 04:21:39
and the opposite, on some even higher-core cpu, with more threads for encoders and just on very large images
lonjil
2024-02-10 04:25:30
Even though I don't think QOI and friends are particularly useful, it might be good to include them for when this is posted publicly.
spider-mario
w well sometimes it scales differently
2024-02-10 04:31:50
right, but it would be kind of strange to have just this one graph then
2024-02-10 04:32:04
my inference wasn't pure deduction
afed
lonjil Even though I don't think QOI and friends are particularly useful, it might be good to include them for when this is posted publicly.
2024-02-10 04:32:27
then, instead of QOI alone it would be better to include fpnge alongside QOI, just to show that it's not a better format even compared to the old PNG (if you use a different, faster encoder)
_wb_
2024-02-10 04:35:34
I used benchmark_xl, maybe we should add qoi and fpnge there...
lonjil
2024-02-10 04:36:46
It would be useful context, since a lot of people have heard about QOI and fpnge being very fast at their respective compression ratios. Also, did someone make an improved version of QOI at some point? I feel like I saw something new in that area, but I don't recall what it was called.
afed
2024-02-10 04:37:35
<https://github.com/nigeltao/qoir>
lonjil
2024-02-10 04:39:24
thanks
2024-02-10 04:40:43
While looking I came across this https://github.com/jido/seqoia
afed
2024-02-10 04:42:06
yeah, there are many forks with some changes
_wb_
2024-02-10 05:09:29
Maybe I will just manually measure some of these
2024-02-10 05:10:09
At the other end, you could also add various png optimizers (this is just default libpng)
afed
2024-02-10 05:18:15
but optimizers can usually only optimize pngs, so it's not a very fair comparison — or else it's basically the combined time spent on the first png encoding plus the time on optimization. For the qoi forks I don't think it's useful, those are very task-specific codecs
jonnyawsom3
_wb_ (zooming in a bit on the left part of that previous plot, since avif is clearly very far from the Pareto front)
2024-02-10 05:50:30
Was the new e1 hitting 130 MP/s? May be worth having a value on the axis where the highest result is
_wb_
2024-02-10 06:21:53
I got about 380 MPx/s for e1, didn't measure accurately though (just one encode per image)
2024-02-10 06:22:34
this is on my laptop, a macbook pro M3
jonnyawsom3
2024-02-10 06:25:38
Riight, I see now. Each 'block' on the chart is an order of magnitude increase with the intermediaries at 1/10ths
Traneptora
_wb_ Basically before, lossless encoding was mostly single-threaded and now it properly parallelizes. But it also became more efficient because it uses less memory and uses it more locally so the speedup is actually more than 8x
2024-02-10 06:54:51
isn't threadless encode speed more relevant? because batch converting can be done via processes etc.
lonjil
2024-02-10 07:02:04
Both are relevant. Especially the higher speeds are probably more relevant to "live" encoding than bulk encoding.
Traneptora
2024-02-10 07:08:17
Hm
_wb_
2024-02-10 07:12:27
Depends on the use case. For saving an image in an editor, or maybe in a camera, threaded makes sense. For batch encoding, single-threaded is indeed more relevant.
Riight, I see now. Each 'block' on the chart is an order of magnitude increase with the intermediaries at 1/10ths
2024-02-10 07:13:43
Yeah you need a log scale for speed since otherwise anything slower than e1 just is at the bottom of the plot ๐Ÿ™‚
afed
_wb_ For large images the difference will be very noticeable:
2024-02-10 07:15:22
for webp it would be better to use `-z <int>` (activates lossless preset with given level in [0:fast, ..., 9:slowest]), which enables lossless automatically and has a wider range; `-z 0` is also pretty balanced. But I don't know if benchmark_xl has this
_wb_
2024-02-10 07:40:21
It only has m1-6. I guess we have webp experts in the room here. Does it make a difference to configure it with -z instead of -m? Besides having more steps of effort (but if that's the only difference, I don't really care, the range between m1 and m6 is not that huge anyway, compared to the range of efforts you have in libjxl and libaom)
username
_wb_ It only has m1-6. I guess we have webp experts in the room here. Does it make a difference to configure it with -z instead of -m? Besides having more steps of effort (but if that's the only difference, I don't really care, the range between m1 and m6 is not that huge anyway, compared to the range of efforts you have in libjxl and libaom)
2024-02-10 08:01:20
I'm not exactly sure, though after looking around it seems like this is how the mapping between the two is done. Only problem is I don't really know how to read code that well, but hope this helps either way!
2024-02-10 08:03:20
oh, I was looking at it wrong and just realized the table/array (or whatever it's called) changes both `-m` and `-q` based on `-z` from a preset selection. I thought it was doing some kind of weird scaling of the values, because I didn't realize it had a value set for each of the 10 levels; my brain counted it wrong
2024-02-10 08:14:51
also cwebp is weird because `-q`/quality doesn't always mean visual quality since in the case of lossless it directly relates to effort. I know that `-m 6` with `-q 99` won't activate the brute force "lossless cruncher" but `-m 6` with `-q 100` will
2024-02-10 08:20:19
there's also the whole thing of `-mt` (multithreading) being off by default, which makes sense if you plan to mass encode WebPs with each core working on one image, but in almost all other cases it doesn't make sense. Also iirc cwebp can only use up to like 2 threads, and it only spins up the second thread for specific things, the most impactful being the lossless cruncher, which outputs the same result but in almost half the time with the second thread
Orum
username also cwebp is weird because `-q`/quality doesn't always mean visual quality since in the case of lossless it directly relates to effort. I know that `-m 6` with `-q 99` won't activate the brute force "lossless cruncher" but `-m 6` with `-q 100` will
2024-02-10 08:41:35
does `-z 9` set both `-m 6` and `-q 100` though?
username
Orum does `-z 9` set both `-m 6` and `-q 100` though?
2024-02-10 08:48:12
yes, and the image in my first message shows what each level of `-z` hooks up to
Orum
2024-02-10 08:49:08
ohh, I see
2024-02-10 08:51:21
also `-mt` is of limited use in `cwebp`
username
2024-02-10 08:52:26
it is, but it makes a big difference with `-z 9` since the lossless cruncher can be multithreaded
Orum
2024-02-10 08:52:39
yeah, but it *only* helps with `-z 9`
2024-02-10 08:54:32
...and even then, only 2 threads, no more
username
2024-02-10 09:18:39
some stages of "analysis" can get done on the second thread but uhhh I'm not sure exactly how much of a difference it makes in the end and I don't know enough about cwebp to know when it even happens during encoding
Orum
2024-02-10 09:49:59
honestly the most annoying part of cwebp is that you can't count on higher `-z` levels being the same size or smaller
2024-02-10 09:50:43
I have images where `-z 3` is smaller than all higher levels <:SadOrange:806131742636507177>
_wb_
2024-02-10 10:00:44
This can happen also in libjxl. It's quite hard to avoid — unless you make each higher effort also try all lower efforts.
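That "try all lower efforts" workaround can be sketched as a wrapper (`encode` here is a hypothetical encoder callback, not the libjxl API):

```python
def encode_monotone(img, max_effort, encode):
    """Encode at every effort level up to max_effort and keep the smallest
    output, so a higher effort setting can never produce a larger file
    (at the cost of encode time roughly summing over all efforts)."""
    best = None
    for e in range(1, max_effort + 1):
        data = encode(img, e)
        if best is None or len(data) < len(best):
            best = data
    return best

# Hypothetical encoder where effort 3 happens to be larger than effort 2:
sizes = {1: b"x" * 10, 2: b"x" * 5, 3: b"x" * 7}
best = encode_monotone(None, 3, lambda img, e: sizes[e])
```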
jonnyawsom3
username it is but It makes a big difference with `-z 9` since the lossless cruncher can be multithreaded
2024-02-10 10:03:16
Similar to e10 using 100% CPU on brute force
Orum
_wb_ This can happen also in libjxl. It's quite hard to avoid โ€” unless you make each higher effort also try all lower efforts.
2024-02-10 10:58:15
well I think it's more an issue of "how often and how bad" than anything
2024-02-10 10:59:14
if it only occurs rarely (like, 1 in 1000) and it's not bad when it does occur (< 1% savings) it's not so much of an issue
2024-02-10 11:00:05
once 0.10 comes out I'm going to be doing a lot of benchmarks though, as I've got a lot of images I'd like to move over to <:JXL:805850130203934781>
jonnyawsom3
2024-02-10 11:20:36
The best way is to find out why it does so on higher efforts, such as this case https://discord.com/channels/794206087879852103/804324493420920833/1196974617982685318
2024-02-10 11:21:15
Halved at g2 and then doubled at g3
_wb_
2024-02-11 10:46:42
2024-02-11 10:51:40
Pareto front plot for lossy, where I adjusted the quality knob for each encoder+effort until the corpus average ssimulacra2 score was as close as possible to 85
2024-02-11 10:55:39
So basically avif s1 has the same compression performance as jxl e4 but is 100 times slower, and avif s0 more or less matches jxl e6 in compression and is also 100 times slower
2024-02-11 10:58:36
At reasonable speeds (avif s6+), it's a clear win for jxl
2024-02-11 11:00:28
Note also that lossy webp is obsoleted by jpegli, which is faster than any webp effort and also better than any webp effort.
2024-02-11 11:03:42
(this is for a specific quality point though, for lower quality it may be different)
yoochan
2024-02-11 11:08:34
Thank you for the graphs! It's not easy to find convincing and unbiased representations, where the quality / speed / size tradeoff would be plotted in 3d 🙃
spider-mario
2024-02-11 11:34:52
is that xyb jpegli?
MSLP
2024-02-11 11:39:22
where on those charts would mozjpeg roughly be?
_wb_
spider-mario is that xyb jpegli?
2024-02-11 12:13:04
I dunno, what does benchmark_xl do by default when you select jpeg:enc_jpegli?
MSLP where on that charts mozjpeg would roughly be?
2024-02-11 12:14:30
I could test but for high quality it didn't improve much on libjpeg-turbo in previous experiments, just a lot slower...
Traneptora
_wb_
2024-02-11 01:28:16
what's the x axis? bpp?
_wb_
2024-02-11 01:28:31
oops I cut the label, yes, bpp
2024-02-11 01:29:21
also I mislabeled some of the avif points
2024-02-11 01:30:32
at s0, I could get away with q76 where at s9 it needed q82 to get the same target corpus-avg ssimulacra2 score of 85
2024-02-11 01:36:37
(jxl at e3/e4 seems to produce slightly higher ssimulacra2 scores than at higher effort settings when both are set to d1, but starting at e5 it's relatively consistent)
Traneptora
2024-02-11 01:37:18
less consistent quality, makes sense tbh
_wb_
2024-02-11 01:49:48
i didn't try to measure consistency of quality _across_ images here, this is just aligned on the corpus average ssimu2 score. That actually favors inconsistent codecs like webp and avif: while their average score is the same as jxl's (around 85 here), they reach it by getting a score of 90+ on the easy images and a much lower score on the hard images, so they save lots of bytes on the hard images (by not delivering the desired quality)
2024-02-11 01:57:22
oops again, those s0 points are not for the exact same corpus (accidentally included some other images)
2024-02-11 01:57:51
here's the correct plot
2024-02-11 02:00:12
same info in table form:
2024-02-11 02:03:50
table also shows decode speed and the actual metric scores — the q-settings were chosen to get close to ssimulacra2=85 but there's of course still some variation (in jxl in principle I could get rid of that by tweaking the distance; in most others the q-settings have integer steps so you can't get arbitrarily close)
2024-02-11 02:12:37
this is using 8 threads on a macbook M3, and speeds were not measured accurately (just a single iteration)
2024-02-11 02:13:32
(since I'm mostly interested in orders of magnitude, measuring very accurately is not really needed imo)
spider-mario
_wb_ i didn't try to measure consistency of quality _across_ images here, this is just aligned on the corpus average ssimu2 score, which actually favors inconsistent codecs like webp and avif since basically while their average score is the same as jxl's (around 85 here), they reach it by getting a score of 90+ on the easy images and a much lower score on the hard images, so they save lots of bytes on the hard images (by not delivering the desired quality)
2024-02-11 02:15:20
so, basically, we have:
- same quality setting for all images, picking the one that yields the desired ssimulacra2 score on average (favours inconsistent codecs)
but in principle, we could also do one of these:
- individual quality setting for each image, so that each image has the desired ssimulacra2 score (corrects for inconsistent codecs)
- same quality setting for all images, but pick such that the _worst_ ssimulacra2 is the desired one (_penalises_ inconsistent codecs)
_wb_
2024-02-11 02:15:41
I don't know how the chrome team was benchmarking that avif decode speed was so great compared to jxl decode speed, but the results I'm getting seem to show something different
spider-mario so, basically, we have: - same quality setting for all images, picking the one that yields the desired ssimulacra2 score on average (favours inconsistent codecs) but in principle, we could also do one of these: - individual quality setting for each image, so that each image has the desired ssimulacra2 score (corrects for inconsistent codecs) - same quality setting for all images, but pick such that the _worst_ ssimulacra2 is the desired one (_penalises_ inconsistent codecs)
2024-02-11 02:19:19
That's right. Setting it individually per image is probably the most fair thing to do, but it's not a very realistic/typical way in which people use codecs. Often a single setting is determined based on looking what it does to a few images, and then that setting is used for all images. So the third thing (aligning on worst case, or let's say on p10), which penalizes inconsistent codecs, is actually the most relevant thing imo, but here I'll do the first thing (aligning on average, which favors inconsistent codecs) just to show that even when trying to favor avif, it still doesn't look good 🙂
2024-02-11 02:22:27
I'll try to make such plots also for some "medium quality" point (averaging at ssimulacra2=70) and one for "camera quality" (something like d0.5).
yoochan
_wb_ same info in table form:
2024-02-12 08:35:39
can you select MT or not for avif from benchmark_xl ?
_wb_
2024-02-12 09:05:10
yes, add something like `:log2_cols=2:log2_rows=2` to make it MT
2024-02-12 09:06:15
avif is a bit funky: you have to manually specify how it does tiling, and you need to encode MT in order to be able to decode MT
2024-02-12 09:08:13
I'm assuming these two parameters set like that mean that it splits the image in tiles such that there are 2^log2_cols tiles horizontally and 2^log2_rows tiles vertically, so in 16 parts which should be enough when doing MT with 8 threads
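Under that reading (an assumption about what the libavif options mean, as stated above), the tile count works out trivially:

```python
def avif_tile_count(log2_cols, log2_rows):
    # 2**log2_cols tiles horizontally, 2**log2_rows tiles vertically
    return (2 ** log2_cols) * (2 ** log2_rows)

# :log2_cols=2:log2_rows=2 -> 4 x 4 = 16 tiles, enough to keep 8 threads busy
```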
veluca
2024-02-12 09:08:23
and it makes quality/density worse of course
yoochan
_wb_ yes, add something like `:log2_cols=2:log2_rows=2` to make it MT
2024-02-12 09:09:15
Thanks 😅 I couldn't have guessed
_wb_
2024-02-12 09:28:27
2024-02-12 09:31:06
I do kind of like the calibration of this new avif quality scale, avif q50 (at e0) is actually a quality that I would call "medium quality", and it corresponds to jxl d2.6, webp q76, libjpeg-turbo q70
spider-mario
2024-02-12 09:37:28
oh, so if one is fine with needing about a minute instead of 10 seconds for a Facebook-size medium-quality image, AVIF is actually kind of competitive
_wb_
2024-02-12 09:39:04
yes, at the lower qualities avif is competitive in compression, if you have the cpu time to throw at it
2024-02-12 09:40:25
I dunno about others but for Cloudinary anything slower than default effort avif (s6) is too slow - the jump from s6 to s5 is quite steep, too
yoochan
_wb_ yes, at the lower qualities avif is competitive in compression, if you have the cpu time to throw at it
2024-02-12 10:02:17
At lower qualities, avif outperforms jxl on ssimulacra2 scores. But does it in subjective tests too? For ssimulacra2 scores in the 60-70 range
_wb_
yoochan At lower qualities, avif outperform jxl for ssimulacra2 scores. But does it in subjective tests too? For ssimulacra2 scores in 60-70
2024-02-12 10:04:01
if you're prepared to go to the extremely slow speeds, yes (we don't have a ton of data about these very slow speed settings since they're unusable to us, but the limited data we have does show that they are also subjectively indeed better)
yoochan
2024-02-12 10:05:50
But why! 😭 avif also uses some kind of dct based compression
_wb_
2024-02-12 10:07:59
basically the situation is like this:
- if you are fine with using a LOT of encode cpu time, **and** you want relatively low quality, **and** you don't need progressive decode, then AVIF can save you a few percent compared to JPEG XL
- in all other cases, JPEG XL matches or beats AVIF (by a big margin if you want reasonable speed **or** better than mediocre quality)
damian101
2024-02-12 07:10:24
Here is how I encode AVIF for highest efficiency at very high quality (at the expense of threading due to row-mt 0): `avifenc -d 10 -a tune=ssim -a quant-b-adapt=1 -a enable-chroma-deltaq=1 -a deltaq-mode=2 -j 8 -a row-mt=0 -s 0 -a cq-level=14 --cicp 1/2/1`
yoochan
2024-02-12 07:17:01
What are the ssimulacra2 scores and the compression ratios for this?
damian101
2024-02-12 07:36:37
No idea... Deltaq-mode 2 is actually usually slightly disliked by ssimulacra2, btw. Overall it definitely increases consistency, almost as well as deltaq-mode 3, but without making things blurry (which deltaq-mode 3 definitely does together with tune ssim). Quality consistency is the big issue with AVIF, and the one thing where JXL is always straight up superior. When doing automatic conversion to AVIF, you ideally want to do target-quality encoding, performing multiple encodes and measuring the results. But when just comparing individual images, this holds up very well against JXL even at very high quality, though it depends a lot on the specific content of course.
_wb_
2024-02-12 07:43:28
How long does it take to encode a 12 Mpx image with that setting? 😛
damian101
2024-02-12 07:44:01
well, time to find out I guess
Traneptora
2024-02-12 07:46:59
Why tune ssim?
2024-02-12 07:47:22
Also why cicp 1/2/1?
2024-02-12 07:48:06
That's bt709 matrix, unspecified primaries, bt709 trc
damian101
Traneptora Why tune ssim?
2024-02-12 07:54:12
Always better.
Traneptora Also why cicp 1/2/1?
2024-02-12 07:55:08
By default, avifenc uses bt.601 color matrix, which is dumb for content in sRGB/BT.709 gamut, as the BT.709 will just perform better.
Traneptora That's bt709 matrix, unspecified primaries, bt709 trc
2024-02-12 07:55:24
no
2024-02-12 07:55:51
that's bt.709 primaries, unspecified transfer, bt.709 matrix
Quackdoc
Traneptora Also why cicp 1/2/1?
2024-02-12 08:04:18
while with ffmpeg cicp is MC/CP/TC, with libavif it's CP/TC/MC
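A tiny illustration of the ordering mix-up: the same `1/2/1` triple reads completely differently depending on which field order you assume. (The ffmpeg ordering here is as claimed in this discussion, not independently verified.)

```python
def parse_cicp(triple, order):
    """Interpret a 'A/B/C' CICP triple under a given field ordering."""
    return dict(zip(order, (int(v) for v in triple.split("/"))))

# libavif's --cicp is CP/TC/MC (colour primaries / transfer / matrix)
libavif_view = parse_cicp("1/2/1", ("CP", "TC", "MC"))
# CP=1: bt.709 primaries, TC=2: unspecified transfer, MC=1: bt.709 matrix
```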
2024-02-12 08:06:39
because for people who work on image and video stuff, agreeing with other projects would probably be fatal, better to just die I guess
damian101
2024-02-12 08:07:37
Usually the order that's actually used during encoding is chosen. Or sometimes decoding.
2024-02-12 08:08:09
And unless tonemapping is done, gamut reduction happens first, color matrix last during encoding.
_wb_
2024-02-12 08:08:45
Does avifenc convert arbitrary input to that space or is the only conversion it does rgb2yuv?
spider-mario
2024-02-12 08:08:53
I thought it was the latter
2024-02-12 08:09:03
i.e. no gamut reduction, just information about what the pixel data mean
_wb_
2024-02-12 08:09:39
So if you give it P3 input you are reinterpreting it as sRGB?
spider-mario
2024-02-12 08:09:46
that's my understanding of it
damian101
_wb_ Does avifenc convert arbitrary input to that space or is the only conversion it does rgb2yuv?
2024-02-12 08:09:48
only the specified matrix influences encoding, the rest is just specification of color metadata
_wb_
2024-02-12 08:10:14
Metadata that can be wrong, if the input is not using those primaries...
spider-mario
2024-02-12 08:10:18
although if the input has an ICC profile, it will _also_ include it unless you specify `--ignore-icc`
_wb_
2024-02-12 08:10:24
Oh
spider-mario
2024-02-12 08:10:52
I'm not sure they specify which one takes precedence if both are present
2024-02-12 08:11:02
but to be on the safe side, whenever I specify --cicp, I also pass --ignore-icc
damian101
_wb_ Metadata that can be wrong, if the input is not using those primaries...
2024-02-12 08:11:17
Yes, that's why it normally shouldn't be specified. Maybe I should make a feature request to ask if they can use an optimal color matrix by default for common color gamuts...
_wb_
2024-02-12 08:11:23
It's quite annoying how video folks basically just say "color is your problem, we just encode unknown sample data here"
spider-mario
2024-02-12 08:11:43
there's this joke that "NTSC" stands for "never the same color"
damian101
2024-02-12 08:12:08
well, avifenc does properly recognize and apply color metadata
2024-02-12 08:12:40
just always encoded in bt.601 color matrix by default
_wb_
2024-02-12 08:18:36
How do you indicate full range vs tv range?
afed
2024-02-12 08:19:24
`-r,--range RANGE : YUV range [limited or l, full or f]. (JPEG/PNG only, default: full; For y4m or stdin, range is retained)` <:Thonk:805904896879493180>
_wb_
2024-02-12 08:20:15
Ah ok it already defaults to full, that's good
damian101
2024-02-12 08:42:11
since when does cjxl not thread at effort 9 by default
2024-02-12 08:43:45
Encoding a 60MP image at effort 9 takes 200 seconds on a Ryzen 7800X3D...
190n
2024-02-12 08:45:36
how much ram
damian101
2024-02-12 08:45:44
effort 7 takes 4 seconds <:Thinkies:987903667388710962>
190n how much ram
2024-02-12 08:45:48
64GB
190n
2024-02-12 08:46:10
that's how much it uses or how much you have?
damian101
190n that's how much it uses or how much you have?
2024-02-12 08:46:23
how much I have
2024-02-12 08:46:28
it uses around 16GB or so
2024-02-12 08:46:46
effort 8 is single-threaded, too <:Thinkies:987903667388710962>
afed
2024-02-12 08:46:48
for lossy? there is no real reason to use e9 for lossy
damian101
afed for lossy? there is no real reason to use e9 for lossy
2024-02-12 08:47:16
let me have fun 😠
2024-02-12 08:47:48
effort 7: 4s
effort 8: 46s
effort 9: 200s
2024-02-12 08:48:12
and at least effort 8 and 9 don't thread on this 60MP PNG image!!
afed
2024-02-12 08:52:26
5-6 is also typically not worse than 7 for the same size, but may not hold a given butteraugli quality as accurately
damian101
2024-02-12 08:52:41
effort 7 is already absurdly fast
afed
2024-02-12 08:53:27
and e9 can be worse subjectively, though maybe that's already been fixed
damian101
2024-02-12 08:54:07
e8 and e9 target a significantly different quality, I guess those are more consistent in quality?
2024-02-12 08:54:27
anyway, what's going on with the threading, this looks like a serious bug to me
_wb_
2024-02-12 08:56:54
Should be better in current git version
2024-02-12 08:57:25
But I don't think e8/e9 are that useful for lossy
damian101
_wb_ Should be better in current git version
2024-02-12 08:57:56
well, it's not like cjxl threads badly, it doesn't thread at all, exactly one thread the whole time...
afed
2024-02-12 08:58:09
not a bug, just some greedy methods are difficult to parallelize
damian101
2024-02-12 08:58:18
but it says it will use 16 threads, and it's a 16 thread machine
afed not a bug, just some greedy methods are difficult to parallelize
2024-02-12 08:58:38
I am quite sure that effort 9 used to use more than 1 thread...
_wb_
2024-02-12 08:59:01
e8/e9 try too hard reaching a given butteraugli score imo. Optimizing too much for a metric is risky, no metric is perfect...
damian101
2024-02-12 08:59:43
Currently I'm easily beating cjxl effort 8 distance 1 with avifenc effort 3...
_wb_
2024-02-12 08:59:52
For me current git e9 does parallelize
damian101
2024-02-12 09:00:09
I'll send the source...
_wb_
2024-02-12 09:00:45
I got almost 1 Mpx/s at lossy e9 with 8 threads
2024-02-12 09:00:53
Aaah wait
2024-02-12 09:01:02
You may have an image with lots of patches
2024-02-12 09:01:31
Patches can mess up speed by a lot
damian101
2024-02-12 09:05:02
what are patches...
2024-02-12 09:06:29
_wb_
2024-02-12 09:13:18
If that sky is solid white in some regions, patch heuristics will be wasting nonparallelized time on it
damian101
2024-02-12 09:13:19
efforts 8/9 do thread a little, actually, but only a fraction of the time...
_wb_
2024-02-12 09:13:36
Try with --patches 0 to see if that helps
damian101
_wb_ If that sky is solid white in some regions, patch heuristics will be wasting nonparallelized time on it
2024-02-12 09:13:43
I don't think it is, there is some visible noise throughout
_wb_
2024-02-12 09:14:09
Can't see it on my phone on that discord preview ๐Ÿ™‚
damian101
2024-02-12 09:14:41
well, I need to zoom in on my 4K monitor to see it, haha
_wb_ Try with --patches 0 to see if that helps
2024-02-12 09:16:51
definitely not significantly
2024-02-12 09:17:24
btw, I just updated jxl, and now memory consumption is a lot lower <:Thinkies:987903667388710962>
2024-02-12 09:27:07
effort 7 threads decently well
2024-02-12 09:27:27
and effort 8 and 9 probably too, for the part that effort 7 does as well
afed
2024-02-12 09:50:05
if streaming mode for lossy in current git is still disabled for e7 and slower, then for e6 memory consumption will be even lower and multithreading most likely better
yoochan
2024-02-13 08:25:04
how would you get the version of avif used by benchmark_xl (the one it links to?), ldd gives me access to a path, but I don't know the best way to extract a version id from this
_wb_
2024-02-13 08:47:48
`avifenc --version`, assuming your avifenc is linking to the same libavif
yoochan
2024-02-13 08:48:51
good thinking
_wb_ https://storage.googleapis.com/demos.webmproject.org/webp/cmp/2024_01_25/encoded/cd272a9d4ae2d9eabbe58474facc4da00f280be40137597bf1e497f459eda284.e6q016.avif
2024-02-13 10:07:12
I got a feedback from the author : https://github.com/webmproject/codec-compare/issues/3
_wb_
2024-02-13 10:36:07
"Not super crisp, but can be displayed as a background for example." - if you really want a blurry mess to use as a background, just apply some gaussian blur (or some bilateral filter if you like to preserve edges) before encoding the image, and you'll see it encodes way better in any codec. Using image encoders to do the blurring for you is kind of silly.
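The pre-blurring point can be illustrated with a minimal 3x3 box blur in pure Python (a real pipeline would use a proper gaussian or bilateral filter; this is just to show that blurring flattens the high-frequency content that costs the codec bits):

```python
def box_blur(img):
    """3x3 box blur over a grayscale image stored as a list of rows."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc, n = 0, 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += img[yy][xx]
                        n += 1
            out[y][x] = acc // n  # average of the in-bounds neighbourhood
    return out
```

A checkerboard (the worst case for a DCT codec) collapses to a near-flat image after one pass, while a flat region passes through unchanged - that reduced variation is exactly what makes the pre-blurred image cheaper in any codec.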
yoochan
2024-02-13 10:38:14
agreed 😄 or resize the original picture if you don't plan to look at it
jonnyawsom3
2024-02-13 10:39:51
Or use JXL art ;P
_wb_
2024-02-13 10:40:53
Though maybe this is something we should just start doing at ridiculous distances (say, d > 5) - first apply some heavy EPF and then encode (as opposed to encoding the input as-is and only doing EPF decode-side). Should produce "better looking" images even if it's obviously a bad idea in terms of fidelity.
jonnyawsom3
2024-02-13 10:42:11
Reminded me of my thought regarding low quality jpeg inputs. If EPF could be retroactively applied, but seemed out of scope at the time
yoochan
2024-02-13 10:42:27
you want a "cheat mode" optimized only for benchmarks? like WV?
jonnyawsom3
2024-02-13 10:43:28
WV sounds like WB's evil twin, who sacrifices preservation to win at all costs
yoochan
2024-02-13 10:43:47
could be interesting though, activatable only with a flag, you get aggressive noise filtering (like bilateral) before encoding 😄
_wb_
2024-02-13 10:45:53
I consider d > 4 basically settings that are not relevant for real usage but only for benchmark/testing scenarios where people want to see what happens if you totally overcompress an image (and they somehow think this is useful information and says something about what happens at more reasonable qualities). So I don't really care if we do something funky at such distances.
jonnyawsom3
2024-02-13 10:49:31
If I recall the largest 256x256 VarDCT blocks aren't currently used yet either, although I don't know if they would have much of an effect
2024-02-13 10:53:17
(or rather 128x256 if I recall)
yoochan
_wb_ I consider d > 4 basically settings that are not relevant for real usage but only for benchmark/testing scenarios where people want to see what happens if you totally overcompress an image (and they somehow think this is useful information and says something about what happens at more reasonable qualities). So I don't really care if we do something funky at such distances.
2024-02-13 10:54:03
at which distance do jxl and avif ssimulacra2 scores cross?
username
If I recall the largest 256x256 VarDCT blocks aren't currently used yet either, although I don't know if they would have much of an effect
2024-02-13 10:55:54
seems like anything above 64x64 isn't used by libjxl's encoder currently https://docs.google.com/presentation/d/1LlmUR0Uoh4dgT3DjanLjhlXrk_5W2nJBDqDAMbhe8v8/edit#slide=id.gad3f818ca8_0_20
jonnyawsom3
2024-02-13 11:01:08
Even more room for improvement then
veluca
2024-02-13 11:31:39
> I noticed that your second plot only has 2521 comparisons, which may signal low confidence in the results due to the lack of data points.
that's... an interesting statement
2024-02-13 11:32:16
> I tried to minimize any bias in the default settings:
> - The full input quality setting ranges are used.
2024-02-13 11:32:20
so is that
2024-02-13 11:33:52
> > I guess < 0.1bpp can not seriously be used for display
>
> From the <0.1bpp plot:
> Top left point: bpp < 0.03
> Right-most point: bpp < 0.03
> Not super crisp, but can be displayed as a background for example.
veluca > > I guess < 0.1bpp can not seriously be used for display > > From the <0.1bpp plot: > Top left point: bpp < 0.03 > Right-most point: bpp < 0.03 > Not super crisp, but can be displayed as a background for example.
2024-02-13 11:34:02
...
jonnyawsom3
veluca > I tried to minimize any bias in the default settings: > - The full input quality setting ranges are used.
2024-02-13 12:28:15
So, we just cap the maximum distance in libjxl and problem solved, we win all benchmarks, hooray!
yoochan
2024-02-13 12:30:15
or, as wb suggested, we enable aggressive epf above d4.0 in order to pump up the SSIM score (at the expense of fidelity)
_wb_
2024-02-13 12:35:03
haha just capping distance at d4 would be fun - it would mean there are just no data points to show for the very low qualities 🙂
2024-02-13 12:37:13
btw avif and especially webp are benefitting from the other side of that phenomenon: for some images they just cannot reach a certain score so the comparison tool has no match to show for that image, and it doesn't count - even if I would consider it a quite big win for jxl when this happens 🙂
username
2024-02-13 12:38:23
if funky stuff is changed/done to libjxl at higher distances to improve SSIM then it would probably be a good idea to comment the new behavior/code with something like "// Improve SSIM" or something so it's easier to find in the future.
yoochan
2024-02-13 12:39:07
and put it behind a flag in the command line (or a flag at compile time too :D)
username
yoochan and put it behind a flag in the command line (or a flag at compile time too :D)
2024-02-13 12:40:18
as in enabled or disabled by default?
2024-02-13 12:40:26
that reminds me
yoochan
2024-02-13 12:41:07
like : `cjxl --ssim-cheat -d 5.0 tutu.png` and also `cmake -DENABLE_SSIM_CHEAT`
username
username that reminds me
2024-02-13 12:44:01
(took me a second to find this) reminds me of this toggle that exists in Nvidia's GPU video encoder https://docs.nvidia.com/video-technologies/video-codec-sdk/11.1/nvenc-video-encoder-api-prog-guide/index.html#spatial-aq
username (took me a second to find this) reminds me of this toggle that exists in Nvida's GPU video encoder https://docs.nvidia.com/video-technologies/video-codec-sdk/11.1/nvenc-video-encoder-api-prog-guide/index.html#spatial-aq
2024-02-13 12:44:57
> Although spatial AQ improves the perceptible visual quality of the encoded video, the required bit redistribution results in PSNR drop in most of the cases. Therefore, during PSNR-based evaluation, this feature should be turned off.
monad
2024-02-13 01:31:10
libjxl uses butteraugli for internal tuning, not ssimulacra 2.1
yoochan
2024-02-13 01:32:19
that's what I'm writing, I'll invite him here, easier to discuss
username
yoochan that's what I'm writing, I'll invite him here, easier to discuss
2024-02-13 02:11:34
maybe change it a bit so they aren't *required* to find the link on another website. A rewording like this might be better? "(link can be found here: [jpegxl.info](https://jpegxl.info/), or here's a direct link if you prefer: https://discord.gg/DqkQgDRTFu)"
2024-02-13 02:12:16
oh it seems like they have seen your comment now
yoochan
username oh it seems like they have seen your comment now
2024-02-13 02:15:25
that's what I did first but I didn't want the raw discord link to be published outside official pages... (this is a not very convincing argument, I'm not convinced myself :D)
username
2024-02-13 02:26:06
the link possibly getting picked up on github by bots or something I guess could be a worry. The thing is iirc the link has been posted into github comment sections before.
2024-02-13 02:27:55
github does let you edit comments but they have already seen it so who knows if they will look at the comment again and be convinced to join for discussion 🤷
_wb_
2024-02-13 04:43:11
Interesting. For this set of not-so-large images (1 Mpx each), avif beats heic:
2024-02-13 04:44:12
While for this set of larger images (11 Mpx on average), heic beats avif:
damian101
2024-02-13 04:49:22
how...
afed
2024-02-13 04:53:50
if it's x265 for heic it's not surprising, for higher qualities x265 is still better than any av1 encoder and images usually require much higher quality than video
_wb_
2024-02-13 04:59:29
this is libheif which indeed uses x265 for heic
2024-02-13 05:02:15
and yes this is at ssimulacra2 around 85 (on average) which is a rather high quality, around d1
damian101
2024-02-13 05:04:39
maybe the nature of the large and small images is just different?
_wb_
2024-02-13 05:52:57
Sure, the large ones have more camera noise and less entropy per pixel, the small ones are downscales from high res photos so they have less noise but more entropy per pixel.
2024-02-13 05:58:58
Also I guess that Daala set was heavily used while designing Daala and av1, so maybe avif is performing a bit better than expected on that corpus 🙂
yoochan
2024-02-13 06:00:12
The famous "i didn't test it on something else" bias
_wb_ Sure, the large ones have more camera noise and less entropy per pixel, the small ones are downscales from high res photos so they have less noise but more entropy per pixel.
2024-02-13 06:04:05
Interesting... What would give the best result in terms of size/quality ratio: reduce a photo by a factor of 2, encode it in high quality, and display it as-is? Or encode the original file with a quality on par with the first test when viewed resized by a factor of 2?
damian101
2024-02-13 06:09:16
it's complicated...
_wb_
2024-02-13 06:09:25
Agreed
damian101
yoochan Interesting... What would give the best result in term of size / quality ratio : reduce a photo by a factor of 2, encode it in high quality and display it as this. Or encode the original file, with a quality on par with the first test when viewed resized by a factor of 2?
2024-02-13 06:13:27
You usually do not want to encode at resolutions higher than target resolution, because detail will be preserved and never displayed.
2024-02-13 06:14:05
wasting a lot of bitrate
2024-02-13 06:14:10
or file size, here
spider-mario
yoochan Interesting... What would give the best result in term of size / quality ratio : reduce a photo by a factor of 2, encode it in high quality and display it as this. Or encode the original file, with a quality on par with the first test when viewed resized by a factor of 2?
2024-02-13 06:14:27
that tends to depend on the bitrate you target
damian101
2024-02-13 06:14:35
and the format
2024-02-13 06:14:54
and the downscaling method
_wb_
2024-02-13 06:15:50
Generally I wouldn't downscale as long as you want to remain in the "reasonable quality" range. For web delivery of course you should downscale to fit the layout of the page (taking into account that css pixels might be more than 1 pixel, though)
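One piece of arithmetic behind the downscale question is worth spelling out: halving the linear resolution quarters the pixel count, so the same byte budget buys 4x the bits per pixel. A trivial sketch (the 4000x3000 example is made up):

```python
def bpp(file_bytes, width, height):
    # bits per pixel of a compressed file at the given resolution
    return file_bytes * 8 / (width * height)

# same 100 kB budget: full-res 4000x3000 vs a 2x downscale
full = bpp(100_000, 4000, 3000)  # roughly 0.067 bpp
half = bpp(100_000, 2000, 1500)  # 4x the bpp for the same bytes
```

Whether that 4x bpp headroom beats keeping the extra pixels depends on the bitrate, the format, and the downscaling method, as noted above.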
yoochan
2024-02-15 08:54:35
๐Ÿ˜„ if distance is 0.01, 0.02 or 0.03 cjxl gives an image bigger than the original png. And a negative correlation between the effort and the compression
2024-02-15 08:54:46
is the distance rounded internally? to how many decimals?
_wb_
2024-02-15 09:04:17
Distances below 0.05 probably don't make sense on 8-bit input, and probably a lot of encoder heuristics are not well-tuned for distances below 0.3 or so
yoochan
2024-02-15 09:09:29
thanks, I'll take 0.3 as a lower limit
_wb_
2024-02-15 09:19:10
0.1 should still be ok, but I wouldn't go lower
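That advice could be captured in a simple guard; `sane_distance` is a hypothetical helper, with the floor values taken from the discussion above:

```python
def sane_distance(d, floor=0.1):
    # distances below ~0.05 make no sense on 8-bit input, and encoder
    # heuristics reportedly aren't well tuned below ~0.3; 0.1 is the
    # practical lower limit suggested here
    return max(d, floor)
```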
jonnyawsom3
2024-02-15 12:48:23
I only just thought, does streaming mode have any effect on e10? (e11 now) Had the old idea of running it on a small image, saving the parameters it finds and then applying to other/larger similar images. Although that would require PRs for the parameter saving
MSLP
2024-02-15 04:18:06
I too like the idea of for example "cjxl -v -e 10" printing the selected best parameters
_wb_
2024-02-15 04:30:44
That would be nice but I guess we'll need some API for verbosity of libjxl - currently libjxl doesn't print anything to stdout except when compile-time verbosity is set to something nonzero and then it just always prints stuff. Basically we don't want to ship (all) debug output strings in release builds, but we could still do something like having a runtime verbosity in addition to a compile-time verbosity, or something.
2024-02-15 04:32:52
The cleaner way would maybe be to not let libjxl print stuff itself to stdout, but let it return some string as part of the return status or something.
afed
2024-02-15 04:33:57
maybe at least in benchmark_xl
Cacodemon345
_wb_ The cleaner way would maybe to not let libjxl print stuff itself to stdout, but let it return some string as part of the return status or something.
2024-02-16 06:40:29
Setting a custom logger function would be better. Application remains in full control.
yoochan
_wb_ The cleaner way would maybe to not let libjxl print stuff itself to stdout, but let it return some string as part of the return status or something.
2024-02-16 07:33:47
Or a dedicated structure
monad
I only just thought, does streaming mode have any effect on e10? (e11 now) Had the old idea of running it on a small image, saving the parameters it finds and then applying to other/larger similar images. Although that would require PRs for the parameter saving
2024-02-16 04:41:06
Most combinations of settings are useless, meaning e11 does a lot of work for nothing. It can also find obscure configurations (particularly specific predictors) which work for a single image, but don't generalize. Larger images prefer g3, smaller images less so. Patches are generally bad.
fab
2024-02-19 02:51:36
this is not the type of look I prioritized on my phone
VcSaJen
2024-02-21 12:36:38
I wonder how things have evolved over the years 2021-2024. Is AVIF getting closer to JPEG XL? Or has JPEG XL further increased the distance?
jonnyawsom3
2024-02-21 09:41:37
Did a few quick tests on an 8K screenshot with/without streaming mode on a Ryzen 7 1700
Starting with e1, performance was the same between 0.9.1 and 0.10.0
```
JPEG XL encoder v0.10.0 d990a37 [AVX2,SSE2]
Encoding [Modular, lossless, effort: 1]
Compressed to 34592.2 kB (8.341 bpp).
7680 x 4320, 103.346 MP/s [103.35, 103.35], 1 reps, 16 threads.
PeakWorkingSetSize: 324.6 MiB
PeakPagefileUsage: 416.6 MiB
Wall time: 0 days, 00:00:00.423 (0.42 seconds)
User time: 0 days, 00:00:00.109 (0.11 seconds)
Kernel time: 0 days, 00:00:02.093 (2.09 seconds)
```
But Streaming input practically halved memory usage
```
JPEG XL encoder v0.10.0 d990a37 [AVX2,SSE2]
Encoding [Modular, lossless, effort: 1]
Compressed to 35417.2 kB (8.540 bpp).
7680 x 4320, 102.867 MP/s [102.87, 102.87], 1 reps, 16 threads.
PeakWorkingSetSize: 230.4 MiB
PeakPagefileUsage: 227.3 MiB
Wall time: 0 days, 00:00:00.349 (0.35 seconds)
User time: 0 days, 00:00:00.156 (0.16 seconds)
Kernel time: 0 days, 00:00:02.000 (2.00 seconds)
```
2024-02-21 09:44:24
Moving on to e3
```
JPEG XL encoder v0.9.1 b8ceae3 [AVX2,SSE4,SSSE3,SSE2]
Encoding [Modular, lossless, effort: 3]
Compressed to 26824.0 kB (6.468 bpp).
7680 x 4320, 15.367 MP/s [15.37, 15.37], 1 reps, 16 threads.
PeakWorkingSetSize: 2.173 GiB
PeakPagefileUsage: 2.529 GiB
Wall time: 0 days, 00:00:02.258 (2.26 seconds)
User time: 0 days, 00:00:03.234 (3.23 seconds)
Kernel time: 0 days, 00:00:12.031 (12.03 seconds)
```
Now 0.10.0 makes a difference without Streaming input
```
JPEG XL encoder v0.10.0 d990a37 [AVX2,SSE2]
Encoding [Modular, lossless, effort: 3]
Compressed to 26714.8 kB (6.442 bpp).
7680 x 4320, 20.714 MP/s [20.71, 20.71], 1 reps, 16 threads.
PeakWorkingSetSize: 469 MiB
PeakPagefileUsage: 525.6 MiB
Wall time: 0 days, 00:00:01.704 (1.70 seconds)
User time: 0 days, 00:00:04.906 (4.91 seconds)
Kernel time: 0 days, 00:00:13.250 (13.25 seconds)
```
Streaming input makes roughly the same dent in memory usage as on e1
```
JPEG XL encoder v0.10.0 d990a37 [AVX2,SSE2]
Encoding [Modular, lossless, effort: 3]
Compressed to 26714.8 kB (6.442 bpp).
7680 x 4320, 20.687 MP/s [20.69, 20.69], 1 reps, 16 threads.
PeakWorkingSetSize: 373.9 MiB
PeakPagefileUsage: 333.9 MiB
Wall time: 0 days, 00:00:01.626 (1.63 seconds)
User time: 0 days, 00:00:04.687 (4.69 seconds)
Kernel time: 0 days, 00:00:12.406 (12.41 seconds)
```
2024-02-21 09:47:27
And now for e7, AKA default lossless
```
JPEG XL encoder v0.9.1 b8ceae3 [AVX2,SSE4,SSSE3,SSE2]
Encoding [Modular, lossless, effort: 7]
Compressed to 22487.1 kB (5.422 bpp).
7680 x 4320, 0.277 MP/s [0.28, 0.28], 1 reps, 16 threads.
PeakWorkingSetSize: 2.126 GiB
PeakPagefileUsage: 2.475 GiB
Wall time: 0 days, 00:01:59.703 (119.70 seconds)
User time: 0 days, 00:00:40.312 (40.31 seconds)
Kernel time: 0 days, 00:02:00.109 (120.11 seconds)
```
Much larger impact
```
JPEG XL encoder v0.10.0 d990a37 [AVX2,SSE2]
Encoding [Modular, lossless, effort: 7]
Compressed to 22456.5 kB (5.415 bpp).
7680 x 4320, 4.288 MP/s [4.29, 4.29], 1 reps, 16 threads.
PeakWorkingSetSize: 470.8 MiB
PeakPagefileUsage: 519.6 MiB
Wall time: 0 days, 00:00:07.837 (7.84 seconds)
User time: 0 days, 00:00:04.375 (4.38 seconds)
Kernel time: 0 days, 00:01:34.843 (94.84 seconds)
```
Once again, Streamed input is a fixed reduction in memory
```
JPEG XL encoder v0.10.0 d990a37 [AVX2,SSE2]
Encoding [Modular, lossless, effort: 7]
Compressed to 22456.5 kB (5.415 bpp).
7680 x 4320, 4.217 MP/s [4.22, 4.22], 1 reps, 16 threads.
PeakWorkingSetSize: 377.7 MiB
PeakPagefileUsage: 337.5 MiB
Wall time: 0 days, 00:00:07.890 (7.89 seconds)
User time: 0 days, 00:00:04.843 (4.84 seconds)
Kernel time: 0 days, 00:01:34.171 (94.17 seconds)
```
veluca
2024-02-21 09:59:44
I'm a bit surprised that bitrate went down... not complaining ofc
2024-02-21 10:00:04
also surprised that you have a 10x reduction in *user* time
jonnyawsom3
2024-02-21 10:10:49
Running e9 at the moment, naturally taking half my lifetime on 0.9.1
2024-02-21 10:49:58
40 minutes on 0.9.1 for e9 (almost 2500 seconds exactly)
```
JPEG XL encoder v0.9.1 b8ceae3 [AVX2,SSE4,SSSE3,SSE2]
Encoding [Modular, lossless, effort: 9]
Compressed to 21333.5 kB (5.144 bpp).
7680 x 4320, 0.013 MP/s [0.01, 0.01], 1 reps, 16 threads.
PeakWorkingSetSize: 4.069 GiB
PeakPagefileUsage: 4.114 GiB
Wall time: 0 days, 00:41:39.986 (2499.99 seconds)
User time: 0 days, 00:07:54.734 (474.73 seconds)
Kernel time: 0 days, 00:35:54.125 (2154.12 seconds)
```
Naturally a massive improvement for 0.10.0
```
JPEG XL encoder v0.10.0 d990a37 [AVX2,SSE2]
Encoding [Modular, lossless, effort: 9]
Compressed to 21367.2 kB (5.152 bpp).
7680 x 4320, 0.209 MP/s [0.21, 0.21], 1 reps, 16 threads.
PeakWorkingSetSize: 551 MiB
PeakPagefileUsage: 602.7 MiB
Wall time: 0 days, 00:02:39.119 (159.12 seconds)
User time: 0 days, 00:00:08.218 (8.22 seconds)
Kernel time: 0 days, 00:33:38.203 (2018.20 seconds)
```
And Streaming input keeping it under half a GB of memory
```
JPEG XL encoder v0.10.0 d990a37 [AVX2,SSE2]
Encoding [Modular, lossless, effort: 9]
Compressed to 21367.2 kB (5.152 bpp).
7680 x 4320, 0.206 MP/s [0.21, 0.21], 1 reps, 16 threads.
PeakWorkingSetSize: 461.5 MiB
PeakPagefileUsage: 411.7 MiB
Wall time: 0 days, 00:02:40.763 (160.76 seconds)
User time: 0 days, 00:00:07.656 (7.66 seconds)
Kernel time: 0 days, 00:33:43.906 (2023.91 seconds)
```
veluca I'm a bit surprised that bitrate went down... not complaining ofc
2024-02-21 10:50:41
Interestingly bitrate went back up on e9
veluca
2024-02-21 10:51:43
there's something very broken in the threading implementation on Windows, btw - those Wall/User times don't make any sense
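(Editor's note: the oddity is visible in the numbers themselves. On a 16-thread encode, user + kernel CPU time is bounded by 16x wall time, and user time should normally dominate; here kernel time dwarfs it. A quick check on the 0.10.0 e7 figures above:)

```python
# How busy the 16 threads were, and how much of that was kernel time,
# for the 0.10.0 e7 run above.
wall, user, kernel = 7.84, 4.38, 94.84
busy_cores = (user + kernel) / wall
kernel_share = kernel / (user + kernel)
print(f"{busy_cores:.1f} cores busy, {kernel_share:.0%} of it in the kernel")
```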
Interestingly bitrate went back up on e9
2024-02-21 10:52:05
that I am less surprised about
jonnyawsom3
veluca there's something very broken in the threading implementation on Windows, btw - those Wall/User times don't make any sense
2024-02-21 10:59:36
I got those times using this, so I wouldn't be surprised if they're inaccurate compared to native Linux https://github.com/cbielow/wintime
2024-02-21 11:07:41
Here's the start and end times of 0.9 vs 0.10, removed from those results due to Discord's character limit e1 ``` 0.9 Creation time 2024/02/21 09:12:46.207 Exit time 2024/02/21 09:12:46.831 0.10 Creation time 2024/02/21 09:29:14.263 Exit time 2024/02/21 09:29:14.687``` e3 ``` 0.9 Creation time 2024/02/21 09:14:50.419 Exit time 2024/02/21 09:14:52.677 0.10 Creation time 2024/02/21 09:15:02.020 Exit time 2024/02/21 09:15:03.724 ``` e7 ``` 0.9 Creation time 2024/02/21 09:15:56.284 Exit time 2024/02/21 09:17:55.987 0.10 Creation time 2024/02/21 09:18:16.169 Exit time 2024/02/21 09:18:24.007 ``` e9 ``` 0.9 Creation time 2024/02/21 09:48:07.953 Exit time 2024/02/21 10:29:47.940 0.10 Creation time 2024/02/21 10:35:39.433 Exit time 2024/02/21 10:38:18.553 ```
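(Editor's note: the Creation/Exit stamps can be turned back into elapsed times, which agree with the Wall times in the earlier logs, e.g. for the 0.9.1 e9 run:)

```python
# Recover elapsed time from wintime's Creation/Exit timestamps.
from datetime import datetime

fmt = "%Y/%m/%d %H:%M:%S.%f"
start = datetime.strptime("2024/02/21 09:48:07.953", fmt)
end = datetime.strptime("2024/02/21 10:29:47.940", fmt)
elapsed = (end - start).total_seconds()
print(elapsed)  # 2499.987, matching the reported 2499.99 s wall time
```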
2024-02-21 11:48:25
And a quick VarDCT test for good measure ``` JPEG XL encoder v0.9.1 b8ceae3 [AVX2,SSE4,SSSE3,SSE2] Encoding [VarDCT, d1.000, effort: 7] Compressed to 4613.3 kB (1.112 bpp). 7680 x 4320, 1.325 MP/s [1.33, 1.33], 1 reps, 16 threads. PageFaultCount: 1153320 PeakWorkingSetSize: 1.631 GiB QuotaPeakPagedPoolUsage: 35.53 KiB QuotaPeakNonPagedPoolUsage: 80.48 KiB PeakPagefileUsage: 2.319 GiB Creation time 2024/02/21 11:45:15.585 Exit time 2024/02/21 11:45:40.726 Wall time: 0 days, 00:00:25.141 (25.14 seconds) User time: 0 days, 00:01:05.703 (65.70 seconds) Kernel time: 0 days, 00:01:43.859 (103.86 seconds) ``` Slightly slower yet more efficient ``` JPEG XL encoder v0.10.0 d990a37 [AVX2,SSE2] Encoding [VarDCT, d1.000, effort: 7] Compressed to 4273.9 kB (1.031 bpp). 7680 x 4320, 1.193 MP/s [1.19, 1.19], 1 reps, 16 threads. PageFaultCount: 949312 PeakWorkingSetSize: 463.8 MiB QuotaPeakPagedPoolUsage: 35.23 KiB QuotaPeakNonPagedPoolUsage: 20.45 KiB PeakPagefileUsage: 569.3 MiB Creation time 2024/02/21 11:45:56.165 Exit time 2024/02/21 11:46:24.065 Wall time: 0 days, 00:00:27.900 (27.90 seconds) User time: 0 days, 00:01:19.593 (79.59 seconds) Kernel time: 0 days, 00:01:53.531 (113.53 seconds) ``` And Streaming input once again keeping it under half a GB ``` JPEG XL encoder v0.10.0 d990a37 [AVX2,SSE2] Encoding [VarDCT, d1.000, effort: 7] Compressed to 4273.9 kB (1.031 bpp). 7680 x 4320, 1.160 MP/s [1.16, 1.16], 1 reps, 16 threads. PageFaultCount: 1054309 PeakWorkingSetSize: 368.1 MiB QuotaPeakPagedPoolUsage: 225.1 KiB QuotaPeakNonPagedPoolUsage: 20.31 KiB PeakPagefileUsage: 378 MiB Creation time 2024/02/21 11:51:11.688 Exit time 2024/02/21 11:51:40.306 Wall time: 0 days, 00:00:28.618 (28.62 seconds) User time: 0 days, 00:01:16.687 (76.69 seconds) Kernel time: 0 days, 00:01:53.703 (113.70 seconds) ```
veluca
2024-02-21 12:01:25
I had already observed pretty bad thread scaling from Windows builds yesterday
2024-02-21 12:01:47
so this is not surprising
jonnyawsom3
2024-02-21 12:01:49
e9 lossy hit 8GB of RAM for around 20 seconds before I killed the process, streaming input resulted in this ``` JPEG XL encoder v0.10.0 d990a37 [AVX2,SSE2] Encoding [VarDCT, d1.000, effort: 9] Compressed to 4272.0 kB (1.030 bpp). 7680 x 4320, 1.125 MP/s [1.13, 1.13], 1 reps, 16 threads. PageFaultCount: 996673 PeakWorkingSetSize: 432.4 MiB QuotaPeakPagedPoolUsage: 225.1 KiB QuotaPeakNonPagedPoolUsage: 20.71 KiB PeakPagefileUsage: 436.3 MiB Creation time 2024/02/21 11:59:36.841 Exit time 2024/02/21 12:00:06.357 Wall time: 0 days, 00:00:29.515 (29.52 seconds) User time: 0 days, 00:01:19.984 (79.98 seconds) Kernel time: 0 days, 00:01:57.781 (117.78 seconds) ```
veluca I had already observed pretty bad thread scaling from Windows builds yesterday
2024-02-21 12:02:31
In all lossless cases it actually scaled up to 100% CPU usage, although I assume you mean the speed/thread count increase
veluca
e9 lossy hit 8GB of RAM for around 20 seconds before I killed the process, streaming input resulted in this ``` JPEG XL encoder v0.10.0 d990a37 [AVX2,SSE2] Encoding [VarDCT, d1.000, effort: 9] Compressed to 4272.0 kB (1.030 bpp). 7680 x 4320, 1.125 MP/s [1.13, 1.13], 1 reps, 16 threads. PageFaultCount: 996673 PeakWorkingSetSize: 432.4 MiB QuotaPeakPagedPoolUsage: 225.1 KiB QuotaPeakNonPagedPoolUsage: 20.71 KiB PeakPagefileUsage: 436.3 MiB Creation time 2024/02/21 11:59:36.841 Exit time 2024/02/21 12:00:06.357 Wall time: 0 days, 00:00:29.515 (29.52 seconds) User time: 0 days, 00:01:19.984 (79.98 seconds) Kernel time: 0 days, 00:01:57.781 (117.78 seconds) ```
2024-02-21 12:02:43
... I am not sure why it worked at all
In all lossless cases it actually scaled up to 100% CPU usage, although I assume you mean the speed/thread count increase
2024-02-21 12:03:09
yup
2024-02-21 12:03:19
could just be that the speed measurement is broken, of course
jonnyawsom3
veluca ... I am not sure why it worked at all
2024-02-21 12:11:53
Well whatever happened, it seemed to work pretty well haha
2024-02-21 12:12:01
Ohh, wait...
2024-02-21 12:13:58
No, nevermind... I was thinking maybe patches and streaming disabling that, but streaming input is separate so they're disabled anyway.... I think....
2024-02-21 01:16:56
I'm mostly looking forward to 0.10.0 getting into Squoosh, then I can test on my phone too without immediately hitting an out of memory error
Kremzli
2024-02-21 01:20:03
Do they even maintain that? It's got so many PR's
jonnyawsom3
2024-02-21 01:23:58
<@228116142185512960> might be able to shed some light on that
sklwmp
2024-02-22 04:45:06
from 2:30 to 30 seconds with cjxl v0.10.0, pretty significant improvements <:PepeOK:805388754545934396>
eddie.zato
2024-02-23 03:48:42
Ok. That's really impressive for the modular. <:Hypers:808826266060193874> ``` PS > [PSCustomObject]@{v092_s = (Measure-Command { v092/cjxl 0.jpg 092j.jxl }).TotalSeconds; v0100_s = (Measure-Command { v0100/cjxl 0.jpg 0100j.jxl }).TotalSeconds } | ft JPEG XL encoder v0.9.2 41b8cda [AVX2] Encoding [JPEG, lossless transcode, effort: 7] Compressed to 13369.8 kB including container JPEG XL encoder v0.10.0 19bcd82 [AVX2] Encoding [JPEG, lossless transcode, effort: 7] Compressed to 13369.8 kB including container v092_s v0100_s ------ ------- 0,96 1,07 PS > [PSCustomObject]@{v092_s = (Measure-Command { v092/cjxl 0.png 092j.jxl }).TotalSeconds; v0100_s = (Measure-Command { v0100/cjxl 0.png 0100j.jxl }).TotalSeconds } | ft JPEG XL encoder v0.9.2 41b8cda [AVX2] Encoding [VarDCT, d1.000, effort: 7] Compressed to 7017.0 kB (1.128 bpp). 5760 x 8640, 2.985 MP/s [2.99, 2.99], 1 reps, 16 threads. JPEG XL encoder v0.10.0 19bcd82 [AVX2] Encoding [VarDCT, d1.000, effort: 7] Compressed to 6963.2 kB (1.119 bpp). 5760 x 8640, 3.349 MP/s [3.35, 3.35], 1 reps, 16 threads. v092_s v0100_s ------ ------- 17,29 15,51 PS > [PSCustomObject]@{v092_s = (Measure-Command { v092/cjxl 0.png -m 1 -d 0 092j.jxl }).TotalSeconds; v0100_s = (Measure-Command { v0100/cjxl 0.png -m 1 -d 0 0100j.jxl }).TotalSeconds } | ft JPEG XL encoder v0.9.2 41b8cda [AVX2] Encoding [Modular, lossless, effort: 7] Compressed to 36444.0 kB (5.858 bpp). 5760 x 8640, 0.423 MP/s [0.42, 0.42], 1 reps, 16 threads. JPEG XL encoder v0.10.0 19bcd82 [AVX2] Encoding [Modular, lossless, effort: 7] Compressed to 36064.1 kB (5.797 bpp). 5760 x 8640, 15.967 MP/s [15.97, 15.97], 1 reps, 16 threads. v092_s v0100_s ------ ------- 118,39 3,92 ```
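(Editor's note: the Measure-Command totals above work out to roughly a 30x wall-clock improvement for modular lossless:)

```python
# Speedup implied by the PowerShell timings above:
# modular lossless, 118.39 s on v0.9.2 vs 3.92 s on v0.10.0.
speedup = 118.39 / 3.92
print(f"{speedup:.1f}x")  # ≈ 30.2x
```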
Nyao-chan
Moving on to e3 ``` JPEG XL encoder v0.9.1 b8ceae3 [AVX2,SSE4,SSSE3,SSE2] Encoding [Modular, lossless, effort: 3] Compressed to 26824.0 kB (6.468 bpp). 7680 x 4320, 15.367 MP/s [15.37, 15.37], 1 reps, 16 threads. PeakWorkingSetSize: 2.173 GiB PeakPagefileUsage: 2.529 GiB Wall time: 0 days, 00:00:02.258 (2.26 seconds) User time: 0 days, 00:00:03.234 (3.23 seconds) Kernel time: 0 days, 00:00:12.031 (12.03 seconds) ``` Now 0.10.0 makes a difference without Streaming input ``` JPEG XL encoder v0.10.0 d990a37 [AVX2,SSE2] Encoding [Modular, lossless, effort: 3] Compressed to 26714.8 kB (6.442 bpp). 7680 x 4320, 20.714 MP/s [20.71, 20.71], 1 reps, 16 threads. PeakWorkingSetSize: 469 MiB PeakPagefileUsage: 525.6 MiB Wall time: 0 days, 00:00:01.704 (1.70 seconds) User time: 0 days, 00:00:04.906 (4.91 seconds) Kernel time: 0 days, 00:00:13.250 (13.25 seconds) ``` Streaming input makes roughly the same dent in memory usage as on e1 ``` JPEG XL encoder v0.10.0 d990a37 [AVX2,SSE2] Encoding [Modular, lossless, effort: 3] Compressed to 26714.8 kB (6.442 bpp). 7680 x 4320, 20.687 MP/s [20.69, 20.69], 1 reps, 16 threads. PeakWorkingSetSize: 373.9 MiB PeakPagefileUsage: 333.9 MiB Wall time: 0 days, 00:00:01.626 (1.63 seconds) User time: 0 days, 00:00:04.687 (4.69 seconds) Kernel time: 0 days, 00:00:12.406 (12.41 seconds) ```
2024-02-23 04:07:26
how did you toggle streaming?
jonnyawsom3
Nyao-chan how did you toggle streaming?
2024-02-23 04:08:08
Streaming input is separate from streaming encoding
2024-02-23 04:08:40
Although I mostly used the previous release (0.9) to measure with/without streaming encoding
Nyao-chan
2024-02-23 04:10:11
But you have 2 runs for 0.10.0, with and without streaming. I thought there are no flags to toggle it
2024-02-23 04:10:57
or is streaming input turned off by default and the flag enables it?
_wb_
2024-02-23 04:24:52
you can do `cjxl --streaming_input` but it requires ppm input atm — that will mostly impact memory though, not so much speed
jonnyawsom3
Nyao-chan But you have 2 runs for 0.10.0, with and without streaming. I thought there are no flags to toggle it
2024-02-23 04:59:27
> Streaming input is separate from streaming encoding
2024-02-23 05:00:24
All it does is feed the input image in a few pixels at a time; the only difference between those runs was around 200 MB of RAM. The ***actual*** streaming comparison was 0.9 versus either of the 0.10 runs
Orum
2024-02-27 09:23:07
so, it looks like `e 7` is the fastest cjxl speed that (on average) beats cwebp's `z 9`: ``` eal rbp rrt lvl cpu mem 1 cjxl v0.10.0 1 1.1997896 0.0371686 1 Inf 30657.19 2 cjxl v0.10.0 2 0.9755219 0.3651210 2 17.3375226 244023.45 3 cjxl v0.10.0 3 1.0254203 0.5425397 3 19.2356366 243855.98 4 cjxl v0.10.0 4 0.8880205 1.1001987 4 21.2395977 274834.79 5 cjxl v0.10.0 5 0.8284132 1.1617858 5 20.9716307 267952.03 6 cjxl v0.10.0 6 0.8102338 1.4223768 6 21.1074809 274365.39 7 cjxl v0.10.0 7 0.7574005 1.8606330 7 21.2205659 283921.00 8 cjxl v0.10.0 8 0.7391769 5.4232358 8 22.1395608 536870.36 9 cjxl v0.10.0 9 0.7178132 24.2512006 9 22.4154769 633623.88 10 cjxl v0.10.0 10 0.7209015 26.4382343 10 22.3001730 634870.07 11 cwebp 1.3.2 0 1.0000000 1.0000000 0 0.9691077 151040.92 12 cwebp 1.3.2 1 0.8719663 7.1135803 1 0.9942246 208628.52 13 cwebp 1.3.2 2 0.8479628 8.1897254 2 0.9948469 221764.80 14 cwebp 1.3.2 3 0.8219334 10.8388722 3 0.9956001 224679.29 15 cwebp 1.3.2 4 0.8211225 11.3688304 4 0.9960652 223148.79 16 cwebp 1.3.2 5 0.8190413 10.9181859 5 0.9961888 221839.46 17 cwebp 1.3.2 6 0.8163690 14.8374114 6 0.9964598 219571.72 18 cwebp 1.3.2 7 0.8149141 17.9221753 7 0.9971695 218806.12 19 cwebp 1.3.2 8 0.8082165 23.0884559 8 0.9975708 262045.31 20 cwebp 1.3.2 9 0.7912807 74.0762189 9 1.8664396 519069.50```
2024-02-27 09:24:19
which is good, because after that memory use goes up massively in cjxl, but `e 7` is still comparable to `z 8` (in mem use)
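(Editor's note: reading the table with its column names taken at face value, rbp and rrt presumably being relative bytes and relative runtime versus cwebp `z 0`, the e7-beats-z9 claim checks out with a large speed margin:)

```python
# cjxl e7 vs cwebp z9, ratios copied from the table above.
e7_rbp, e7_rrt = 0.7574005, 1.8606330
z9_rbp, z9_rrt = 0.7912807, 74.0762189
print(e7_rbp < z9_rbp)            # True: e7 compresses better on average
print(f"{z9_rrt / e7_rrt:.0f}x")  # ...while being ~40x faster
```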
veluca
2024-02-27 09:24:43
what images (and OS/CPU) are you using?
2024-02-27 09:25:02
in particular that "cpu" column looks a bit weird
2024-02-27 09:25:36
ah, it's a "number of CPUs used"
Orum
2024-02-27 09:25:38
these are lossless screenshots I've taken from various games, running linux on a 7950X3D
veluca
2024-02-27 09:25:38
ok, I see
Orum
2024-02-27 09:26:43
it's a bit weird that the rbp is higher for effort 3 than 2... <:WhatThe:806133036059197491>
veluca
2024-02-27 09:27:29
that's nonphoto for you
2024-02-27 09:27:45
-e3 is tuned for photographic content
Orum
2024-02-27 09:27:53
okay, fair enough
username
2024-02-27 09:28:22
did you run cwebp with or without `-mt`? because `-z 9` benefits speed wise with it and produces the same final output
Orum
2024-02-27 09:29:02
all of those are with `-mt` (you wouldn't see 1.8 CPU use with `-z 9` without it)
2024-02-27 09:31:29
it's kind of crazy how much faster `e 10` is compared to `z 9` though, but it shows the power of good MT <:YEP:808828808127971399>
2024-02-27 09:32:31
anyway, I'll have some graphs in a moment, and some other stats that I need to investigate manually
2024-02-27 09:35:18
yeah, `e 1` is so fast that `time` doesn't really provide enough resolution of real time used to get a useful measurement, but everything else looks usable:
2024-02-27 09:37:36
being too fast is a good problem to have though <:Hypers:808826266060193874>
2024-02-27 09:41:05
peak memory use looks fairly comparable at most levels too, though cjxl's data is much more tightly grouped at some levels
veluca
2024-02-27 09:46:14
you can use `--num_reps`
Orum
2024-02-27 09:48:53
that's a lot more effort to script (if I just use it for `e 1`) or hours of additional testing (if I don't); but I think we can all agree that `e 1` lossless is "pretty damn fast" and certainly blows the pants off of cwebp `z 0`:
2024-02-27 09:49:25
of course, still take the e 1 numbers there with a boulder of salt as several of the images had a real time of '0'
2024-02-27 09:51:47
really interesting how `e 9` on average is smaller than `e 10`, but I assume that's again because the presets are tuned to photo?
veluca
2024-02-27 09:52:21
I'm not sure actually...
2024-02-27 09:52:41
how similar are different regions of your images?
Orum
2024-02-27 09:53:34
uhhh, that's extremely tough to say...
2024-02-27 09:54:02
it's a rather eclectic collection of screenshots, from > 100 different games
Nyao-chan
Orum so, it looks like `e 7` is the fastest cjxl speed that (on average) beats cwebp's `z 9`: ``` eal rbp rrt lvl cpu mem 1 cjxl v0.10.0 1 1.1997896 0.0371686 1 Inf 30657.19 2 cjxl v0.10.0 2 0.9755219 0.3651210 2 17.3375226 244023.45 3 cjxl v0.10.0 3 1.0254203 0.5425397 3 19.2356366 243855.98 4 cjxl v0.10.0 4 0.8880205 1.1001987 4 21.2395977 274834.79 5 cjxl v0.10.0 5 0.8284132 1.1617858 5 20.9716307 267952.03 6 cjxl v0.10.0 6 0.8102338 1.4223768 6 21.1074809 274365.39 7 cjxl v0.10.0 7 0.7574005 1.8606330 7 21.2205659 283921.00 8 cjxl v0.10.0 8 0.7391769 5.4232358 8 22.1395608 536870.36 9 cjxl v0.10.0 9 0.7178132 24.2512006 9 22.4154769 633623.88 10 cjxl v0.10.0 10 0.7209015 26.4382343 10 22.3001730 634870.07 11 cwebp 1.3.2 0 1.0000000 1.0000000 0 0.9691077 151040.92 12 cwebp 1.3.2 1 0.8719663 7.1135803 1 0.9942246 208628.52 13 cwebp 1.3.2 2 0.8479628 8.1897254 2 0.9948469 221764.80 14 cwebp 1.3.2 3 0.8219334 10.8388722 3 0.9956001 224679.29 15 cwebp 1.3.2 4 0.8211225 11.3688304 4 0.9960652 223148.79 16 cwebp 1.3.2 5 0.8190413 10.9181859 5 0.9961888 221839.46 17 cwebp 1.3.2 6 0.8163690 14.8374114 6 0.9964598 219571.72 18 cwebp 1.3.2 7 0.8149141 17.9221753 7 0.9971695 218806.12 19 cwebp 1.3.2 8 0.8082165 23.0884559 8 0.9975708 262045.31 20 cwebp 1.3.2 9 0.7912807 74.0762189 9 1.8664396 519069.50```
2024-02-27 10:30:23
how are you getting multi threading on `-e 10`?
Orum
2024-02-27 10:31:27
use both `--streaming_input` and `--streaming_output`
afed
2024-02-27 10:33:09
but it's basically `-e 9`
Nyao-chan
2024-02-27 10:33:36
that's kind of weird, since `-e 10` uses global optimisations, should it work with that?
Orum
2024-02-27 10:34:04
well in my case it was (usually) worse than e 9 🤷‍♂️
2024-02-27 10:34:41
so maybe something isn't working properly?
Nyao-chan
2024-02-27 10:35:00
how much worse, in kB?
afed
2024-02-27 10:36:39
`-e 9` is `-e 10` without streaming, though there were some changes later on
Nyao-chan
2024-02-27 10:37:20
I've noticed it's worse by around 100B with `--patches 1` so I wonder if that's what you are seeing
Orum
Nyao-chan how much worse, in kB?
2024-02-27 10:37:59
uhh, not sure exactly; I tend to look at differences in %, not KB
Nyao-chan
Nyao-chan I've noticed it's worse by around 100B with `--patches 1` so I wonder if that's what you are seeing
2024-02-27 10:40:50
and that was with a 1500x2250 image. on 10240x5760 it's 25 kB smaller (around 0.5%)
afed
afed `-e 9` is `-e 10` without streaming, though there were some changes later on
2024-02-27 10:48:04
probably `--patches` is the difference, because it's not optimized for multithreading/streaming?
Nyao-chan
2024-02-27 10:51:19
and the variable predictor will also be off (already merged)
afed
2024-02-27 10:53:16
yeah, so `-e 9` in v0.10.1 will be much faster, but somewhat worse
Orum
2024-02-27 10:58:54
how will it compare to `e 8` then?
afed
2024-02-27 10:59:42
https://github.com/libjxl/libjxl/issues/3323
Orum
2024-02-27 11:01:04
`num_threads 0` <:monkaMega:809252622900789269>
afed
2024-02-27 11:03:43
for threading it's the same, because with streaming the blocks are encoded almost independently. but it's only for one manga image, so maybe for other images it will be different <:PepeSad:815718285877444619>
Orum
2024-02-27 11:03:47
anyway some separation between `e 9` and `e 10` is welcome as they're very close in speed right now
2024-02-27 11:04:54
as long as it doesn't overlap `e 8` too much <:KekDog:805390049033191445>
Nyao-chan
2024-02-27 11:05:28
I run everything single threaded because I thread per file. It's still faster
Orum
2024-02-27 11:05:46
the main reason I don't do that is memory use
Nyao-chan
afed for threading is the same, because with streaming it's almost independent individual blocks encoding but it's only for one manga image, so maybe for other images it will be different <:PepeSad:815718285877444619>
2024-02-27 11:05:54
If you have a good corpus, tell me
afed
Nyao-chan I run everything single threaded because I thread per file. It's still faster
2024-02-27 11:06:30
yeah, I also rarely need multithreading for a single image
Nyao-chan
Orum the main reason I don't do that is memory use
2024-02-27 11:06:36
even with 3000x4500 I can encode 16 images on 32 GB memory. though you have many more threads
Orum
2024-02-27 11:07:03
I'm usually working with 8K images, and sometimes even larger than that
Nyao-chan
2024-02-27 11:07:38
Yeah, when I encoded 10k I think 4 was the limit
Quackdoc
Orum the main reason I don't do that is memory use
2024-02-27 11:07:44
with the new memory gains this might not be an issue anymore lol
Orum
2024-02-27 11:09:10
well it's less of an issue, but honestly I can fully utilize my CPU with only 2 simultaneous <:JXL:805850130203934781> encodes now, so that has a bigger effect on reducing memory than all the other optimizations combined (compared to running ~16+ encodes in the past, which wasn't even possible without running out of RAM)
afed
afed yeah, I also rarely need multithreading for a single image
2024-02-27 11:11:58
and for the best compression streaming still has an efficiency loss, except for a few cases where some algorithms don't work properly on the whole image
2024-02-27 11:17:50
about `-e 3`, it's improved with palette detection but can still be worse than `-e 2` for non-photos https://canary.discord.com/channels/794206087879852103/804324493420920833/1118243657456304189
2024-02-27 11:23:21
and it would be nice to have support for PNG streaming input; it's not as much of a gain as internal streaming, but it still saves some extra memory. haven't checked, but does JPEG have streaming input?
Nyao-chan
2024-02-27 11:26:11
not according to the help message about the flag. but idk
2024-02-27 11:29:32
doesn't it reorder blocks and recompress Huffman coding in jpeg? would it make sense to stream at all?
afed
2024-02-27 11:31:05
for lossy recompression at least
Orum
2024-02-27 12:38:37
if anyone is interested, here's the sizes of the images at all compression levels (this doesn't have the timing or memory usage data, so if you want that too just ask)
yoochan
2024-02-27 12:51:30
could you share the images ? or their url ?
Orum
2024-02-27 12:54:38
right now they're all just on my NAS but I can upload if you're interested
yoochan
2024-02-27 12:55:13
what's the weight of the initial bundle ?
Orum
2024-02-27 12:55:33
1.9 GiB RN
2024-02-27 12:55:54
...which will take me quite some time to upload with how slow my upload speed is <:monkaMega:809252622900789269>
yoochan
2024-02-27 12:57:21
yep, don't 😄 you screenshotted them yourself?
veluca
2024-02-27 07:28:38
latest commit on `main` should massively speed up lossy multithreading on windows
2024-02-27 07:29:00
(we found a tiny little mistake in an old commit that slowed things down on windows by *a lot*)
spider-mario
2024-02-27 07:32:00
(old being ~October, if I recall correctly?)
jonnyawsom3
2024-02-28 02:49:16
Got any numbers?
Traneptora
2024-02-28 04:03:32
-October being the gcc optimization level of "ctober"
jonnyawsom3
veluca latest commit on `main` should massively speed up lossy multithreading on windows
2024-02-28 02:20:17
```JPEG XL encoder v0.10.0 19bcd82 [AVX2,SSE2] Encoding [VarDCT, d1.000, effort: 7] Compressed to 509.4 kB (0.491 bpp). 3840 x 2160, geomean: 1.199 MP/s [1.16, 1.23], 5 reps, 16 threads.``` ```JPEG XL encoder v0.10.0 3d75236 [AVX2,SSE2] Encoding [VarDCT, d1.000, effort: 7] Compressed to 509.4 kB (0.491 bpp). 3840 x 2160, geomean: 8.977 MP/s [8.76, 9.08], 5 reps, 16 threads.``` 7x improvement and from around 60% CPU usage to around 100%
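(Editor's note: from the two MP/s figures above, the improvement is actually closer to 7.5x:)

```python
# Throughput jump between the two VarDCT runs above (MP/s).
old_mps, new_mps = 1.199, 8.977
print(f"{new_mps / old_mps:.1f}x")  # ≈ 7.5x
```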
veluca
2024-02-28 02:23:05
yep
2024-02-28 02:23:15
that's more or less what I'd expect
jonnyawsom3
2024-02-28 02:28:55
Did a random test on lossless since there was mention of trying to match speed a while ago ```JPEG XL encoder v0.10.0 3d75236 [AVX2,SSE2] Encoding [Modular, lossless, effort: 5] Compressed to 5508.2 kB (5.313 bpp). 3840 x 2160, geomean: 8.911 MP/s [8.75, 8.93], 5 reps, 16 threads.``` Apparently in this instance `-e 5` lossless almost perfectly matches lossy, at around 10x the filesize
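(Editor's note: the "around 10x the filesize" claim follows directly from the two sizes quoted above:)

```python
# Size ratio of lossless e5 vs lossy d1 e7 from the runs above.
ratio = 5508.2 / 509.4
print(f"{ratio:.1f}x")  # ≈ 10.8x
```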