JPEG XL

benchmarks

_wb_
2022-08-31 02:52:42
is it normal that nlpd produces numbers very close to zero?
BlueSwordM
_wb_ any other metrics I should try?
2022-08-31 02:53:40
Any version of PSNR HVS or even PSNR HVS M.
_wb_
2022-08-31 02:54:29
I get values like this with NLPD, maybe that's fine?
```
compressed/1082342/libjxl/e7_q30.png, 2.4478233626723522e-06
compressed/1082342/libjxl/e7_q40.png, 2.3875061287981225e-06
compressed/1082342/libjxl/e7_q50.png, 2.291733608217328e-06
compressed/1082342/libjxl/e7_q60.png, 2.180511273763841e-06
compressed/1082342/libjxl/e7_q65.png, 2.1342120817280374e-06
compressed/1082342/libjxl/e7_q70.png, 2.0811967260669917e-06
compressed/1082342/libjxl/e7_q75.png, 2.027705249929568e-06
compressed/1082342/libjxl/e7_q80.png, 1.969044888028293e-06
compressed/1082342/libjxl/e7_q85.png, 1.9079834601143375e-06
compressed/1082342/libjxl/e7_q90.png, 1.8571088276075898e-06
compressed/1082342/libjxl/e7_q95.png, 1.807375269891054e-06
```
BlueSwordM Any version of PSNR HVS or even PSNR HVS M.
2022-08-31 02:54:53
I was going to try that but I couldn't find an implementation that I can get to work easily
BlueSwordM
_wb_ I was going to try that but I couldn't find an implementation that I can get to work easily
2022-08-31 02:56:26
Lucky for us, there is actually a Python package we can use for PSNR-HVS-M: https://pypi.org/project/psnr-hvsm/ I am still in awe that I managed to find it so quickly, because I had been looking for one for days.
_wb_
2022-08-31 02:56:43
I don't want to get matlab stuff to work, and this one: https://github.com/t2ac32/PSNR-HVS-M-for-python doesn't seem to work anymore
2022-08-31 02:56:58
ah
2022-08-31 02:57:02
that one might work
2022-08-31 02:57:25
ERROR: Could not find a version that satisfies the requirement psnr-hvsm (from versions: none) ERROR: No matching distribution found for psnr-hvsm
2022-08-31 02:57:26
or not
BlueSwordM
2022-08-31 02:57:33
Oh god damn it, again.
_wb_ or not
2022-08-31 02:58:41
Oh right right, I forgot about VQMT.
2022-08-31 02:58:50
https://github.com/rolinh/VQMT It only needs OpenCV, so it should be fine. Video only though, so YCbCr only.
_wb_
2022-08-31 02:59:11
yeah, might be because I'm using python 3.10 here and it wants 3.9, I don't feel like downgrading though
2022-08-31 02:59:45
ah nice, that looks like it's useful
BlueSwordM
2022-08-31 03:00:20
Yeah, but I forgot the V part: video only.
_wb_
2022-08-31 03:00:48
oh
2022-08-31 03:01:05
yuv only
2022-08-31 03:01:38
well I can get that to work but it's quite a bit more annoying than something that just takes png files as input
spider-mario
2022-08-31 03:02:23
I sense some `mktemp` coming
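A minimal sketch of what that temp-file detour might look like, assuming ffmpeg is available for the PNG-to-raw-YUV conversion; the VQMT invocation itself is left out since its exact command line isn't shown in this thread:
```
import subprocess
import tempfile
from pathlib import Path

def png_pair_to_yuv(orig_png, dist_png, pix_fmt="yuv444p"):
    """Decode two PNGs to raw planar YUV in a temporary directory so a
    YUV-only metric tool (like VQMT) can be pointed at them."""
    tmpdir = Path(tempfile.mkdtemp(prefix="yuv_metric_"))
    yuv_paths = []
    for png in (orig_png, dist_png):
        out = tmpdir / (Path(png).stem + ".yuv")
        # ffmpeg decodes the PNG and writes raw planar YUV frames
        subprocess.run(
            ["ffmpeg", "-y", "-i", str(png), "-pix_fmt", pix_fmt,
             "-f", "rawvideo", str(out)],
            check=True, capture_output=True)
        yuv_paths.append(out)
    return yuv_paths  # hand these (plus width/height) to the metric tool
```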
_wb_
2022-08-31 03:08:52
meh, not now
2022-08-31 03:09:12
```
img1 = tf.io.read_file(orig)
img1 = tf.io.decode_image(img1, channels=1, dtype=tf.dtypes.float32)
img2 = tf.io.read_file(dist)
img2 = tf.io.decode_image(img2, channels=1, dtype=tf.dtypes.float32)
dist01 = nlpd.nlpd(img1, img2)
```
2022-08-31 03:09:19
am I doing something wrong?
2022-08-31 03:10:35
this does give results but it correlates worse than psnr
veluca
2022-08-31 03:12:56
no idea
_wb_
2022-08-31 03:13:33
I don't have full results yet, this takes a while to compute
2022-08-31 03:17:03
either I'm doing something wrong or NLPD is just not a great metric
2022-08-31 03:17:59
FSIM seems to do quite well but it's taking ages to compute it, at least with the implementation I'm using.
2022-08-31 03:22:33
this is what I'm getting based on current partial results (about 1/4th of the data available)
2022-08-31 03:23:20
for pairwise comparisons, it looks like it is worse than anything else I tried so far
2022-08-31 03:25:06
for absolute quality assessment, it also looks like it is worse than anything else I tried so far, slightly worse than psnr...
2022-08-31 05:11:58
ok with full data, you could say NLPD is maybe a tiny bit better than PSNR
2022-08-31 05:12:13
2022-08-31 05:12:17
in absolute quality that is
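For context, a small sketch of the kind of correlation check being talked about here, assuming the per-image metric scores and the corresponding subjective scores are available as two parallel lists (the function and names are illustrative, not the actual evaluation scripts):
```
import numpy as np
from scipy.stats import kendalltau, pearsonr

def correlate_with_subjective(metric_scores, subjective_scores):
    """Kendall tau only looks at rank order (closer to predicting
    pairwise comparisons); Pearson assumes a roughly linear relation
    (closer to absolute quality / MOS prediction)."""
    m = np.asarray(metric_scores, dtype=float)
    s = np.asarray(subjective_scores, dtype=float)
    tau, _ = kendalltau(m, s)
    r, _ = pearsonr(m, s)
    return tau, r
```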
BlueSwordM
_wb_ well I can get that to work but it's quite a bit more annoying than something that just takes png files as input
2022-08-31 05:21:16
Here's an interesting 2007 study about PSNR-HVS-M vs something like MS-SSIM and the others: https://ponomarenko.info/vpqm07_p.pdf
2022-08-31 05:21:34
If we could have a good simple implementation of PSNR-HVS-M, it'd be interesting to compare it to SSIMU2.
_wb_
2022-08-31 05:31:38
It doesn't have to be a good implementation, just something that is easy to get working
2022-08-31 05:40:43
My impression is that most metrics have been trained/tuned on datasets like TID08/13 that mostly contain non-compression distortions (different kinds of noise etc), and at strong distortion intensity and in large steps - something like jpeg q5, q15, q30, q60 and q90
BlueSwordM
_wb_ It doesn't have to be a good implementation, just something that is easy to get working
2022-08-31 05:47:30
In this context, a good implementation is one that is easy to work with.
_wb_
2022-08-31 05:50:37
for pairwise comparisons, NLPD is worse than anything else
2022-08-31 09:08:42
FSIM is taking ages to compute, but with almost 1/3rd of the data computed, so far it looks like for absolute quality estimation it's somewhere between VMAF and DSSIM, and for relative quality estimation somewhere between SSIMULACRA and SSIMULACRA 2. So that's quite good - now I wonder if it is inherently so glacially slow or if I'm just using a slow implementation
2022-08-31 09:09:19
(it takes about an hour to compute 1k scores, and I need to compute 22k of them)
veluca
2022-08-31 09:48:41
I'm taking a wild guess: you're running it on CPU, and/or on one image pair per process (assuming you're using an ml-ish implementation)
_wb_
2022-08-31 10:49:54
Yeah, should have at least used multiple cores, but oh well, going to sleep now and cannot be bothered
2022-08-31 10:50:39
I was assuming FSIM would be a cheapish one, it's not neural is it
BlueSwordM
_wb_ I was assuming FSIM would be a cheapish one, it's not neural is it
2022-08-31 11:24:18
It isn't, but looking at the initial paper, no wonder it is quite slow.
2022-08-31 11:25:28
I'm sure the current Python implementations could be massively sped up in normal languages.
veluca
2022-09-01 05:47:31
tf is much much slower than direct hardware usage
_wb_
2022-09-01 05:55:28
most of the time seems to be spent in libfftw3 - which seems to be mostly doing lots of avx2 trigonometry stuff
veluca
2022-09-01 06:29:40
ah, I see
2022-09-01 06:30:11
so probably it recomputes the fft every time even if you have many images with the same size
2022-09-01 06:30:22
also I need to check what it needs big ffts for xD
fredomondi
2022-09-01 02:41:01
Not sure if this is the correct place to ask this... Does the benchmark_xl tool work on Windows? Currently running it on MSYS2 and all I get is an "illegal instruction" error message. All other tools work well
_wb_
2022-09-01 03:33:14
does it do that on any input?
2022-09-01 03:35:15
so FSIM results are in. In absolute quality, it's better than SSIM and SSIMULACRA and almost as good as VMAF (but worse than DSSIM and Butteraugli)
2022-09-01 03:36:25
in relative quality, it seems to be about as good as SSIMULACRA and DSSIM
_wb_ So on our dataset (medium to very high quality classical image compression, i.e. the mozjpeg q30-q95 range), it looks like the different perceptual metrics can be ranked like this: For relative quality (predicting pairwise comparisons within the same original image): SSIMULACRA2 >> SSIMULACRA ~= DSSIM >> Butteraugli (2-norm) > PSNR > LPIPS >> VMAF >> SSIM For absolute quality (predicting MOS scores, i.e. metric values are consistent across different original images): SSIMULACRA2 >> Butteraugli (2-norm) > DSSIM > VMAF >> LPIPS ~= SSIMULACRA >> SSIM >> PSNR (with the caveat that SSIMULACRA2 was tuned based on part of this data, and while the results do generalize quite well to the validation set, they may not generalize to different kinds of inputs - e.g. other codecs, different image dimensions, different gamut or dynamic range, etc)
2022-09-01 03:38:31
to update this one:
For relative quality (predicting pairwise comparisons within the same original image): SSIMULACRA2 >> SSIMULACRA ~= DSSIM ~= FSIM >> Butteraugli (2-norm) > PSNR > LPIPS >> VMAF >> SSIM >> NLPD
For absolute quality (predicting MOS scores, i.e. metric values are consistent across different original images): SSIMULACRA2 >> Butteraugli (2-norm) > DSSIM > VMAF > FSIM >> LPIPS ~= SSIMULACRA >> SSIM >> PSNR ~= NLPD
2022-09-01 03:41:13
at least this is how I would rank them given how they correlate with the subjective data I have, which is specifically medium to very high quality and classical image compression (JPEG, JXL, J2K, WebP, AVIF, HEIC). Of course things are likely very different for different types of data (things like low fidelity AI-based codecs, capture artifacts, geometrical distortions, etc).
2022-09-01 03:47:01
looking at just the well-known PSNR, SSIM and VMAF metrics, I think it's quite interesting that PSNR >> VMAF >> SSIM for predicting pairwise opinion while VMAF >> SSIM >> PSNR for predicting MOS.
2022-09-01 03:48:31
obviously all three are quite poor in general for this specific task, and get things quite wrong quite often
Fraetor
2022-09-01 07:33:40
Could you combine the values you get from the different metrics to get something interesting?
2022-09-01 07:34:02
Like compensating for one metrics weaknesses in a certain area, or something.
2022-09-01 07:34:29
Or are they all refinements of each other, and thus you just want to use the best?
_wb_
2022-09-01 08:53:08
I think it's hard to combine them in a way that amplifies correct predictions more than wrong ones. But it could be interesting to explore...
Traneptora
2022-09-03 07:22:20
If you're looking for some really fast FFTs, the ones in libavutil are significantly faster than everything else on the planet, including FFTW3
2022-09-03 07:22:38
Lynne hand-coded them in assembly and they're crazy fast
veluca
2022-09-03 07:27:22
fixed-size FFT and DCT do get quite fast indeed, there's a lot of different algorithms
_wb_
2022-09-07 04:56:29
https://sneyers.info/CID22/ interactive plots with the aggregate info of our big subjective eval
2022-09-07 04:57:09
MCOS 30 = low quality, 50 = medium quality, 70 = high quality, 90 = visually lossless
2022-09-07 05:47:35
Here's a somewhat arbitrary but imo relevant criterion for considering a new encoder/codec N substantially better than the old codec O: if the p10 worst performance of N consistently (across different corpora and across the quality spectrum) matches or beats the average performance of O, at similar encode speed, then N is substantially better than O.
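A minimal sketch of that criterion as a concrete check, assuming per-image quality scores (subjective or metric) are available for encoder N and encoder O at comparable settings; the "consistently across corpora and across the quality spectrum" part would simply repeat this check per corpus and per quality point:
```
import numpy as np

def substantially_better(scores_new, scores_old):
    """The 10th-percentile (worst-case-ish) score of the new encoder
    must match or beat the *average* score of the old encoder on the
    same image set."""
    return np.percentile(scores_new, 10) >= np.mean(scores_old)
```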
2022-09-07 05:57:38
According to that criterion, libjxl e7 is substantially better than mozjpeg, and it's the only encoder we tested that can actually say that (even ignoring the aspect of encode speed)
Fraetor
2022-09-07 07:21:35
JPEG XL seems to almost meet that criterion for JPEG2000 and webp, only falling slightly behind (the 10% worst of JXL at least) at high qualities.
_wb_
2022-09-07 07:40:34
Yeah - and part of that could be not that relevant, at MCOS 85+ it's really very close to visually lossless (those two last points for jxl are d1 and d0.55), and probably the confidence intervals (not shown here) overlap there so arguably jxl p10 isn't really worse than j2k avg there. Compared to avg WebP, p10 jxl still has a problem in the two non-photo categories though (might be fixable but in at least libjxl 0.6.1, lossy non-photo is not so strong).
Fraetor
2022-09-07 07:44:58
Yeah, when you look at specific categories there are much clearer winners.
_wb_
2022-09-07 07:54:11
When you look at landscape/nature specifically, you could say jxl is substantially better than jpeg, j2k, webp and avif. For these types of images, it looks like j2k/webp/avif are performing quite disappointingly in the medium to high quality range, struggling to even match mozjpeg.
2022-09-07 07:55:48
I mean, if you look at the plots for those types of images, you could really say "jpeg is good enough" (until jxl arrived)
Fraetor
2022-09-07 08:15:01
I wasn't expecting HEIC to be so good though.
2022-09-07 08:16:49
JXL still ekes out a win on nature, but HEIC seems to win on anything unnatural.
_wb_
2022-09-07 08:44:01
this is HEIC at a speed that is 4 times slower than jxl e7 though
2022-09-07 08:44:15
but yes, x265 is not bad when configured properly
2022-09-07 08:45:13
at some point I may want to compare x265 HEIC encoding to whatever it is Apple does
Fraetor
2022-09-07 08:49:52
Ah, that is the trade off.
BlueSwordM
Fraetor I wasn't expecting HEIC to be so good though.
2022-09-07 08:51:37
It's because x265 has an obscene setting when dealing with 4:4:4 content. It applies a +6 chroma QP offset when dealing with a 4:4:4 input... which is insane lmao.
_wb_
2022-09-07 09:57:13
this is 420 heic, I dunno if 444 heic is really worth doing considering Apple doesn't decode it afaik
BlueSwordM
2022-09-07 10:05:03
Oh, that's a bit different then. I'm really surprised.
_wb_
2022-09-07 10:41:16
actually it's even only twice as slow as jxl, so yes, it does quite well - patent encumbered though of course
2022-09-07 10:49:24
mostly in the lower range though - I consider MCOS 70-80 to be the most important range for still images on the web (high quality to about halfway to visually lossless, i.e. roughly the d2 to d3.5 range in jxl). In that range, for photo jxl wins while for non-photo heic and avif win.
2022-09-10 06:09:41
djxl only decodes jxl files, no idea what happened in that comment but it cannot be right...
The_Decryptor
2022-09-10 06:24:43
Could have just re-used the filename in an earlier cjxl call and copied the wrong decode command
2022-09-10 06:25:03
I know I've done "cjxl source.png source.png" enough times through tab completion
_wb_
2022-09-10 11:27:02
aggregated over just a small set of 50 small images, but it's an interesting way to visualize results
2022-09-10 11:27:22
https://sneyers.info/tradeoff-relative.html for the interactive version
2022-09-10 11:28:06
the screenshot shows the whole range from jxl q10 to jxl q96 in steps of 2
2022-09-10 11:28:51
you can nicely see the 'diminishing returns' effect as you go to lower and lower libaom speeds
2022-09-10 11:36:10
interestingly, according to ssimulacra2, at the high end, all codecs actually become worse than unoptimized jpeg - I think that's mostly because they need really high quality settings to match high-quality jpeg. According to ssimulacra2, webp q100 (but using the lossy codec) on average corresponds roughly to mozjpeg -revert -q 88 or to mozjpeg -q 92.
2022-09-10 11:38:01
so you get this trumpet shape that starts around jpeg q80 where everything starts giving less and less benefit compared to the old jpeg (at some point even going into negative benefit), except jxl which starts thriving
2022-09-10 11:42:30
also, any claims of 60% improvement compared to jpeg are exaggerated, avif can do that around libjpeg q20 but mozjpeg is already giving 30% improvement there, so it depends a lot on whether you take unoptimized libjpeg as a baseline or default mozjpeg. And also, who cares about q20? You have to be into pretty aggressive compression to use anything below q50, imo.
2022-09-10 11:47:16
It's also interesting to see how all encoders except jxl get slower as quality gets higher. So both in speed and in compression gain (compared to simple libjpeg), all three major encoders (libwebp, mozjpeg and libaom) are significantly better at the very low end (where I guess it's the easiest to do something better than the blocky mess libjpeg produces there) than at the higher end.
JendaLinda
2022-09-10 12:29:09
Interestingly, the size of a JPEG file containing the exact same coefficients may be considerably different depending on whether default or optimized Huffman tables are used and on whether it's progressive or not.
_wb_
2022-09-10 12:42:30
Yeah but mozjpeg is doing both kinds of things: better entropy coding, and quality-affecting things like trellis optimization
2022-09-10 12:43:15
It looks like the quality affecting stuff works best below q80 or so, but then starts being less effective and even negatively effective
2022-09-10 12:45:04
E.g. -fastcrush is only changing entropy coding, so it only has impact on speed and bpp but the encoded images are still the same with or without
2022-09-10 12:48:13
But the choice of quant tables, clamping-deringing, trellis optimization of AC and DC, etc are things that mozjpeg does and that affect quality for better or worse (but whether it's better or worse depends on what a human or a metric says)
JendaLinda
2022-09-10 12:59:18
That makes sense, more advanced encoders can surely improve the image quality. Those traditional quant tables were used for decades as a rule of thumb. It seems that photos are just pretty forgiving to lossy compression. The traditional JPEG encoder just discards some amount of data and it somehow works.
BlueSwordM
_wb_ interestingly, according to ssimulacra2, at the high end, all codecs actually become worse than unoptimized jpeg - I think that's mostly because they need really high quality settings to match high-quality jpeg. According to ssimulacra2, webp q100 (but using the lossy codec) on average corresponds roughly to mozjpeg -revert -q 88 or to mozjpeg -q 92.
2022-09-10 04:11:50
To be fair, WebP is the worst case scenario since it can only do 4:2:0.
spider-mario
2022-09-10 04:23:15
it would be interesting to have guetzli in there, but then it's also nice to have those results during this century
_wb_
2022-09-10 05:05:19
Haha
2022-09-10 05:05:42
I can try some guetzli and xyb jpeg later
2022-09-10 05:05:53
And the jxl recompressed versions of those
2022-09-11 10:44:53
results based on some more images: https://sneyers.info/trade-offs.html
2022-09-11 10:49:02
wow I forgot how crazy slow guetzli is, it's actually slower than aom s0
BlueSwordM
_wb_ wow I forgot how crazy slow guetzli is, it's actually slower than aom s0
2022-09-11 03:24:45
How did you build guetzli?
_wb_
2022-09-11 03:24:57
apt install guetzli 🙂
BlueSwordM
_wb_ apt install guetzli 🙂
2022-09-11 03:25:06
Oh ok 😄
2022-09-11 03:25:52
I thought you built it from source. For theoretical max performance, it'd be interesting to see how much more performance you could squeeze out of the binary with maximum-level optimizations (`-O3 -march=native -flto`).
2022-09-11 03:27:03
Mainly because it doesn't seem to have any SIMD optimizations.
_wb_
2022-09-11 03:28:05
it can probably be sped up more significantly by using a cheaper variant of butteraugli in its inner loop and stuff like that
2022-09-11 03:29:16
guetzli seems to be messing weirdly with ssimulacra2 - which is likely more of a problem of ssimulacra2 than of guetzli...
2022-09-11 03:30:09
as in: for some images ssimulacra2 gives it very low scores and for others it gives it very high scores
2022-09-11 03:31:30
could be that I kind of overfit ssimulacra2 for the encoders it saw during training
BlueSwordM
2022-09-11 03:31:30
guetzli only encodes 4:4:4 JPEGs, right?
_wb_
2022-09-11 03:32:16
it also only does q84+, it's basically designed only for qualities around d1
spider-mario
2022-09-11 04:55:45
and non-progressive
2022-09-11 04:57:57
(although they can be made progressive, and often thereby smaller, after-the-fact using jpegtran)
_wb_
2022-09-11 05:00:15
oh, I didn't realize that. So it's really only about quality, not entropy coding?
spider-mario
2022-09-11 05:15:40
from what I recall, the non-progressivity was either for computational or user comfort reasons
2022-09-11 05:16:09
I suspect that the decision would plausibly be different now
2022-09-11 05:16:34
https://github.com/google/guetzli/issues/54#issuecomment-287415666
2022-09-11 05:18:20
(+ the quote from the readme in the report)
_wb_
2022-09-12 07:28:24
looks like ssimulacra2 for some reason completely hates guetzli with a passion and gives it really low scores on some images - I wonder what's going on there, I think this is likely caused by allowing negative weights in the parameter tuning I did for ssimulacra2. I think I may need to re-tune ssimulacra2 to make it more robust to different encoders even if correlating more poorly, because this really cannot be right
2022-09-12 07:29:09
2022-09-12 07:34:24
According to ssimulacra1, guetzli is 10-15% better than mozjpeg (in the q90+ range), but I don't trust ssimulacra1 either, it correlates relatively poorly and it says some weird things like webp m4 being better than m6 and avif444 being worse than avif420
2022-09-13 02:07:49
nevermind, I was doing something wrong with the input images so ssimulacra2 was miscomputed. I'll have to try again at some point
2022-09-16 10:28:07
https://sneyers.info/benchmarks/
fab
2022-09-16 12:23:35
2022-09-16 12:24:06
I've found trick to force higher bpp per space
2022-09-16 12:24:17
But at cost of more ringing
2022-09-16 12:25:12
Avif fans will not like this
2022-09-16 12:28:33
for %i in (D:\august\sept\uno\phonedue\Screenshots\*.jpg) do cjxl -d 0.663 -e 9 --dots=1 --gaborish=1 --epf=3 -I 66.17 --lossless_jpeg 0 "%i" "%i.jxl"
2022-09-16 12:28:56
Or this command that provides good quality
2022-09-16 12:32:38
For visual quality i prefer
2022-09-16 12:32:42
for %i in (D:\august\sept\uno\phonedue\Screenshots\*.jpg) do cjxl -d 0.743 -s 7 --dots=0 --gaborish=1 --epf=2 -p -I 64.3 --lossless_jpeg 0 "%i" "%i.jxl"
2022-09-16 12:37:02
2022-09-16 12:37:18
On this s9 is weak
2022-09-16 12:40:14
To me it seems good just that doesn't recognise that type of image
2022-09-16 12:40:43
It makes font larger in length that they are
_wb_
2022-09-19 07:25:09
2022-09-19 07:26:41
this plot covers the whole range from jpeg q25 to q100 (or about d8 to d0.5)
2022-09-19 07:27:57
as seen from the point of view of percentile 10 ssimulacra2 score of each encoder setting (x axis) versus average bpp saved compared to unoptimized jpeg (y axis)
2022-09-19 07:50:51
so what these plots say is the following (assuming ssimulacra2 can be trusted, which is of course a big if):
- jxl is 30 to 50% smaller than jpeg across the spectrum;
- avif at s3-6 is 40% smaller than jpeg at the very low end, but 20% larger than jpeg at the very high end (q92+). At jpeg q75 it's still 20% smaller, around d1 it starts to become worse;
- mozjpeg is 20% smaller than unoptimized jpeg at the very low end (q30-50), but the gap becomes smaller as quality goes up and around jpeg q80 the gap is gone and mozjpeg becomes worse than unoptimized jpeg (until q99 where it is better again)
- webp (latest version at slowest speed) does perform a bit better than mozjpeg, about 5-10% smaller, but follows the same pattern of diminishing gains as quality goes up (and it cannot reach the highest qualities)
- xyb-jpeg is a bit worse than mozjpeg at the low to medium range, but starting at around q75 it becomes better and in the q90+ range it's dramatically better (ignore the speed for xyb-jpeg, it's much faster but i'm using benchmark_xl to do the encoding and my script also counted the butteraugli computation it did)
- avif s7 is not much better than webp, avif s8 is about the same as mozjpeg
- speed-wise, all codecs except jxl and xyb-jpeg get significantly slower as quality goes up
- even avif s8 is still slower than jxl e6 except at ultra-low quality (crossover is around jxl q25). The default avifenc speed of s6 seems to be well-chosen in the sense that slower settings provide little extra benefit at high cost and the faster settings are significantly worse.
2022-09-19 07:57:57
Most of all, this plot indicates that all the hyperbolic claims about new codecs need to be taken with a huge pile of salt. People have said that webp is 50% smaller than jpeg and avif is 50% smaller than webp but reality is quite a bit more nuanced than that and "good old jpeg" actually performs pretty good in the q75-95 range.
Orum
2022-09-20 02:01:50
yeah, webp is nowhere close to 50% smaller... but all marketing does stuff like that
2022-09-20 02:02:23
like VVC is supposed to be "50% smaller" than HEVC, and HEVC is supposed to be 50% smaller than AVC
2022-09-20 02:02:49
while maybe that is true at extremely low qualities, no one watches video at those levels (I hope)
Fox Wizard
2022-09-20 02:46:22
My dad does
2022-09-20 02:46:42
He downloads 1GB and lower 1080p movies and calls them "high quality"
2022-09-20 02:47:10
And yes, that includes basically every codec, but usually AVC... sometimes xvid XD
improver
2022-09-20 03:23:26
1GB is just not enough of a bitrate budget for a proper movie, but it can end up surprisingly watchable. With non-anime stuff, you won't even know what details you're missing
JendaLinda
2022-09-20 03:29:46
700MB movies were perfectly watchable. Although deluxe rips were split up to 3 CDs.
eddie.zato
2022-09-23 07:48:40
```
mozjpeg 4.1.2 (build 20220923)
avifenc 0.10.1 (dav1d [dec]:1.0.0-0-g99172b1, aom [enc/dec]:3.4.0)
cjxl v0.8.0 16770a0
```
Brinkie Pie
2022-09-23 08:18:36
that's intense. I'm curious about the parameters and file sizes.
eddie.zato
2022-09-23 08:37:15
`$q` is random between 87 and 97
```
cjpeg.exe -quality $q -optimize
avifenc.exe -a cq-level=(64 - 0.64*$q) -a end-usage=q -a color:sharpness=2 -a color:enable-chroma-deltaq=1 --min 0 --max 63
cjxl.exe -q $q
cjxl.exe -q $q --gaborish=0
```
fab
eddie.zato ``` mozjpeg 4.1.2 (build 20220923) avifenc 0.10.1 (dav1d [dec]:1.0.0-0-g99172b1, aom [enc/dec]:3.4.0) cjxl v0.8.0 16770a0 ```
2022-09-23 09:23:31
Aom latest is 3.5.0
2022-09-23 09:23:39
And there is a difference
Brinkie Pie
2022-09-23 09:36:06
But that benchmark assumes that `avifenc`, mozjpeg and `cjxl` have the same understanding of a "97%" quality setting. There's no guarantee for this, so IMO another measure like the file size or a visual metric should be taken into consideration as well. Imagine I'd make my own mozjpeg derivative `brinkiejpeg` which maps all quality settings to 98%-100%; it would have a clear advantage in this benchmark.
_wb_
2022-09-23 10:29:34
Also: do you decode to 8-bit png or 16-bit?
2022-09-23 10:30:46
But yes, most important is to make sure the range of filesizes (on the first generation) is similar
eddie.zato
2022-09-23 10:32:33
Yeah, it's not exactly a legitimate benchmark. I make a few of these mostly for fun and to see how jxl's "generation loss" improves with development. `djxl --bits_per_sample=16`
fab
2022-09-23 11:02:51
v0.8.0 is not the current version
2022-09-23 11:03:01
The current one is 0.7.0
_wb_
2022-09-23 05:46:45
vmaf is a funny metric
2022-09-23 05:47:04
it says this image: https://jon-cld.s3.amazonaws.com/test_images/016/mozjpeg-revert-q24.jpg is better than this image: https://jon-cld.s3.amazonaws.com/test_images/016/jxl-e6-q74.png
2022-09-23 05:48:42
it also says this image: https://jon-cld.s3.amazonaws.com/test_images/037/mozjpeg-revert-q18.jpg is better than this image: https://jon-cld.s3.amazonaws.com/test_images/037/jxl-e6-q54.png
2022-09-23 05:51:19
it also says that this: https://jon-cld.s3.amazonaws.com/test_images/008/mozjpeg-revert-q16.jpg is better than this: https://jon-cld.s3.amazonaws.com/test_images/008/avif-s7-q43.png
2022-09-23 06:00:48
https://twitter.com/jonsneyers/status/1573371624132419585?s=20
2022-09-23 06:01:37
This is just too ridiculous. How can it not be bothered by such hideous banding?
2022-09-23 07:37:22
tbh I'm also finding bugs in ssimulacra2
2022-09-23 07:39:29
i'll need to retrain it constraining the subscore weights to be non-negative, because with the negative weights I'm occasionally getting very weird results where on some images, when you push the quality down into the ridiculously low, the ssimulacra2 score starts to get higher again
2022-09-23 07:40:01
https://jon-cld.s3.amazonaws.com/test_images/036/avif-s7-q61.png e.g. it gives this image a pretty high score, which is obviously very wrong
Eugene Vert
eddie.zato ``` mozjpeg 4.1.2 (build 20220923) avifenc 0.10.1 (dav1d [dec]:1.0.0-0-g99172b1, aom [enc/dec]:3.4.0) cjxl v0.8.0 16770a0 ```
2022-09-23 08:04:16
Generation loss test with jxl/avif decoding to 16-bit png. There is still a little bit of color-shift and blocking with gaborish, but not that extreme)
2022-09-23 08:05:52
Quality setting is alternating between two. Source code is here: https://gist.github.com/EugeneVert/da60fbd403d9e1244fbde41233236d74
eddie.zato
2022-09-24 11:07:54
Ok, `cjpeg` uses random (85...95) quality settings, `cjxl` targets the jpeg file size in each generation. All intermediate pngs are 16-bit.
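A rough sketch of such a generation-loss loop, simplified to pick a random cjxl quality per generation instead of targeting the JPEG file size, and assuming cjxl/djxl are on PATH (only flags already mentioned in this thread are used):
```
import random
import subprocess

def jxl_generation_loss(src_png, generations=20, prefix="gen"):
    """Re-encode with cjxl at a random quality, decode back to 16-bit
    PNG, and feed each generation's output into the next one."""
    current = src_png
    for g in range(generations):
        q = random.randint(85, 95)
        jxl = f"{prefix}_{g:03d}.jxl"
        png = f"{prefix}_{g:03d}.png"
        subprocess.run(["cjxl", current, jxl, "-q", str(q)], check=True)
        subprocess.run(["djxl", jxl, png, "--bits_per_sample=16"], check=True)
        current = png
    return current
```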
Eugene Vert Quality setting is alternating between two. Source code is here: https://gist.github.com/EugeneVert/da60fbd403d9e1244fbde41233236d74
2022-09-24 11:26:09
I haven't tried `cjxl` with `patches=0`, maybe later.
_wb_
2022-09-24 02:07:24
I doubt it will use any patches on these images
w
2022-09-24 02:29:22
what is gaborish
_wb_
2022-09-24 02:52:08
Decode-side, it's just a 3x3 blurring convolution
2022-09-24 02:53:11
Encode-side, we first do the inverse of it before doing dct. End result is a bit less blocking.
2022-09-24 02:55:06
Probably the generation loss gets caused by the inverse gaborish not being fully accurate; the encoder uses a 5x5 approximation iirc
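To make the "decode-side it's just a 3x3 blurring convolution" concrete, here is a toy sketch; the weights below are illustrative placeholders, not the actual constants from lib/jxl/gaborish.cc:
```
import numpy as np
from scipy.ndimage import convolve

def gaborish_like_blur(plane, w_edge=0.11, w_corner=0.06):
    """Small normalized 3x3 blur of the 'gaborish' shape: center weight 1,
    edge neighbours w_edge, corner neighbours w_corner (made-up values)."""
    k = np.array([[w_corner, w_edge, w_corner],
                  [w_edge,   1.0,    w_edge],
                  [w_corner, w_edge, w_corner]])
    k /= k.sum()  # normalize so flat regions are left unchanged
    return convolve(plane, k, mode="nearest")
```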
Jyrki Alakuijala
eddie.zato Yeah, it's not exactly a legitimate benchmark. I make a few of these mostly for fun and to see how jxl's "generation loss" improves with development. `djxl --bits_per_sample=16`
2022-09-26 09:58:16
Thank you for doing this even when it is a bit embarrassing for me :-). We need to be aware of our weaknesses whatever they are -- and then work on them.
w what is gaborish
2022-09-26 09:59:52
5x5 sharpening filter before encoding, 3x3 blurring filter after decoding
2022-09-26 10:00:28
it is about 99.9% correct -- the exact inverse of any blurring filter would be an infinitely big sharpening filter, so obviously 5x5 is an approximation
2022-09-26 10:01:34
we could make it do a bit less sharpening at encoding, then these stripes will likely disappear
_wb_
2022-09-26 10:03:47
I suppose the error is towards 'slightly too sharp' now, which accumulates?
Jyrki Alakuijala
2022-09-26 10:27:20
a very simple way to adjust would be to change the 1.0 here: https://github.com/libjxl/libjxl/blob/main/lib/jxl/gaborish.cc#L33
2022-09-26 10:27:51
I believe that increasing that 1.0 to a higher value, say 1.1, should end up sharpening less
2022-09-26 10:28:46
I searched through all the constants with Nelder-Mead to find the minimum error or the best bpp*pnorm (don't remember which)
spider-mario https://github.com/google/guetzli/issues/54#issuecomment-287415666
2022-09-26 10:40:05
It was my mistake not to make guetzli progressive -- I specced it to be sequential due to fears of progressive decoding being slower than libjpeg-turbo use. I lived in a fantasyland where the jpeg decoding speed was much more important than it was in reality. If we do something like guetzli again, it will be 10% better and something like 5-way progressive.
_wb_ i'll need to retrain it constraining the subscore weights to be non-negative, because with the negative weights I'm occasionally getting very weird results where on some images, when you push the quality down into the ridiculously low, the ssimulacra2 score starts to get higher again
2022-09-26 10:42:07
+1 for less negative weights -- consider adding an internal cost for them and mix some of it to the objective that you are optimizing
_wb_
2022-09-26 10:44:18
i'm now just constraining things to only use positive weights for the subscores. I'm getting worse results on the training but on the validation set the difference is small.
2022-09-26 10:46:27
With negative weights:
full set: Kendall: 0.750678 Pearson: 0.915953 MAE: 4.2119062897366675
validation set: K: 0.713692 P: 0.894972 MAE: 4.6921941837018535
Without negative weights: (still tuning atm so numbers might still get slightly better)
full set: Kendall: 0.697231 Pearson: 0.865044 MAE: 5.260400245568066
validation set: K: 0.711085 P: 0.885026 MAE: 4.83869683016585
2022-09-26 10:48:02
The negative weights are just causing overfitting and they occasionally cause weird nonmonotonic behavior, for example:
```
#image,encoder,speed,q-setting,ssimulacra2-with-negative-weights-allowed,ssimulacra2-only-positive-weights
images/002.png,avif,s7,q23,75.90065570,73.00392507
images/002.png,avif,s7,q25,73.88054718,70.26941217
images/002.png,avif,s7,q27,71.72160467,67.03692228
images/002.png,avif,s7,q29,69.10752095,64.58631157
images/002.png,avif,s7,q31,67.60556344,60.51143569
images/002.png,avif,s7,q33,65.76856998,56.44158147
images/002.png,avif,s7,q35,62.94440338,52.14308948
images/002.png,avif,s7,q37,61.66782141,46.38953710
images/002.png,avif,s7,q39,61.37867846,40.24145433
images/002.png,avif,s7,q41,64.37997448,32.39549149
images/002.png,avif,s7,q43,63.22384863,26.74709307
images/002.png,avif,s7,q45,65.74042586,20.77856709
```
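A tiny sketch of the sanity check that catches this, assuming the scores for one (image, encoder) pair are ordered from the highest to the lowest quality setting, as in the table above:
```
def nonmonotonic_points(scores, tolerance=0.0):
    """Indices where the metric score goes *up* although the quality
    setting went down; any hit is a candidate metric bug or overfit."""
    return [i for i in range(1, len(scores))
            if scores[i] > scores[i - 1] + tolerance]
```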
Jyrki Alakuijala
2022-09-26 10:49:58
If you can have ok results without negative weights, yes, of course that is better then
_wb_
2022-09-26 10:51:09
looking at the validation set numbers, the gap is small and not worth the unreliability
2022-09-26 10:52:52
I only discovered this because I was manually inspecting image pairs where different metrics disagree strongly - same as how I found out about vmaf being oblivious to banding
2022-09-26 10:57:24
I can give you some examples of where the new ssimulacra2 and butteraugli 3-norm disagree, if that helps to improve butteraugli
2022-09-26 10:58:24
orig: https://jon-cld.s3.amazonaws.com/test_images/reference/032.png A: https://jon-cld.s3.amazonaws.com/test_images/032/mozjpeg-revert-q26.jpg B: https://jon-cld.s3.amazonaws.com/test_images/032/avif-s6-q43.png Butteraugli 3-norm says A>B
2022-09-26 11:00:23
orig: https://jon-cld.s3.amazonaws.com/test_images/reference/011.png A: https://jon-cld.s3.amazonaws.com/test_images/011/mozjpeg-revert-q26.jpg B: https://jon-cld.s3.amazonaws.com/test_images/011/jxl-e6-q16.png
2022-09-26 11:01:42
orig: https://jon-cld.s3.amazonaws.com/test_images/reference/030.png A: https://jon-cld.s3.amazonaws.com/test_images/030/mozjpeg-revert-q30.jpg B: https://jon-cld.s3.amazonaws.com/test_images/030/aurora-faster-psycho-visual=0-q45.png
2022-09-26 11:03:04
it's typically cases where A has worse banding but B looks otherwise worse, so to some extent this is a matter of taste, I suppose.
2022-09-26 11:05:38
Ah, here is a different case. orig: https://jon-cld.s3.amazonaws.com/test_images/reference/022.png A: https://jon-cld.s3.amazonaws.com/test_images/022/avif-s7-q37.png B: https://jon-cld.s3.amazonaws.com/test_images/022/mozjpeg-q40.jpg
2022-09-26 11:06:28
btw chrome renders that untagged jpeg a bit differently (brighter) on my laptop than how it renders the pngs
2022-09-26 11:07:05
in safari I see no difference in color between the jpeg and the png, but in chrome the jpeg looks brighter
2022-09-26 11:08:25
<@604964375924834314> do you know what causes this? Is chrome perhaps interpreting untagged jpeg as rec709 instead of sRGB or something?
2022-09-26 11:10:18
Orig: https://jon-cld.s3.amazonaws.com/test_images/reference/003.png A: https://jon-cld.s3.amazonaws.com/test_images/003/mozjpeg-revert-q24.jpg B: https://jon-cld.s3.amazonaws.com/test_images/003/jxl-e6-q16.png
2022-09-26 11:10:50
these are all cases where Butteraugli 3-norm says A is better than B while ssimulacra2-nonegativeweights says B is better than A.
2022-09-26 11:16:17
Orig: https://jon-cld.s3.amazonaws.com/test_images/reference/001.png A: https://jon-cld.s3.amazonaws.com/test_images/001/mozjpeg-revert-q28.jpg B: https://jon-cld.s3.amazonaws.com/test_images/001/jxl-e6-q34.png
2022-09-26 11:21:42
Orig: https://jon-cld.s3.amazonaws.com/test_images/reference/002.png A: https://jon-cld.s3.amazonaws.com/test_images/002/avif-s7-q41.png B: https://jon-cld.s3.amazonaws.com/test_images/002/aurora-faster-psycho-visual=0-q43.png
2022-09-26 11:24:07
There are also a few cases where I agree with Butteraugli 3-norm, but most of the time (including in the above ones) I tend to agree with ssimulacra2
Jyrki Alakuijala
2022-09-26 11:38:59
it can be insightful to compare such things with a distance map
2022-09-26 11:40:50
also, I use pairs like this to calibrate butteraugli when 'far away' from the JND
2022-09-26 11:41:25
then I can just look at the image with the 'a's and see if the shapes of observable isodifference contours are close to butteraugli heat maps
2022-09-26 11:42:31
in different colors (and gray of course, too)
2022-09-26 11:43:24
green
2022-09-26 11:44:16
the idea was to roughly approximate the xy, yb and xb high-frequency planes (the three squares)
2022-09-26 11:44:36
and see how those planes match with what the model is seeing
spider-mario
_wb_ <@604964375924834314> do you know what causes this? Is chrome perhaps interpreting untagged jpeg as rec709 instead of sRGB or something?
2022-09-26 11:45:35
it's the PNGs that are "wrong", they have a gAMA chunk of 0.45455 but no sRGB chunk
2022-09-26 11:45:55
ImageMagick does that (https://github.com/ImageMagick/ImageMagick/issues/4375 )
2022-09-26 11:46:33
oh, right, they closed it as "completed" but I am not sure that they actually fixed it, I should look into it and maybe reopen
_wb_
2022-09-26 11:48:35
the thing is, in safari the png and the jpeg look the same, and they both look darker just like the png in chrome
spider-mario
2022-09-26 11:48:49
safari probably ignores the gAMA chunk
2022-09-26 11:49:05
oh, "just like", I misread
_wb_
2022-09-26 11:49:20
yes, it's only the jpeg that looks different
2022-09-26 11:53:03
so is safari displaying the jpeg incorrectly then?
spider-mario
2022-09-26 12:01:39
possibly, could it be that it's approximating the sRGB curve with a 2.2 gamma?
_wb_
2022-09-30 07:44:15
https://sneyers.info/tradeoffs/
2022-10-01 02:23:46
Added some scatter plots called `disagreement_*`
2022-10-01 02:23:50
e.g. https://sneyers.info/tradeoffs/disagreement_PSNR-Y_vs_SSIMULACRA_2_noneg.html
2022-10-01 02:24:28
every point here is a pair of images A,B where one metric says A>>B and the other says A<<B
2022-10-01 02:24:51
if you click on it it will open A,orig,B in three tabs (you have to disable the popup blocker on chrome for this)
2022-10-01 02:28:21
I had to limit it to show only the 1000 biggest disagreements per original, otherwise the thing became too heavy
2022-10-01 02:29:19
Looking at the disagreements is quite fun and it's nice to get examples of why e.g. PSNR is a crap metric
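A sketch of how such disagreement pairs can be mined from a score table, assuming a dict mapping each distorted image to its scores under two metrics where higher means better for both (names and the cutoff are illustrative):
```
from itertools import combinations

def biggest_disagreements(scores, top_n=1000):
    """scores: {image_name: (metric1, metric2)}. Find pairs (A, B) where
    metric 1 ranks A above B while metric 2 ranks B above A, ordered by
    how strongly the two metrics disagree."""
    pairs = []
    for a, b in combinations(scores, 2):
        d1 = scores[a][0] - scores[b][0]
        d2 = scores[a][1] - scores[b][1]
        if d1 * d2 < 0:  # opposite signs: the metrics disagree on the ranking
            pairs.append((abs(d1) + abs(d2), a, b))
    pairs.sort(reverse=True)
    return pairs[:top_n]
```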
Traneptora
2022-10-03 01:43:13
do we have any benchmarks for lossless JXL as an intra-video codec
2022-10-03 01:43:38
since you gain very little from lossless inter-prediction, I'm wondering how it could perform compared to something like, say, ffv1
veluca
2022-10-03 12:08:21
I don't think we do, but AFAIU the jxl context model is pretty much an extension of the ffv1 one so I'd be surprised if it's not better
_wb_
2022-10-03 05:13:26
Well ffv1 has some advantages:
- adaptive chances (cabac)
- ctx model is persistent across frames
- no groups, so slight advantage for prediction (no poorly predicted first row and column per group)
2022-10-03 05:13:50
But jxl has a lot more advantages, not gonna list them
2022-10-04 10:21:32
https://github.com/Netflix/vmaf/issues/1102
BlueSwordM
2022-10-04 02:54:20
<@794205442175402004> Did you also test with VMAF_neg?
_wb_
2022-10-04 02:54:39
what's the command line for that?
BlueSwordM
_wb_ what's the command line for that?
2022-10-04 02:55:56
VMAF_neg is a different model, so just specifying a different model path to that model will work.
_wb_
2022-10-04 02:56:05
is that a different --model? built-in?
BlueSwordM
_wb_ is that a different --model? built-in?
2022-10-04 02:56:53
Are you using ffmpeg or the vmaf exec?
_wb_
2022-10-04 02:56:58
vmaf exec
BlueSwordM
_wb_ vmaf exec
2022-10-04 02:58:20
You can find all the cmd arguments here: https://github.com/Netflix/vmaf/tree/master/libvmaf/tools
2022-10-04 02:58:49
```
--model/-m $params: model parameters, colon ":" delimited
  `path=` path to model file
  `version=` built-in model version
```
2022-10-04 02:59:59
vmaf neg will usually penalize processing operations such as gamma curve adjustments, contrast enhancement and sharpening.
_wb_
2022-10-04 03:01:43
so if I use `--model version=vmaf_v0.6.1neg` I do get different numbers
2022-10-04 03:03:31
orig vs orig: 97.424967
orig vs darker: 95.920890
orig vs jpg q60: 91.667490
orig vs darker jpg q60: 91.985217
orig vs jpg q75: 93.364518
orig vs darker jpg q75: 93.587509
2022-10-04 03:05:46
so it does appear to be more robust but still gives somewhat weird bonus points to images that are a bit too dark
2022-10-04 03:09:36
https://netflixtechblog.com/toward-a-better-quality-metric-for-the-video-community-7ed94e752a30 oh now I understand - default VMAF is attempting to be a metric that captures 'enhancement'?
2022-10-04 03:09:44
that explains a lot
2022-10-04 03:11:05
so it's not a fidelity metric but really an appeal metric where you can be better than the original by doing some sharpening, color adjustment and denoising
BlueSwordM
_wb_ so it does appear to be more robust but still gives somewhat weird bonus points to images that are a bit too dark
2022-10-04 03:56:54
I see.
2022-10-04 04:04:56
It does penalize the changes, but gamma curve changes should be taken into account imo.
_wb_
2022-10-10 03:29:38
Original: https://jon-cld.s3.amazonaws.com/test/images/006.png
Image A: https://jon-cld.s3.amazonaws.com/test/distorted/006/jxl-adc0-e6-q60.png
Image B: https://jon-cld.s3.amazonaws.com/test/distorted/006/mozjpeg-2x2-revert-q44.jpg
Almost all the metrics say B is better than A:
VMAF-NEG: A=87.757061, B=90.774064
Butteraugli 3-norm: A=1.819012, B=1.726316
SSIMULACRA: A=0.06641718, B=0.06335570
DSSIM: A=0.00634723, B=0.00520672
PSNR-Y: A=31.551152, B=31.572802
PSNR-HVS: A=37.272210, B=39.058198
SSIM: A=0.996295, B=0.996939
MS-SSIM: A=0.98866, B=0.991059
Only one metric says the opposite:
SSIMULACRA 2: A=62.92532865, B=56.85243698
What is your opinion? Which image do you prefer?
fab
2022-10-10 03:36:28
C
2022-10-10 03:36:31
C
_wb_
2022-10-10 03:37:35
obviously both are kind of low quality and of course everyone would prefer a higher quality image than this, but say you have to pick one
fab
2022-10-10 03:38:00
B with better flowers
2022-10-10 03:38:03
At right
2022-10-10 03:38:10
The one with yellow
2022-10-10 03:39:18
A seems less degraded
2022-10-10 03:39:23
Look at the roof
2022-10-10 03:39:50
Probably a
2022-10-10 03:42:17
On psnr y i agree with a
2022-10-10 03:43:42
B
2022-10-10 03:44:20
The grid at left looks better with b
2022-10-10 03:45:10
C Is higher quality
2022-10-10 03:45:23
I see c as higher quality
2022-10-10 03:47:31
2022-10-10 03:47:52
This Colour put cmyk at 4,0,12,43
2022-10-10 03:48:15
C hurts my eyes (for saturation)
BlueSwordM
_wb_ Original: https://jon-cld.s3.amazonaws.com/test/images/006.png Image A: https://jon-cld.s3.amazonaws.com/test/distorted/006/jxl-adc0-e6-q60.png Image B: https://jon-cld.s3.amazonaws.com/test/distorted/006/mozjpeg-2x2-revert-q44.jpg Almost all the metrics say B is better than A: VMAF-NEG: A=87.757061, B=90.774064 Butteraugli 3-norm: A=1.819012, B=1.726316 SSIMULACRA: A=0.06641718, B=0.06335570 DSSIM: A=0.00634723, B=0.00520672 PSNR-Y: A=31.551152, B=31.572802 PSNR-HVS: A=37.272210, B=39.058198 SSIM: A=0.996295, B=0.996939 MS-SSIM: A=0.98866, B=0.991059 Only one metric says the opposite: SSIMULACRA 2: A=62.92532865, B=56.85243698 What is your opinion? Which image do you prefer?
2022-10-10 03:48:45
B definitely, but I can see why ssimu2 prefers A. The image is sharper on the right, but there are a lot more artifacts, including ringing around edges and banding in the sky.
fab
2022-10-10 03:49:17
0.159 0,143 epf 3
fab C hurts my eyes (for saturation)
2022-10-10 03:50:56
Of the Green
2022-10-10 03:52:01
I actually prefer the branches of trees at A
2022-10-10 03:52:31
For psnrhvs i agree on A
2022-10-10 03:58:12
Ok
2022-10-10 03:58:27
17:58:09
2022-10-10 03:58:35
17:58:26
2022-10-10 03:58:40
16:58:36
_wb_
BlueSwordM B definitely, but I can see why ssimu2 prefers A. The image is sharper on the right, but there are a lot more artifacts, including ringing around edges and banding in the sky.
2022-10-10 03:59:46
I agree that B is sharper, but to me B is unusable because of the glaring artifacts (mostly the banding in the sky) while A is low quality but perhaps just usable for the lower end of a 'web quality' image.
fab
_wb_ I agree that B is sharper, but to me B is unusable because of the glaring artifacts (mostly the banding in the sky) while A is low quality but perhaps just usable for the lower end of a 'web quality' image.
2022-10-10 04:00:53
To me i don't know why only Stefania scordio image look sharp on jxl
2022-10-10 04:01:10
If i do a normal Image of an animal
2022-10-10 04:01:16
Sometimes it misses nose
2022-10-10 04:01:32
I don't know if this is normal behaviour
2022-10-10 04:01:50
Since october 2021 has been in this way
2022-10-10 04:02:15
Is not like it aims to do every cartoon perfectly
2022-10-10 04:02:20
Every Animals
2022-10-10 04:03:54
To me ssimulacra b is better
2022-10-10 04:07:36
Ssim c Is bad
BlueSwordM
_wb_ I agree that B is sharper, but to me B is unusable because of the glaring artifacts (mostly the banding in the sky) while A is low quality but perhaps just usable for the lower end of a 'web quality' image.
2022-10-10 04:08:52
I somewhat disagree. From 1.5H monitor distance, outside of the banding, the B image looks better.
fab
2022-10-10 04:09:20
To me the psnr y of b is too high <@321486891079696385> do you agree
2022-10-10 04:10:14
Butteraugli should be 1.8175
2022-10-10 04:11:26
B psnr y 3483514
2022-10-10 04:12:00
Based on that data calibrate
2022-10-10 04:12:10
I don't know I'm not engineer
_wb_
2022-10-10 04:12:13
https://jon-cld.s3.amazonaws.com/test/distorted/008/mozjpeg-2x2-revert-q20.jpg VMAF=81.398802 https://jon-cld.s3.amazonaws.com/test/distorted/008/jxl-adc0-e6-q30.png VMAF=79.429024
fab
2022-10-10 04:13:19
Idw
_wb_ https://jon-cld.s3.amazonaws.com/test/distorted/008/mozjpeg-2x2-revert-q20.jpg VMAF=81.398802 https://jon-cld.s3.amazonaws.com/test/distorted/008/jxl-adc0-e6-q30.png VMAF=79.429024
2022-10-10 04:14:03
The b is the result of all my points
2022-10-10 04:14:25
B psnr y 3483514
_wb_
2022-10-10 04:14:42
The thing with banding is that it's a very persistent artifact: even when looking from far away or in these downscaled discord previews you can still see it
fab
2022-10-10 04:35:28
A at center is too blurred wavy
2022-10-10 04:35:38
Out of focus
2022-10-10 04:52:20
2022-10-10 04:52:45
Are those values obtainable with this image
2022-10-10 04:53:05
What is the strangest parameter
_wb_ Original: https://jon-cld.s3.amazonaws.com/test/images/006.png Image A: https://jon-cld.s3.amazonaws.com/test/distorted/006/jxl-adc0-e6-q60.png Image B: https://jon-cld.s3.amazonaws.com/test/distorted/006/mozjpeg-2x2-revert-q44.jpg Almost all the metrics say B is better than A: VMAF-NEG: A=87.757061, B=90.774064 Butteraugli 3-norm: A=1.819012, B=1.726316 SSIMULACRA: A=0.06641718, B=0.06335570 DSSIM: A=0.00634723, B=0.00520672 PSNR-Y: A=31.551152, B=31.572802 PSNR-HVS: A=37.272210, B=39.058198 SSIM: A=0.996295, B=0.996939 MS-SSIM: A=0.98866, B=0.991059 Only one metric says the opposite: SSIMULACRA 2: A=62.92532865, B=56.85243698 What is your opinion? Which image do you prefer?
2022-10-10 04:53:14
This image
2022-10-10 04:53:27
From the original source input to jxl
2022-10-10 04:55:41
Id say vmaf 85.461
2022-10-10 04:56:01
And mssim 0.94256
2022-10-10 04:56:18
Psnr 0.30246
2022-10-10 04:56:35
Butteraugli 0.1602513
2022-10-10 04:57:54
Ssimulacra2 0.9045065486
2022-10-10 04:59:02
Ssim 0.08816430877
2022-10-10 04:59:48
Psnr hvs 0.3573946
_wb_
2022-10-10 05:01:04
Fab you are making very little sense, maybe you can start a forum thread about this instead of spamming this channel?
fab
2022-10-10 05:02:17
You right
2022-10-10 05:02:37
Remember eyes focus on green
2022-10-10 05:04:48
To me id like 31.5019486 on that Image of a notebook for psnr
2022-10-10 05:04:56
Maybe psnr is a bit strong or underlooked
2022-10-10 05:07:45
ID like also further improvement from this
2022-10-10 05:09:04
To me this is too deringed enhancer
2022-10-10 05:09:12
Boh
2022-10-10 05:09:28
Sorry for sticker jon
2022-10-10 05:10:29
Ssimulacra of b Is optimum value to aim
2022-10-10 05:12:47
Then i would like distance 2.3
2022-10-10 05:13:08
Then you calibrate for every distance
2022-10-10 05:13:16
And post image at d1
2022-10-10 05:30:01
I'd like to see a 18.9% better mssim than b
2022-10-10 05:30:51
That's the quality when I want to see the Image
2022-10-10 05:34:36
Ssim not less than 0.881500877 at d 2.3
2022-10-10 05:35:14
Like 0.0014 less
2022-10-10 05:35:22
That max i accept
2022-10-10 05:36:10
B i want 91.77 vmaf
2022-10-10 05:42:41
Id say to make 1,3 1,8 less smooth
2022-10-10 05:42:50
Two Channel
improver
2022-10-10 06:19:57
fab have u ever tried nicotine (legit no ill intended question)
fab
_wb_ Original: https://jon-cld.s3.amazonaws.com/test/images/006.png Image A: https://jon-cld.s3.amazonaws.com/test/distorted/006/jxl-adc0-e6-q60.png Image B: https://jon-cld.s3.amazonaws.com/test/distorted/006/mozjpeg-2x2-revert-q44.jpg Almost all the metrics say B is better than A: VMAF-NEG: A=87.757061, B=90.774064 Butteraugli 3-norm: A=1.819012, B=1.726316 SSIMULACRA: A=0.06641718, B=0.06335570 DSSIM: A=0.00634723, B=0.00520672 PSNR-Y: A=31.551152, B=31.572802 PSNR-HVS: A=37.272210, B=39.058198 SSIM: A=0.996295, B=0.996939 MS-SSIM: A=0.98866, B=0.991059 Only one metric says the opposite: SSIMULACRA 2: A=62.92532865, B=56.85243698 What is your opinion? Which image do you prefer?
2022-10-10 06:52:13
Are available new encodings?
Traneptora
improver fab have u ever tried nicotine (legit no ill intended question)
2022-10-10 07:06:24
????
Nova Aurora
2022-10-10 07:08:27
Good old fabian
_wb_
2022-10-10 07:13:12
Here is what various metrics think about the filesize savings compared to plain libjpeg; encoder settings are aligned by 10th percentile worst-case and the range is libjpeg q20 to q98. Yellow line is libaom s6 tune=ssim 4:4:4, red line is libjxl 0.7, blue line is current git libjxl. Butteraugli 3-norm:
2022-10-10 07:14:01
DSSIM:
2022-10-10 07:14:51
CIEDE2000:
2022-10-10 07:16:40
MS-SSIM: (very strange, avif becomes worse than jpeg at libjpeg q>72)
2022-10-10 07:17:23
PSNR-HVS:
2022-10-10 07:18:07
PSNR-Y:
2022-10-10 07:19:29
SSIM:
2022-10-10 07:21:09
VMAF-NEG: (says both avif and jxl are worse than libjpeg-turbo when q>36)
2022-10-10 07:21:48
SSIMULACRA 1:
2022-10-10 07:22:24
SSIMULACRA 2:
2022-10-10 07:23:44
As you can see, how much you can save by using jxl or avif instead of jpeg depends a lot on which metric you ask.
2022-10-10 07:26:51
For some reason, VMAF(-NEG) really likes libjpeg-turbo. It claims that to reach a quality equivalent to libjpeg q80, you need to use libjxl q94 (which is 33% larger) or avif q15 (which is 16% larger).
2022-10-10 07:30:22
The only metrics (from the ones shown above) that make some amount of sense to me are <@532010383041363969>'s Butteraugli 3-norm, <@826537092669767691>'s DSSIM, my own SSIMULACRA (2 more than 1), and perhaps PSNR-Y (as a perceptually very crappy but at least not completely nonsensical metric). The others look just nuts to me.
2022-10-10 07:36:05
Whether my recent encode quality tweaks were actually an improvement or not depends on which metric you ask. For Butteraugli it's about the same at the high end and a 2% regression or so at the low end. For DSSIM it seems to be a 3-5% improvement across the spectrum. For SSIMULACRA 2 it's a 1-2% improvement at the high end, ~5% improvement at the low end.
2022-10-11 08:44:20
https://jon-cld.s3.amazonaws.com/test/disagreement_VMAF_vs_SSIMULACRA_2.html
Jyrki Alakuijala
_wb_ SSIMULACRA 2:
2022-10-11 11:12:29
The fact that it is worse on SSIMULACRA 1 and much better on SSIMULACRA 2 indicates how important it is to keep improving metrics
_wb_
2022-10-11 11:18:20
Yes, there is still work to be done. Looking at metric disagreements can be useful to find failure cases. I just discovered a strange case where ssimulacra2 is still behaving non-monotonically. It's on this image: https://jon-cld.s3.amazonaws.com/test/images/1028637.png
2022-10-11 11:20:37
the curves for jxl and avif look normal, but the ones for libjpeg and mozjpeg show weird and strong oscillations, which is unexpected and might indicate a bug
2022-10-11 11:20:51
(a bug in ssimulacra2 that is)
2022-10-11 11:22:16
other metrics also get some oscillation on this image, but it's not as severe
2022-10-11 11:22:34
Jyrki Alakuijala
2022-10-11 12:06:50
non-monotonic can be understandable -- consider that quantization is more or less lucky at times
_wb_
2022-10-11 12:07:00
yeah but not to that extent
2022-10-11 12:07:10
I found the root cause
Jyrki Alakuijala
2022-10-11 12:07:12
agreed, that needs more work
_wb_
2022-10-11 12:07:22
it's a numerical issue
Jyrki Alakuijala
2022-10-11 12:07:33
-100 is a poor score 🙂
_wb_
2022-10-11 12:07:52
caused by the image having lots of black, which led to situations where one near-zero number gets divided by another
2022-10-11 12:08:27
a silly numerical stability problem in the computation of the ssim map
Jyrki Alakuijala
2022-10-11 12:08:51
if you do xyb mapping properly, then you should end up in uniform perceptual space where no such divisions need to be made
2022-10-11 12:09:05
it should allow for superposition after that
2022-10-11 12:09:30
(at least the biased log compression in butteraugli's gamma)
_wb_
2022-10-11 12:11:03
2022-10-11 12:11:20
it's in that division. I picked too small values for c1 and/or c2
fab
2022-10-11 12:11:40
I think the image make it looks like the original is sharp
2022-10-11 12:11:47
Is unnatural
2022-10-11 12:11:58
With countrside images
2022-10-11 12:12:43
I used the 08102022 commit
2022-10-11 12:12:51
It's quality improvements
2022-10-11 12:13:09
But not naturalness
_wb_
_wb_ it's in that division. I picked too small values for c1 and/or c2
2022-10-11 12:43:41
actually it's not really that. The real issue is that FastGaussian is a bit inaccurate, causing an image with only positive values to get slightly negative values after blurring. And that is obviously a problem when you then plug them into the SSIM formula.
improver
2022-10-11 01:00:37
needs SlowAccurateGaussian
Jyrki Alakuijala
2022-10-11 01:01:33
I don't like those c1 and c2, they are nonsense like all of ssim, the brain is not computing these things
2022-10-11 01:01:57
the division is wrong imho
_wb_
2022-10-11 01:16:48
that SSIM formula is made so 1 means perfect (orig = distorted), but for black pixels the mu_x and mu_y are zero and for perfectly flat regions the sigmas are zero, so the c1 and c2 are just to force the numerator and denominator to be nonzero. it's not elegant but it kind of works
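For reference, the standard per-window SSIM expression being discussed, with c1 and c2 as the stabilizing constants:
```
\mathrm{SSIM}(x,y) = \frac{(2\mu_x\mu_y + c_1)(2\sigma_{xy} + c_2)}
                          {(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)}
```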
2022-10-11 01:17:31
it does crucially require all pixel values to be >= 0 though, otherwise you get complete nonsense
2022-10-11 01:24:53
Adding some epsilon to Y solves the problem - then the FastGaussian inaccuracy doesn't cause values to become negative anymore
Jyrki Alakuijala
2022-10-11 01:42:05
what if you add a big epsilon like 1e6
2022-10-11 01:42:46
that of course flattens the dynamics -- you'd need to renormalize there
_wb_
2022-10-11 01:49:56
I'm adding 0.05, which is enough to counter the FastGaussian inaccuracy but it already changes scores a bit (like +3 or so). I'll retune the weights anyway because now with this bug out of the way maybe I can get a better fit
Jyrki Alakuijala
2022-10-11 01:52:34
try optimizing it for 0.05 and 1.0 (if 1.0 is the 80 nits or so)
Kornel
2022-10-11 03:14:28
When using various metrics tools, are you giving them all the same normalized PNG files? Can surprising results come from their own image decoders? Bad gamma or color profile handling.
_wb_
2022-10-11 03:16:31
yeah this did cause issues because I was using ImageMagick to produce pngs from ppm which gives them gAMA but no sRGB and all the colorspace confusion that causes
2022-10-11 03:17:35
so I'm rerunning everything, now with pngs that have an explicit ICC profile that says sRGB and no gAMA so there is no way to get it wrong, I hope
2022-10-11 03:18:08
(i.e. tools that do properly handle colorspace will treat it as sRGB, and those that don't will hopefully also treat it as sRGB)
2022-10-11 03:21:02
the super confusing thing is that VMAF gives you bonus points if you get the colorspace wrong, so letting it compare a gAMA original with an sRGB decompressed image (both converted to y4m without any conversion besides the yuv matrix) results in higher scores than if you don't mess up the colorspaces
2022-10-11 03:21:51
see also: https://github.com/Netflix/vmaf/issues/1102
2022-10-11 03:22:06
anyway, that's apparently a feature, not a bug
2022-10-11 03:22:15
VMAF-NEG does not have this "feature", so I'm using that now
2022-10-11 03:22:31
but it's still a very bad metric
2022-10-11 03:25:53
it says mozjpeg -revert (so unoptimized libjpeg-turbo) is better than mozjpeg, avif and jxl at q>50
2022-10-11 03:33:37
And even at q<50, you can only save 10-20% over unoptimized libjpeg-turbo by using avif or jxl, according to vmaf-neg
2022-10-11 03:37:07
I don't know what went wrong there, but clearly something went wrong when they tuned vmaf. Perhaps they overfitted for the (video) encoders they used in their subjective testing?
BlueSwordM
_wb_ I don't know what went wrong there, but clearly something went wrong when they tuned vmaf. Perhaps they overfitted for the (video) encoders they used in their subjective testing?
2022-10-11 03:39:45
For the 1080p model: x264 CRF 22-24, 3H monitor viewing distance, appeal > fidelity, 4 metric results, ML, etc. For the 4k model: X encoder, 1.5H monitor viewing distance, 4 metric results, ML, etc.
_wb_
2022-10-11 03:46:41
appeal > fidelity is OK, but how can they say this image looks OK? https://jon-cld.s3.amazonaws.com/test/distorted/861443/mozjpeg-2x2-revert-q14.jpg
2022-10-11 03:47:14
that's VMAF-neg 80.1
2022-10-11 03:48:06
https://jon-cld.s3.amazonaws.com/test/distorted/861443/jxl-225b6884-pdc0-e6-q20.png
2022-10-11 03:48:16
this is VMAF-neg 79.1
BlueSwordM
_wb_ https://jon-cld.s3.amazonaws.com/test/distorted/861443/jxl-225b6884-pdc0-e6-q20.png
2022-10-11 03:55:40
VMAF is not very sensitive to banding, which is why they made CAMBI separately. What does normal VMAF give you?
_wb_
BlueSwordM VMAF is not very sensitive to banding, which is why they made CAMBI separately. What does normal VMAF give you?
2022-10-11 04:05:49
Default VMAF gives the first blocky mess a score of 83.702501 and the second image 80.415111
BlueSwordM
_wb_ Default VMAF gives the first blocky mess a score of 83.702501 and the second image 80.415111
2022-10-11 04:06:12
That's very interesting. You should publish your results in the VMAF issue report.
_wb_
2022-10-11 04:06:48
CAMBI for the first image is 0.002237
2022-10-11 04:06:59
CAMBI for the second image is 8.714630
2022-10-11 04:07:28
CAMBI is more-is-worse so it claims the first has almost no banding and the second has a ton of banding
2022-10-11 04:08:43
these are not isolated cases, I don't know what they are doing but it gets things very wrong
improver
2022-10-11 04:43:16
"second one is just regular artifacts around moon. first one is a fully baked aesthetic, i dig it" t. CAMBI
2022-10-11 04:44:20
it's not that there aren't bands in the second one, they're just kinda spread out
BlueSwordM
improver it's not that there aren't bands in the second one, they're just kinda spread out
2022-10-11 04:48:06
It might be that CAMBI tries to remove dithering to see the actual banding.
2022-10-11 06:46:46
<@794205442175402004> What's the lowest quality that Cloudinary uses in the worst-case scenario for automatic image encoding? I would like a lowest-quality anchor to show the quality improvements at the low end for various standards and encoders.
_wb_
2022-10-11 06:47:41
Well people can specify q_1 if they want to, but of course nobody does that.
2022-10-11 06:48:04
For the automatic quality settings, we have q_auto:low at the lowest end
BlueSwordM
_wb_ For the automatic quality settings, we have q_auto:low at the lowest end
2022-10-11 06:48:47
That would be mozjpeg Q30, right?
_wb_
2022-10-11 06:48:52
No
BlueSwordM
2022-10-11 06:48:52
Or 50, I'm not sure.
_wb_
2022-10-11 06:49:11
It's not a fixed q setting, it is image dependent
BlueSwordM
2022-10-11 06:49:31
Oh I see.
_wb_
2022-10-11 06:51:25
it corresponds to the average quality you get with mozjpeg q62, more or less
2022-10-11 06:52:31
q_auto:eco corresponds to the average quality you get with mozjpeg q72 or so
q_auto:good corresponds to the average quality you get with mozjpeg q80 or so
q_auto:best corresponds to the average quality you get with mozjpeg q90 or so
BlueSwordM
2022-10-11 06:52:39
Well, I guess mozjpeg q50-q55 as the lowest quality anchor was a decent choice on my end then. Thanks wb.
_wb_
2022-10-11 06:53:48
I estimate that <10% of the images we deliver use q_auto:low, something like 35-40% each for :eco and :good, and 10-15% for :best
2022-10-11 07:37:58
many of the disagreements between vmaf-neg and ssimulacra2 are where there's horrible banding in one image: https://jon-cld.s3.amazonaws.com/test/disagreement_VMAF_vs_SSIMULACRA_2.html ssimulacra2 hates banding with a passion, vmaf-neg doesn't seem to care about it at all
2022-10-11 07:38:33
https://jon-cld.s3.amazonaws.com/test/distorted/239581/jxl-225b6884-pdc0-e6-q12.png https://jon-cld.s3.amazonaws.com/test/distorted/239581/mozjpeg-2x2-revert-q30.jpg
2022-10-11 07:39:42
ssimulacra2 says both are crappy (it gives the first 49, the second 46)
2022-10-11 07:40:11
vmaf says the first is crappy (67) but the second is quite good (89)
fab
2022-10-11 07:41:14
I'd say the first is quite good
_wb_
2022-10-11 07:41:30
nah, the macarons are very blurry in the first image
fab
2022-10-11 07:41:51
I don't know what maccheroni are
_wb_
2022-10-11 07:42:05
in the second image they're a bit better but that banding in the background is very bad
fab
2022-10-11 07:42:09
Biscuits?
_wb_
2022-10-11 07:42:26
https://en.wikipedia.org/wiki/Macaron
BlueSwordM
2022-10-11 07:46:23
<@794205442175402004> Something odd that I just found on Phoronix regarding Clang 15 vs Clang 14 performance in JXL: https://www.phoronix.com/news/LLVM-Clang-15-Benchmarks Somehow, cjxl e7 is over 25% faster with Clang 15 than with Clang 14.
_wb_
2022-10-11 07:47:46
that's a surprisingly / suspiciously large difference
2022-10-11 07:48:44
unless clang devs have been specifically trying to do better on that particular benchmark, I find it a bit hard to believe
BlueSwordM
2022-10-11 07:50:28
Yeah, that's rather strange, which is why I wanted to relay it here. I'll check on my end whether I get similar numbers, because if I do, then a hand-written optimization could be added to get this speedup instead of leaving it to the compiler.
_wb_
2022-10-11 07:51:38
yes, please check if you can reproduce this
BlueSwordM
2022-10-11 08:04:05
Man, compiling compilers is quite the task for a CPU 🙂
Traneptora
_wb_ vmaf says the first is crappy (67) but the second is quite good (89)
2022-10-11 08:55:01
the 2nd one preserves much more detail in the macarons
2022-10-11 08:55:40
the blurriness is undesirable
2022-10-11 08:56:36
I wonder if it's possible to tune the jxl encoder so it prioritizes detail over smoothness, which appears to be what mozjpeg does; jxl prioritizes continuity over detail
2022-10-11 08:56:44
like if you disabled EPF I wonder how that would affect the scores
_wb_
2022-10-11 09:05:04
note that that jxl is way lower quality and at a lower bpp; you wouldn't use such a setting in practice
Traneptora
2022-10-11 09:15:36
oh wow that's q12
2022-10-11 09:15:39
that's ridiculously bad
2022-10-11 09:16:23
speaking of `-q`, I thought we preferred to avoid that
2022-10-11 09:16:29
and use `-d` instead
_wb_
2022-10-11 09:18:23
I don't mind using either, as long as it's clear that it's not some kind of absolute notion of "quality percentage" that is the same in all encoders. Using -d helps to reduce that confusion, so it's pedagogically useful.
Jyrki Alakuijala try optimizing it for 0.05 and 1.0 (if 1.0 is the 80 nits or so)
2022-10-12 07:59:41
Why a value as large as 1.0? That brings the range of Y to something like [1,2] instead of [0,1]. It will probably still work with proper renormalization/retuning, but it's much more than what's needed to solve the numerical inaccuracy issue.
2022-10-12 08:00:45
I used 0.05 for now, tuning hasn't fully converged yet but I'm currently here: ``` all: MAE: 5.144167334668228 Kendall: 0.708734 Pearson: 0.874347 validation: MAE: 4.760461265675695 K: 0.719678 P: 0.889105 ```
2022-10-12 08:02:50
the numbers for the current (no-negative-weights) ssimulacra2 are something like this:
all: MAE: 5.27, Kendall: 0.69722, Pearson: 0.86514
validation: MAE: 4.85, Kendall: 0.71108, Pearson: 0.88503
2022-10-12 08:04:56
and the numbers for the first iteration of ssimulacra2 (with negative weights, which caused problems on images outside the tuning set) were like this:
all: MAE: 4.274, Kendall: 0.74185, Pearson: 0.90244
validation: MAE: 4.470, Kendall: 0.72163, Pearson: 0.89504
2022-10-12 08:06:54
(you can see that the first iteration did some overfitting: it has better results for training than for validation, while I'm quite sure that, by random chance, the validation set is slightly 'easier' than the training set for SSIM-based metrics -- the existing ones all get slightly better scores on the validation set than on the training set)
2022-10-12 08:09:33
(also, the numbers for the first iteration cannot be compared exactly to those of the second/current iteration, since they were computed against a different version of the subjective data -- we did a bit more data cleanup since then, which caused the overall range of scores to expand slightly, making it of course harder to get a low MAE)
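For reference, a minimal sketch of how numbers like these can be computed, assuming scipy/numpy; `metric_scores` and `mos` are hypothetical per-image arrays, with the metric already mapped to the 0-100 MOS scale so the MAE is meaningful:
```
import numpy as np
from scipy.stats import kendalltau, pearsonr

def agreement(metric_scores, mos):
    metric_scores = np.asarray(metric_scores, dtype=float)
    mos = np.asarray(mos, dtype=float)
    mae = np.mean(np.abs(metric_scores - mos))   # mean absolute error vs MOS
    kendall, _ = kendalltau(metric_scores, mos)  # rank correlation
    pearson, _ = pearsonr(metric_scores, mos)    # linear correlation
    return mae, kendall, pearson
```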
fab
2022-10-12 08:11:18
It's similar to the numbers I mentioned
2022-10-12 08:11:27
So I think it's good
2022-10-12 08:11:43
I forgot normal ssimulacra
2022-10-12 08:12:16
What is MAE?
2022-10-12 08:14:36
It improves the sharpness of the t-shirt
_wb_
2022-10-12 08:15:21
MAE is mean absolute error
2022-10-12 08:16:01
in this case between the metric score and the subjective mean opinion score
2022-10-12 08:17:07
so if the mean opinion score for an image is 80 (on a scale from 0 to 100), then ssimulacra2 gets within about +-5 of that on average
2022-10-12 08:18:12
which is not bad; the confidence intervals on those MOS scores are something like +-3 anyway, so even a perfect metric would still have an MAE of 3 or so
2022-10-12 08:20:31
<@826537092669767691> I'm looking at cases where metrics disagree (obviously on most cases metrics will agree), e.g. here are disagreements between DSSIM and (the current version of, with a known bug that I'm fixing atm) SSIMULACRA 2: https://jon-cld.s3.amazonaws.com/test/disagreement_DSSIM_vs_SSIMULACRA_2.html
2022-10-12 08:21:21
this is very useful to find bugs or problematic edge cases
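A minimal sketch of how such a disagreement list can be generated, assuming pandas/scipy and a hypothetical scores.csv with one row per image and one column per metric (both already oriented so that higher means better):
```
import pandas as pd
from scipy.stats import rankdata

df = pd.read_csv("scores.csv")  # hypothetical columns: image, metric_a, metric_b
# compare ranks rather than raw scores, since the metrics use different scales
df["rank_a"] = rankdata(df["metric_a"])
df["rank_b"] = rankdata(df["metric_b"])
df["disagreement"] = (df["rank_a"] - df["rank_b"]).abs()
# the images where the two metrics disagree the most
print(df.sort_values("disagreement", ascending=False).head(20))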
2022-10-12 09:01:49
Better perceptual metrics are crucial imo to improve the state of the art in lossy image compression. Otherwise we just end up making encoders that optimize for the wrong thing. I think that's the biggest thing that's wrong with AVIF: both in the bitstream design and now in the encoder optimization, they're looking at PSNR, SSIM and VMAF, and those are all bad metrics by themselves but even more so when you start optimizing for them -- you can fool the metric and produce nice plots, but that doesn't mean the images are actually any good.
Jyrki Alakuijala
2022-10-12 09:06:54
VMAF got an Emmy recently, I think 2021
2022-10-12 09:42:49
it will not be easy to convince people -- who built VMAF as the most celebrated contribution of their whole careers -- not to base their decisions on it
Traneptora and use `-d` instead
2022-10-12 09:48:22
I love -d instead of -q, and having -d 1.0 as the default. People are more likely to go to lower quality without understanding what they are doing if the setting is not expressed in multiples of JND (just noticeable differences)
Traneptora I wonder if it's possible to tune the jxl encoder so it prioritizes details over smoothness, which appears to be what mozjpeg does and jxl prioritizes continuity over details
2022-10-12 09:50:35
The AC vs. DC quantization balance can be adjusted -- new, flatter quantization matrices can be computed -- or the image can just be (adaptively) sharpened slightly before compressing
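The last option is easy to prototype outside the encoder; a minimal sketch assuming Pillow and a cjxl binary on PATH, with arbitrary (untuned) UnsharpMask parameters:
```
import subprocess
from PIL import Image, ImageFilter

img = Image.open("input.png")
# mild unsharp mask before encoding; radius/percent/threshold are placeholders
sharpened = img.filter(ImageFilter.UnsharpMask(radius=1.5, percent=50, threshold=2))
sharpened.save("sharpened.png")
# encode the pre-sharpened image at the default distance
subprocess.run(["cjxl", "sharpened.png", "output.jxl", "-d", "1.0"], check=True)
```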
fab I don't know what are maccheroni
2022-10-12 09:53:44
in Zurich we call them luxemburgerli https://de.wikipedia.org/wiki/Luxemburgerli, a candidate name for the brotli-zopfli-guetzli etc. series
fab
Jyrki Alakuijala The AC vs. DC quantization balance can be adjusted -- new, flatter quantization matrices can be computed -- or the image can just be (adaptively) sharpened slightly before compressing
2022-10-12 09:55:20
I'd say that finally, with the new update, the Stefania Scordio images from October 2021 look free from artifacts, going from 321 kB to 109 kB
2022-10-12 09:55:31
But they look mentally boring
2022-10-12 09:55:45
Like visually good
2022-10-12 09:55:57
Good greens maybe
2022-10-12 09:56:17
The bpp is good: 0.581 bpp
2022-10-12 09:56:44
Super boring, it makes the day look longer
2022-10-12 09:56:47
For real
2022-10-12 09:57:07
They don't look like memories
2022-10-12 09:57:40
So there is still a need for progress
2022-10-12 09:58:14
I just gave some input