|
_wb_
|
2022-08-31 02:52:42
|
is it normal that nlpd produces numbers very close to zero?
|
|
|
BlueSwordM
|
|
_wb_
any other metrics I should try?
|
|
2022-08-31 02:53:40
|
Any version of PSNR HVS or even PSNR HVS M.
|
|
|
_wb_
|
2022-08-31 02:54:29
|
I get values like this with NLPD, maybe that's fine?
```
compressed/1082342/libjxl/e7_q30.png, 2.4478233626723522e-06
compressed/1082342/libjxl/e7_q40.png, 2.3875061287981225e-06
compressed/1082342/libjxl/e7_q50.png, 2.291733608217328e-06
compressed/1082342/libjxl/e7_q60.png, 2.180511273763841e-06
compressed/1082342/libjxl/e7_q65.png, 2.1342120817280374e-06
compressed/1082342/libjxl/e7_q70.png, 2.0811967260669917e-06
compressed/1082342/libjxl/e7_q75.png, 2.027705249929568e-06
compressed/1082342/libjxl/e7_q80.png, 1.969044888028293e-06
compressed/1082342/libjxl/e7_q85.png, 1.9079834601143375e-06
compressed/1082342/libjxl/e7_q90.png, 1.8571088276075898e-06
compressed/1082342/libjxl/e7_q95.png, 1.807375269891054e-06
```
|
|
|
BlueSwordM
Any version of PSNR HVS or even PSNR HVS M.
|
|
2022-08-31 02:54:53
|
I was going to try that but I couldn't find an implementation that I can get to work easily
|
|
|
BlueSwordM
|
|
_wb_
I was going to try that but I couldn't find an implementation that I can get to work easily
|
|
2022-08-31 02:56:26
|
Lucky for us, there is actually a Python program for us to use for PSNR HVS-M:
https://pypi.org/project/psnr-hvsm/
I am still in awe that I managed to find it so quickly, because I had been looking for it for days.
|
|
|
_wb_
|
2022-08-31 02:56:43
|
I don't want to get matlab stuff to work, and this one: https://github.com/t2ac32/PSNR-HVS-M-for-python doesn't seem to work anymore
|
|
2022-08-31 02:56:58
|
ah
|
|
2022-08-31 02:57:02
|
that one might work
|
|
2022-08-31 02:57:25
|
ERROR: Could not find a version that satisfies the requirement psnr-hvsm (from versions: none)
ERROR: No matching distribution found for psnr-hvsm
|
|
2022-08-31 02:57:26
|
or not
|
|
|
BlueSwordM
|
2022-08-31 02:57:33
|
Oh god damn it, again.
|
|
|
_wb_
or not
|
|
2022-08-31 02:58:41
|
Oh right right, I forgot about VQMT.
|
|
2022-08-31 02:58:50
|
https://github.com/rolinh/VQMT
It only needs OpenCV, so it should be fine.
Video only though, so YCbCr only.
|
|
|
_wb_
|
2022-08-31 02:59:11
|
yeah, might be because I'm using python 3.10 here and it wants 3.9, I don't feel like downgrading though
|
|
2022-08-31 02:59:45
|
ah nice, that looks like it's useful
|
|
|
BlueSwordM
|
2022-08-31 03:00:20
|
Yeah, but I forgot the V part: video only.
|
|
|
_wb_
|
2022-08-31 03:00:48
|
oh
|
|
2022-08-31 03:01:05
|
yuv only
|
|
2022-08-31 03:01:38
|
well I can get that to work but it's quite a bit more annoying than something that just takes png files as input
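For reference, a rough sketch of the glue that would be needed, assuming ffmpeg is on PATH; the file names are placeholders and the exact VQMT invocation is left to its README:
```
# Hypothetical helper: turn a PNG pair into raw 4:2:0 YUV so a YUV-only tool
# like VQMT can consume it. Assumes ffmpeg is installed; paths are placeholders.
import os
import subprocess
import tempfile

def png_to_yuv420(png_path, yuv_path):
    subprocess.run(
        ["ffmpeg", "-y", "-i", png_path, "-pix_fmt", "yuv420p", "-f", "rawvideo", yuv_path],
        check=True,
    )

with tempfile.TemporaryDirectory() as tmp:
    ref = os.path.join(tmp, "ref.yuv")
    dist = os.path.join(tmp, "dist.yuv")
    png_to_yuv420("original.png", ref)
    png_to_yuv420("compressed.png", dist)
    # then invoke the VQMT binary on ref/dist (see the VQMT README for its arguments)
```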
|
|
|
spider-mario
|
2022-08-31 03:02:23
|
I sense some `mktemp` coming
|
|
|
_wb_
|
2022-08-31 03:08:52
|
meh, not now
|
|
2022-08-31 03:09:12
|
```
import tensorflow as tf
# `nlpd` here is whatever NLPD implementation module is being imported

# read both images as single-channel float32 (decode_image scales to [0, 1])
img1 = tf.io.read_file(orig)
img1 = tf.io.decode_image(img1, channels=1, dtype=tf.dtypes.float32)
img2 = tf.io.read_file(dist)
img2 = tf.io.decode_image(img2, channels=1, dtype=tf.dtypes.float32)
dist01 = nlpd.nlpd(img1, img2)
```
|
|
2022-08-31 03:09:19
|
am I doing something wrong?
|
|
2022-08-31 03:10:35
|
this does give results but it correlates worse than psnr
|
|
|
|
veluca
|
2022-08-31 03:12:56
|
no idea
|
|
|
_wb_
|
2022-08-31 03:13:33
|
I don't have full results yet, this takes a while to compute
|
|
2022-08-31 03:17:03
|
either I'm doing something wrong or NLPD is just not a great metric
|
|
2022-08-31 03:17:59
|
FSIM seems to do quite well but it's taking ages to compute it, at least with the implementation I'm using.
|
|
2022-08-31 03:22:33
|
this is what I'm getting based on current partial results (about 1/4th of the data available)
|
|
2022-08-31 03:23:20
|
for pairwise comparisons, it looks like it is worse than anything else I tried so far
|
|
2022-08-31 03:25:06
|
for absolute quality assessment, it also looks like it is worse than anything else I tried so far, slightly worse than psnr...
|
|
2022-08-31 05:11:58
|
ok with full data, you could say NLPD is maybe a tiny bit better than PSNR
|
|
2022-08-31 05:12:13
|
|
|
2022-08-31 05:12:17
|
in absolute quality that is
|
|
|
BlueSwordM
|
|
_wb_
well I can get that to work but it's quite a bit more annoying than something that just takes png files as input
|
|
2022-08-31 05:21:16
|
Here's an interesting 2007 study about PSNR-HVS-M vs something like MS-SSIM and the others:
https://ponomarenko.info/vpqm07_p.pdf
|
|
2022-08-31 05:21:34
|
If we could have a good simple implementation of PSNR-HVS-M, it'd be interesting to compare it to SSIMU2.
|
|
|
_wb_
|
2022-08-31 05:31:38
|
It doesn't have to be a good implementation, just something that is easy to get working
|
|
2022-08-31 05:40:43
|
My impression is that most metrics have been trained/tuned on datasets like TID08/13 that mostly contain non-compression distortions (different kinds of noise etc), and at strong distortion intensity and in large steps - something like jpeg q5, q15, q30, q60 and q90
|
|
|
BlueSwordM
|
|
_wb_
It doesn't have to be a good implementation, just something that is easy to get working
|
|
2022-08-31 05:47:30
|
In this context, a good implementation is one that is easy to work with.
|
|
|
_wb_
|
2022-08-31 05:50:37
|
for pairwise comparisons, NLPD is worse than anything else
|
|
2022-08-31 09:08:42
|
FSIM is taking ages to compute, but with almost 1/3rd of the data computed, so far it looks like for absolute quality estimation it's somewhere between VMAF and DSSIM, and for relative quality estimation somewhere between SSIMULACRA and SSIMULACRA 2. So that's quite good - now I wonder if it is inherently so glacially slow or if I'm just using a slow implementation
|
|
2022-08-31 09:09:19
|
(it takes about an hour to compute 1k scores, and I need to compute 22k of them)
|
|
|
|
veluca
|
2022-08-31 09:48:41
|
I'm taking a wild guess: you're running it on CPU, and/or on one image pair per process (assuming you're using a ml-ish implementation)
|
|
|
_wb_
|
2022-08-31 10:49:54
|
Yeah, should have at least used multiple cores, but oh well, going to sleep now and cannot be bothered
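A rough sketch of how the per-pair scoring could be spread over all cores; `compute_score` is a hypothetical wrapper around whatever FSIM/NLPD implementation is actually being used:
```
# Sketch: parallelize the ~22k metric computations with a process pool.
# compute_score() is a hypothetical wrapper around the metric implementation;
# pairs is a list of (original_path, distorted_path) tuples.
from concurrent.futures import ProcessPoolExecutor

def compute_score(pair):
    orig_path, dist_path = pair
    # ... load both images and return the metric score as a float ...
    raise NotImplementedError

def score_all(pairs, max_workers=None):
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(pairs, pool.map(compute_score, pairs, chunksize=8)))
```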
|
|
2022-08-31 10:50:39
|
I was assuming FSIM would be a cheapish one, it's not neural is it
|
|
|
BlueSwordM
|
|
_wb_
I was assuming FSIM would be a cheapish one, it's not neural is it
|
|
2022-08-31 11:24:18
|
It isn't, but looking at the initial paper, no wonder it is quite slow.
|
|
2022-08-31 11:25:28
|
I'm sure the current Python implementations could be massively sped up in normal languages.
|
|
|
|
veluca
|
2022-09-01 05:47:31
|
tf is much much slower than direct hardware usage
|
|
|
_wb_
|
2022-09-01 05:55:28
|
most of the time seems to be spent in libfftw3 - which seems to be mostly doing lots of avx2 trigonometry stuff
|
|
|
|
veluca
|
2022-09-01 06:29:40
|
ah, I see
|
|
2022-09-01 06:30:11
|
so probably it recomputes the fft every time even if you have many images with the same size
|
|
2022-09-01 06:30:22
|
also I need to check what it needs big ffts for xD
|
|
|
|
fredomondi
|
2022-09-01 02:41:01
|
Not sure if this is the correct place to ask this....Does benchmark_xl tool work in Windows? Currently running it on MSYS2 and all I get is "illegal instruction" error message. All other tools work well
|
|
|
_wb_
|
2022-09-01 03:33:14
|
does it do that on any input?
|
|
2022-09-01 03:35:15
|
so FSIM results are in. In absolute quality, it's better than SSIM and SSIMULACRA and almost as good as VMAF (but worse than DSSIM and Butteraugli)
|
|
2022-09-01 03:36:25
|
in relative quality, it seems to be about as good as SSIMULACRA and DSSIM
|
|
|
_wb_
So on our dataset (medium to very high quality classical image compression, i.e. the mozjpeg q30-q95 range), it looks like the different perceptual metrics can be ranked like this:
For relative quality (predicting pairwise comparisons within the same original image):
SSIMULACRA2 >> SSIMULACRA ~= DSSIM >> Butteraugli (2-norm) > PSNR > LPIPS >> VMAF >> SSIM
For absolute quality (predicting MOS scores, i.e. metric values are consistent across different original images):
SSIMULACRA2 >> Butteraugli (2-norm) > DSSIM > VMAF >> LPIPS ~= SSIMULACRA >> SSIM >> PSNR
(with the caveat that SSIMULACRA2 was tuned based on part of this data, and while the results do generalize quite well to the validation set, they may not generalize to different kinds of inputs - e.g. other codecs, different image dimensions, different gamut or dynamic range, etc)
|
|
2022-09-01 03:38:31
|
to update this one:
For relative quality (predicting pairwise comparisons within the same original image):
SSIMULACRA2 >> SSIMULACRA ~= DSSIM ~= FSIM >> Butteraugli (2-norm) > PSNR > LPIPS >> VMAF >> SSIM >> NLPD
For absolute quality (predicting MOS scores, i.e. metric values are consistent across different original images):
SSIMULACRA2 >> Butteraugli (2-norm) > DSSIM > VMAF > FSIM >> LPIPS ~= SSIMULACRA >> SSIM >> PSNR ~= NLPD
|
|
2022-09-01 03:41:13
|
at least this is how I would rank them given how they correlate with the subjective data I have, which is specifically medium to very high quality and classical image compression (JPEG, JXL, J2K, WebP, AVIF, HEIC). Of course things are likely very different for different types of data (things like low fidelity AI-based codecs, capture artifacts, geometrical distortions, etc).
|
|
2022-09-01 03:47:01
|
looking at just the well-known PSNR, SSIM and VMAF metrics, I think it's quite interesting that PSNR >> VMAF >> SSIM for predicting pairwise opinion while VMAF >> SSIM >> PSNR for predicting MOS.
|
|
2022-09-01 03:48:31
|
obviously all three are quite poor in general for this specific task, and get things quite wrong quite often
|
|
|
Fraetor
|
2022-09-01 07:33:40
|
Could you combine the values you get from the different metrics to get something interesting?
|
|
2022-09-01 07:34:02
|
Like compensating for one metrics weaknesses in a certain area, or something.
|
|
2022-09-01 07:34:29
|
Or are they all refinements of each other, and thus you just want to use the best?
|
|
|
_wb_
|
2022-09-01 08:53:08
|
I think it's hard to combine them in a way that amplifies correct predictions more than wrong ones. But it could be interesting to explore...
|
|
|
Traneptora
|
2022-09-03 07:22:20
|
If you're looking for some really fast FFTs, the ones in libavutil are significantly faster than everything else on the planet, including FFTW3
|
|
2022-09-03 07:22:38
|
Lynne hand-coded them in assembly and they're crazy fast
|
|
|
|
veluca
|
2022-09-03 07:27:22
|
fixed-size FFT and DCT do get quite fast indeed, there's a lot of different algorithms
|
|
|
_wb_
|
2022-09-07 04:56:29
|
https://sneyers.info/CID22/ interactive plots with the aggregate info of our big subjective eval
|
|
2022-09-07 04:57:09
|
MCOS 30 = low quality, 50 = medium quality, 70 = high quality, 90 = visually lossless
|
|
2022-09-07 05:47:35
|
Here's a somewhat arbitrary but imo relevant criterion for considering a new encoder/codec N substantially better than the old codec O: if the p10 worst performance of N consistently (across different corpora and across the quality spectrum) matches or beats the average performance of O, at similar encode speed, then N is substantially better than O.
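A minimal sketch of that criterion, assuming per-image quality scores for the new encoder N and the old encoder O at a comparable operating point (the arrays are hypothetical inputs):
```
# Criterion sketch: N is "substantially better" than O if N's 10th-percentile
# (worst-case) score matches or beats O's average score. To capture the
# "consistently" part, this would have to hold at every operating point and
# for every corpus tested.
import numpy as np

def substantially_better(scores_n, scores_o):
    return np.percentile(scores_n, 10) >= np.mean(scores_o)
```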
|
|
2022-09-07 05:57:38
|
According to that criterion, libjxl e7 is substantially better than mozjpeg, and it's the only encoder we tested that can actually say that (even ignoring the aspect of encode speed)
|
|
|
Fraetor
|
2022-09-07 07:21:35
|
JPEG XL seems to almost meet that criterion for JPEG2000 and webp, only falling slightly behind (the 10% worst of JXL at least) at high qualities.
|
|
|
_wb_
|
2022-09-07 07:40:34
|
Yeah - and part of that could be not that relevant, at MCOS 85+ it's really very close to visually lossless (those two last points for jxl are d1 and d0.55), and probably the confidence intervals (not shown here) overlap there so arguably jxl p10 isn't really worse than j2k avg there. Compared to avg WebP, p10 jxl still has a problem in the two non-photo categories though (might be fixable but in at least libjxl 0.6.1, lossy non-photo is not so strong).
|
|
|
Fraetor
|
2022-09-07 07:44:58
|
Yeah, when you look at specific categories there are much clearer winners.
|
|
|
_wb_
|
2022-09-07 07:54:11
|
When you look at landscape/nature specifically, you could say jxl is substantially better than jpeg, j2k, webp and avif. For these types of images, it looks like j2k/webp/avif are performing quite disappointingly in the medium to high quality range, struggling to even match mozjpeg.
|
|
2022-09-07 07:55:48
|
I mean, if you look at the plots for those types of images, you could really say "jpeg is good enough" (until jxl arrived)
|
|
|
Fraetor
|
2022-09-07 08:15:01
|
I wasn't expecting HEIC to be so good though.
|
|
2022-09-07 08:16:49
|
JXL still ekes out a win on nature, but HEIC seems to win on anything unnatural.
|
|
|
_wb_
|
2022-09-07 08:44:01
|
this is HEIC at a speed that is 4 times slower than jxl e7 though
|
|
2022-09-07 08:44:15
|
but yes, x265 is not bad when configured properly
|
|
2022-09-07 08:45:13
|
at some point I may want to compare x265 HEIC encoding to whatever it is Apple does
|
|
|
Fraetor
|
2022-09-07 08:49:52
|
Ah, that is the trade off.
|
|
|
BlueSwordM
|
|
Fraetor
I wasn't expecting HEIC to be so good though.
|
|
2022-09-07 08:51:37
|
It's because x265 has an obscene setting when dealing with 4:4:4 content.
It applies a +6 chroma QP offset when dealing with a 4:4:4 input... which is insane lmao.
|
|
|
_wb_
|
2022-09-07 09:57:13
|
this is 420 heic, I dunno if 444 heic is really worth doing considering Apple doesn't decode it afaik
|
|
|
BlueSwordM
|
2022-09-07 10:05:03
|
Oh, that's a bit different then. I'm really surprised.
|
|
|
_wb_
|
2022-09-07 10:41:16
|
actually it's only twice as slow as jxl, so yes, it does quite well - patent encumbered though of course
|
|
2022-09-07 10:49:24
|
mostly in the lower range though - I consider MCOS 70-80 to be the most important range for still images on the web (high quality to about halfway to visually lossless, i.e. roughly the d2 to d3.5 range in jxl). In that range, for photo jxl wins while for non-photo heic and avif win.
|
|
2022-09-10 06:09:41
|
djxl only decodes jxl files, no idea what happened in that comment but it cannot be right...
|
|
|
The_Decryptor
|
2022-09-10 06:24:43
|
Could have just re-used the filename in an earlier cjxl call and copied the wrong decode command
|
|
2022-09-10 06:25:03
|
I know I've done "cjxl source.png source.png" enough times through tab completion
|
|
|
_wb_
|
2022-09-10 11:27:02
|
aggregated over just a small set of 50 small images, but it's an interesting way to visualize results
|
|
2022-09-10 11:27:22
|
https://sneyers.info/tradeoff-relative.html for the interactive version
|
|
2022-09-10 11:28:06
|
the screenshot shows the whole range from jxl q10 to jxl q96 in steps of 2
|
|
2022-09-10 11:28:51
|
you can nicely see the 'diminishing returns' effect as you go to lower and lower libaom speeds
|
|
2022-09-10 11:36:10
|
interestingly, according to ssimulacra2, at the high end, all codecs actually become worse than unoptimized jpeg - I think that's mostly because they need really high quality settings to match high-quality jpeg. According to ssimulacra2, webp q100 (but using the lossy codec) on average corresponds roughly to mozjpeg -revert -q 88 or to mozjpeg -q 92.
|
|
2022-09-10 11:38:01
|
so you get this trumpet shape that starts around jpeg q80 where everything starts giving less and less benefit compared to the old jpeg (at some point even going into negative benefit), except jxl which starts thriving
|
|
2022-09-10 11:42:30
|
also any claims of 60% improvement compared to jpeg are exaggerated, avif can do that around libjpeg q20 but mozjpeg is already giving 30% improvement there so it depends a lot on whether you take unoptimized libjpeg as a baseline or default mozjpeg. And also, who cares about q20? You have to be into pretty aggressive compression to use anything below q50, imo.
|
|
2022-09-10 11:47:16
|
It's also interesting to see how all encoders except jxl get slower as quality gets higher. So both in speed and in compression gain (compared to simple libjpeg), all three major encoders (libwebp, mozjpeg and libaom) are significantly better at the very low end (where I guess it's the easiest to do something better than the blocky mess libjpeg produces there) than at the higher end.
|
|
|
|
JendaLinda
|
2022-09-10 12:29:09
|
Interestingly, the size of a JPEG file containing the exact same coefficients may be considerably different just depending if default or optimized Huffman tables are used or if it's progressive or not.
|
|
|
_wb_
|
2022-09-10 12:42:30
|
Yeah but mozjpeg is doing both kinds of things: better entropy coding, and quality-affecting things like trellis optimization
|
|
2022-09-10 12:43:15
|
It looks like the quality affecting stuff works best below q80 or so, but then starts being less effective and even negatively effective
|
|
2022-09-10 12:45:04
|
E.g. -fastcrush is only changing entropy coding, so it only has impact on speed and bpp but the encoded images are still the same with or without
|
|
2022-09-10 12:48:13
|
But the choice of quant tables, clamping-deringing, trellis optimization of AC and DC, etc are things that mozjpeg does and that affect quality for better or worse (but whether it's better or worse depends on what a human or a metric says)
|
|
|
|
JendaLinda
|
2022-09-10 12:59:18
|
That makes sense, more advanced encoders can surely improve the image quality. Those traditional quant tables were used for decades as a rule of thumb. It seems that photos are just pretty forgiving to lossy compression. The traditional JPEG encoder just discards some amount of data and it somehow works.
|
|
|
BlueSwordM
|
|
_wb_
interestingly, according to ssimulacra2, at the high end, all codecs actually become worse than unoptimized jpeg - I think that's mostly because they need really high quality settings to match high-quality jpeg. According to ssimulacra2, webp q100 (but using the lossy codec) on average corresponds roughly to mozjpeg -revert -q 88 or to mozjpeg -q 92.
|
|
2022-09-10 04:11:50
|
To be fair, WebP is the worst case scenario since it can only do 4:2:0.
|
|
|
spider-mario
|
2022-09-10 04:23:15
|
it would be interesting to have guetzli in there, but then it's also nice to have those results during this century
|
|
|
_wb_
|
2022-09-10 05:05:19
|
Haha
|
|
2022-09-10 05:05:42
|
I can try some guetzli and xyb jpeg later
|
|
2022-09-10 05:05:53
|
And the jxl recompressed versions of those
|
|
2022-09-11 10:44:53
|
results based on some more images: https://sneyers.info/trade-offs.html
|
|
2022-09-11 10:49:02
|
wow I forgot how crazy slow guetzli is, it's actually slower than aom s0
|
|
|
BlueSwordM
|
|
_wb_
wow I forgot how crazy slow guetzli is, it's actually slower than aom s0
|
|
2022-09-11 03:24:45
|
How did you build guetzli?
|
|
|
_wb_
|
2022-09-11 03:24:57
|
apt install guetzli
|
|
|
BlueSwordM
|
|
_wb_
apt install guetzli
|
|
2022-09-11 03:25:06
|
Oh ok
|
|
2022-09-11 03:25:52
|
I thought you built it from source.
For theoretical max performance, it'd be interesting to see how much more performance you could squeeze out from the binary with maximum-level optimizations (`-O3 -march=native -flto`).
|
|
2022-09-11 03:27:03
|
Mainly because it doesn't seem to have any SIMD optimizations.
|
|
|
_wb_
|
2022-09-11 03:28:05
|
it can probably be sped up more significantly by using a cheaper variant of butteraugli in its inner loop and stuff like that
|
|
2022-09-11 03:29:16
|
guetzli seems to be messing weirdly with ssimulacra2 - which is likely more of a problem of ssimulacra2 than of guetzli...
|
|
2022-09-11 03:30:09
|
as in: for some images ssimulacra2 gives it very low scores and for others it gives it very high scores
|
|
2022-09-11 03:31:30
|
could be that I kind of overfit ssimulacra2 for the encoders it saw during training
|
|
|
BlueSwordM
|
2022-09-11 03:31:30
|
guetzli only encodes 4:4:4 JPEGs, right?
|
|
|
_wb_
|
2022-09-11 03:32:16
|
it also only does q84+, it's basically designed only for qualities around d1
|
|
|
spider-mario
|
2022-09-11 04:55:45
|
and non-progressive
|
|
2022-09-11 04:57:57
|
(although they can be made progressive, and often thereby smaller, after-the-fact using jpegtran)
|
|
|
_wb_
|
2022-09-11 05:00:15
|
oh, I didn't realize that. So it's really only about quality, not entropy coding?
|
|
|
spider-mario
|
2022-09-11 05:15:40
|
from what I recall, the non-progressivity was either for computational or user comfort reasons
|
|
2022-09-11 05:16:09
|
I suspect that the decision would plausibly be different now
|
|
2022-09-11 05:16:34
|
https://github.com/google/guetzli/issues/54#issuecomment-287415666
|
|
2022-09-11 05:18:20
|
(+ the quote from the readme in the report)
|
|
|
_wb_
|
2022-09-12 07:28:24
|
looks like ssimulacra2 for some reason completely hates guetzli with a passion and gives it really low scores on some images - I wonder what's going on there, I think this is likely caused by allowing negative weights in the parameter tuning I did for ssimulacra2. I think I may need to re-tune ssimulacra2 to make it more robust to different encoders even if correlating more poorly, because this really cannot be right
|
|
2022-09-12 07:29:09
|
|
|
2022-09-12 07:34:24
|
According to ssimulacra1, guetzli is 10-15% better than mozjpeg (in the q90+ range), but I don't trust ssimulacra1 either, it correlates relatively poorly and it says some weird things like webp m4 being better than m6 and avif444 being worse than avif420
|
|
2022-09-13 02:07:49
|
nevermind, I was doing something wrong with the input images so ssimulacra2 was miscomputed. I'll have to try again at some point
|
|
2022-09-16 10:28:07
|
https://sneyers.info/benchmarks/
|
|
|
fab
|
2022-09-16 12:23:35
|
|
|
2022-09-16 12:24:06
|
I've found a trick to force higher bpp per space
|
|
2022-09-16 12:24:17
|
But at cost of more ringing
|
|
2022-09-16 12:25:12
|
Avif fans will not like this
|
|
2022-09-16 12:28:33
|
for %i in (D:\august\sept\uno\phonedue\Screenshots\*.jpg) do cjxl -d 0.663 -e 9 --dots=1 --gaborish=1 --epf=3 -I 66.17 --lossless_jpeg 0 "%i" "%i.jxl"
|
|
2022-09-16 12:28:56
|
Or this command that provides good quality
|
|
2022-09-16 12:32:38
|
For visual quality i prefer
|
|
2022-09-16 12:32:42
|
for %i in (D:\august\sept\uno\phonedue\Screenshots\*.jpg) do cjxl -d 0.743 -s 7 --dots=0 --gaborish=1 --epf=2 -p -I 64.3 --lossless_jpeg 0 "%i" "%i.jxl"
|
|
2022-09-16 12:37:02
|
|
|
2022-09-16 12:37:18
|
On this s9 is weak
|
|
2022-09-16 12:40:14
|
To me it seems good, it just doesn't recognise that type of image
|
|
2022-09-16 12:40:43
|
It makes the fonts longer than they actually are
|
|
|
_wb_
|
2022-09-19 07:25:09
|
|
|
2022-09-19 07:26:41
|
this plot covers the whole range from jpeg q25 to q100 (or about d8 to d0.5)
|
|
2022-09-19 07:27:57
|
as seen from the point of view of percentile 10 ssimulacra2 score of each encoder setting (x axis) versus average bpp saved compared to unoptimized jpeg (y axis)
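A hypothetical sketch of the per-setting summary behind each plotted point; the column names are assumptions, and the savings relative to unoptimized jpeg would come from interpolating the libjpeg curve at the same quality score:
```
# Each plotted point: x = 10th-percentile SSIMULACRA 2 score of an encoder
# setting, y = bpp saved vs unoptimized jpeg at that quality level.
# df columns (assumed): encoder_setting, ssimulacra2, bpp.
import numpy as np
import pandas as pd

def summarize(df):
    g = df.groupby("encoder_setting")
    return pd.DataFrame({
        "p10_ssimulacra2": g["ssimulacra2"].quantile(0.10),
        "mean_bpp": g["bpp"].mean(),
    })

def bpp_saved_vs_jpeg(point, jpeg_curve):
    # jpeg_curve: summarize() output for unoptimized libjpeg, sorted by score
    jpeg_bpp = np.interp(point["p10_ssimulacra2"],
                         jpeg_curve["p10_ssimulacra2"], jpeg_curve["mean_bpp"])
    return jpeg_bpp - point["mean_bpp"]
```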
|
|
2022-09-19 07:50:51
|
so what these plots say is the following: (assuming ssimulacra2 can be trusted, which is of course a big if)
- jxl is 30 to 50% smaller than jpeg across the spectrum;
- avif at s3-6 is 40% smaller than jpeg at the very low end, but 20% larger than jpeg at the very high end (q92+). At jpeg q75 it's still 20% smaller, around d1 it starts to become worse;
- mozjpeg is 20% smaller than unoptimized jpeg at the very low end (q30-50), but the gap becomes smaller as quality goes up and around jpeg q80 the gap is gone and mozjpeg becomes worse than unoptimized jpeg (until q99 where it is better again)
- webp (latest version at slowest speed) does perform a bit better than mozjpeg, about 5-10% smaller, but follows the same pattern of diminishing gains as quality goes up (and it cannot reach the highest qualities)
- xyb-jpeg is a bit worse than mozjpeg at the low to medium range, but starting at around q75 it becomes better and in the q90+ range it's dramatically better (ignore the speed for xyb-jpeg, it's much faster but i'm using benchmark_xl to do the encoding and my script also counted the butteraugli computation it did)
- avif s7 is not much better than webp, avif s8 is about the same as mozjpeg
- speed-wise, all codecs except jxl and xyb-jpeg get significantly slower as quality goes up
- even avif s8 is still slower than jxl e6 except at ultra-low quality (crossover is around jxl q25). The default avifenc speed of s6 seems to be well-chosen in the sense that slower settings provide little extra benefit at high cost and the faster settings are significantly worse.
|
|
2022-09-19 07:57:57
|
Most of all, this plot indicates that all the hyperbolic claims about new codecs need to be taken with a huge pile of salt. People have said that webp is 50% smaller than jpeg and avif is 50% smaller than webp, but reality is quite a bit more nuanced than that and "good old jpeg" actually performs pretty well in the q75-95 range.
|
|
|
Orum
|
2022-09-20 02:01:50
|
yeah, webp is nowhere close to 50% smaller... but all marketing does stuff like that
|
|
2022-09-20 02:02:23
|
like VVC is supposed to be "50% smaller" than HEVC, and HEVC is supposed to be 50% smaller than AVC
|
|
2022-09-20 02:02:49
|
while maybe that is true at extremely low qualities, no one watches video at those levels (I hope)
|
|
|
Fox Wizard
|
2022-09-20 02:46:22
|
My dad does
|
|
2022-09-20 02:46:42
|
He downloads 1GB and lower 1080p movies and calls them "high quality"
|
|
2022-09-20 02:47:10
|
And yes, that includes basically every codec, but usually AVC... sometimes xvid XD
|
|
|
improver
|
2022-09-20 03:23:26
|
1GB is just not a big budget for a proper movie, but it can end up surprisingly watchable. With non-anime stuff, you won't even know what details you're missing
|
|
|
|
JendaLinda
|
2022-09-20 03:29:46
|
700MB movies were perfectly watchable. Although deluxe rips were split across up to 3 CDs.
|
|
|
eddie.zato
|
2022-09-23 07:48:40
|
```
mozjpeg 4.1.2 (build 20220923)
avifenc 0.10.1 (dav1d [dec]:1.0.0-0-g99172b1, aom [enc/dec]:3.4.0)
cjxl v0.8.0 16770a0
```
|
|
|
Brinkie Pie
|
2022-09-23 08:18:36
|
that's intense. I'm curious about the parameters and file sizes.
|
|
|
eddie.zato
|
2022-09-23 08:37:15
|
`$q` is random between 87 and 97
```
cjpeg.exe -quality $q -optimize
avifenc.exe -a cq-level=(64 - 0.64*$q) -a end-usage=q -a color:sharpness=2 -a color:enable-chroma-deltaq=1 --min 0 --max 63
cjxl.exe -q $q
cjxl.exe -q $q --gaborish=0
```
|
|
|
fab
|
|
eddie.zato
```
mozjpeg 4.1.2 (build 20220923)
avifenc 0.10.1 (dav1d [dec]:1.0.0-0-g99172b1, aom [enc/dec]:3.4.0)
cjxl v0.8.0 16770a0
```
|
|
2022-09-23 09:23:31
|
Aom latest is 3.5.0
|
|
2022-09-23 09:23:39
|
And there is a difference
|
|
|
Brinkie Pie
|
2022-09-23 09:36:06
|
But that benchmark assumes that `avifenc`, mozjpeg and `cjxl` have the same understanding of a "97%" quality setting. There's no guarantee for this, so IMO another metric like the file size or a visual metric should be taken into consideration as well. Imagine I'd make my own mozjpeg derivative `brinkiejpeg` which maps all quality settings to 98%-100%, it would have a clear advantage in this benchmark.
|
|
|
_wb_
|
2022-09-23 10:29:34
|
Also: do you decode to 8-bit png or 16-bit?
|
|
2022-09-23 10:30:46
|
But yes, most important is to make sure the range of filesizes (on the first generation) is similar
|
|
|
eddie.zato
|
2022-09-23 10:32:33
|
Yeah, it's not exactly a legitimate benchmark. I make a few of these mostly for fun and to see how jxl's "generation loss" improves with development.
`djxl --bits_per_sample=16`
|
|
|
fab
|
2022-09-23 11:02:51
|
v0.8.0 is not the current version
|
|
2022-09-23 11:03:01
|
The current one is 0.7.0
|
|
|
_wb_
|
2022-09-23 05:46:45
|
vmaf is a funny metric
|
|
2022-09-23 05:47:04
|
it says this image: https://jon-cld.s3.amazonaws.com/test_images/016/mozjpeg-revert-q24.jpg
is better than this image: https://jon-cld.s3.amazonaws.com/test_images/016/jxl-e6-q74.png
|
|
2022-09-23 05:48:42
|
it also says this image: https://jon-cld.s3.amazonaws.com/test_images/037/mozjpeg-revert-q18.jpg
is better than this image: https://jon-cld.s3.amazonaws.com/test_images/037/jxl-e6-q54.png
|
|
2022-09-23 05:51:19
|
it also says that this: https://jon-cld.s3.amazonaws.com/test_images/008/mozjpeg-revert-q16.jpg
is better than this: https://jon-cld.s3.amazonaws.com/test_images/008/avif-s7-q43.png
|
|
2022-09-23 06:00:48
|
https://twitter.com/jonsneyers/status/1573371624132419585?s=20
|
|
2022-09-23 06:01:37
|
This is just too ridiculous. How can it not be bothered by such hideous banding?
|
|
2022-09-23 07:37:22
|
tbh I'm also finding bugs in ssimulacra2
|
|
2022-09-23 07:39:29
|
i'll need to retrain it constraining the subscore weights to be non-negative, because with the negative weights I'm occasionally getting very weird results where on some images, when you push the quality down into the ridiculously low, the ssimulacra2 score starts to get higher again
|
|
2022-09-23 07:40:01
|
https://jon-cld.s3.amazonaws.com/test_images/036/avif-s7-q61.png e.g. it gives this image a pretty high score, which is obviously very wrong
|
|
|
Eugene Vert
|
|
eddie.zato
```
mozjpeg 4.1.2 (build 20220923)
avifenc 0.10.1 (dav1d [dec]:1.0.0-0-g99172b1, aom [enc/dec]:3.4.0)
cjxl v0.8.0 16770a0
```
|
|
2022-09-23 08:04:16
|
Generation loss test with jxl/avif decoding to 16-bit png. There is still a little bit of color-shift and blocking with gaborish, but not that extreme)
|
|
2022-09-23 08:05:52
|
Quality setting is alternating between two. Source code is here: https://gist.github.com/EugeneVert/da60fbd403d9e1244fbde41233236d74
|
|
|
eddie.zato
|
2022-09-24 11:07:54
|
Ok, `cjpeg` uses random (85...95) quality settings, `cjxl` targets the jpeg file size in each generation. All intermediate pngs are 16-bit.
|
|
|
Eugene Vert
Quality setting is alternating between two. Source code is here: https://gist.github.com/EugeneVert/da60fbd403d9e1244fbde41233236d74
|
|
2022-09-24 11:26:09
|
I haven't tried `cjxl` with `patches=0`, maybe later.
|
|
|
_wb_
|
2022-09-24 02:07:24
|
I doubt it will use any patches on these images
|
|
|
w
|
2022-09-24 02:29:22
|
what is gaborish
|
|
|
_wb_
|
2022-09-24 02:52:08
|
Decode-side, it's just a 3x3 blurring convolution
|
|
2022-09-24 02:53:11
|
Encode-side, we first do the inverse of it before doing dct. End result is a bit less blocking.
|
|
2022-09-24 02:55:06
|
Probably the generation loss gets caused by the inverse gaborish not being fully accurate; the encoder uses a 5x5 approximation iirc
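For illustration, a small numpy sketch of a gaborish-style decode-side smoothing; the weights are placeholders, not the actual libjxl constants:
```
# Decode-side gaborish is a 3x3 blurring convolution with a strong centre
# weight; the edge/corner weights below are illustrative only.
import numpy as np
from scipy.ndimage import convolve

def gaborish_like_blur(plane, edge=0.11, corner=0.06):
    kernel = np.array([[corner, edge, corner],
                       [edge,   1.0,  edge],
                       [corner, edge, corner]])
    kernel /= kernel.sum()  # normalize so overall brightness is unchanged
    return convolve(plane, kernel, mode="nearest")
```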
|
|
|
Jyrki Alakuijala
|
|
eddie.zato
Yeah, it's not exactly a legitimate benchmark. I make a few of these mostly for fun and to see how jxl's "generation loss" improves with development.
`djxl --bits_per_sample=16`
|
|
2022-09-26 09:58:16
|
Thank you for doing this even when it is a bit embarrassing for me :-). We need to be aware of our weaknesses whatever they are -- and then work on them.
|
|
|
w
what is gaborish
|
|
2022-09-26 09:59:52
|
5x5 sharpening filter before encoding, 3x3 blurring filter after decoding
|
|
2022-09-26 10:00:28
|
it is about 99.9% correct -- the exact inverse of any blurring filter would be an infinitely big sharpening filter, so obviously 5x5 is an approximation
|
|
2022-09-26 10:01:34
|
we could make it do a bit less sharpening at encoding; then these stripes would likely disappear
|
|
|
_wb_
|
2022-09-26 10:03:47
|
I suppose the error is towards 'slightly too sharp' now, which accumulates?
|
|
|
Jyrki Alakuijala
|
2022-09-26 10:27:20
|
a very simple way to adjust would be to change the 1.0 here: https://github.com/libjxl/libjxl/blob/main/lib/jxl/gaborish.cc#L33
|
|
2022-09-26 10:27:51
|
I believe that increasing that 1.0 to a higher value, say 1.1, should end up sharpening less
|
|
2022-09-26 10:28:46
|
all the constants were searched through Nelder-Mead to find the minimum error or best bpp*pnorm (I don't remember which)
|
|
|
spider-mario
https://github.com/google/guetzli/issues/54#issuecomment-287415666
|
|
2022-09-26 10:40:05
|
It was my mistake not to make guetzli progressive -- I spec'd it to be sequential due to fears of progressive decoding being slower than typical libjpeg-turbo use. I lived in a fantasyland where jpeg decoding speed was much more important than it was in reality. If we do something like guetzli again, it will be 10% better and something like 5-way progressive.
|
|
|
_wb_
i'll need to retrain it constraining the subscore weights to be non-negative, because with the negative weights I'm occasionally getting very weird results where on some images, when you push the quality down into the ridiculously low, the ssimulacra2 score starts to get higher again
|
|
2022-09-26 10:42:07
|
+1 for fewer negative weights -- consider adding an internal cost for them and mixing some of it into the objective that you are optimizing
|
|
|
_wb_
|
2022-09-26 10:44:18
|
i'm now just constraining things to only use positive weights for the subscores. I'm getting worse results on the training set but on the validation set the difference is small.
|
|
2022-09-26 10:46:27
|
With negative weights:
full set: Kendall: 0.750678 Pearson: 0.915953 MAE: 4.2119062897366675
validation set: K: 0.713692 P: 0.894972 MAE: 4.6921941837018535
Without negative weights: (still tuning atm so numbers might still get slightly better)
full set: Kendall: 0.697231 Pearson: 0.865044 MAE: 5.260400245568066
validation set: K: 0.711085 P: 0.885026 MAE: 4.83869683016585
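For context, a small sketch of how those three figures can be computed, assuming `pred` holds the metric's scores and `mos` the subjective scores for the same images:
```
# Kendall / Pearson correlation and mean absolute error between metric
# predictions and subjective scores (both assumed to be on the same 0-100 scale).
import numpy as np
from scipy.stats import kendalltau, pearsonr

def evaluate(pred, mos):
    pred, mos = np.asarray(pred, float), np.asarray(mos, float)
    return {
        "Kendall": kendalltau(pred, mos)[0],
        "Pearson": pearsonr(pred, mos)[0],
        "MAE": float(np.mean(np.abs(pred - mos))),
    }
```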
|
|
2022-09-26 10:48:02
|
The negative weights are just causing overfitting and they occasionally cause weird nonmonotonic behavior, for example:
```
#image,encoder,speed,q-setting,ssimulacra2-with-negative-weights-allowed,ssimulacra2-only-positive-weights
images/002.png,avif,s7,q23,75.90065570,73.00392507
images/002.png,avif,s7,q25,73.88054718,70.26941217
images/002.png,avif,s7,q27,71.72160467,67.03692228
images/002.png,avif,s7,q29,69.10752095,64.58631157
images/002.png,avif,s7,q31,67.60556344,60.51143569
images/002.png,avif,s7,q33,65.76856998,56.44158147
images/002.png,avif,s7,q35,62.94440338,52.14308948
images/002.png,avif,s7,q37,61.66782141,46.38953710
images/002.png,avif,s7,q39,61.37867846,40.24145433
images/002.png,avif,s7,q41,64.37997448,32.39549149
images/002.png,avif,s7,q43,63.22384863,26.74709307
images/002.png,avif,s7,q45,65.74042586,20.77856709
```
|
|
|
Jyrki Alakuijala
|
2022-09-26 10:49:58
|
If you can have ok results without negative weights, yes, of course that is better then
|
|
|
_wb_
|
2022-09-26 10:51:09
|
looking at the validation set numbers, the gap is small and not worth the unreliability
|
|
2022-09-26 10:52:52
|
I only discovered this because I was manually inspecting image pairs where different metrics disagree strongly - same as how I found out about vmaf being oblivious to banding
|
|
2022-09-26 10:57:24
|
I can give you some examples of where the new ssimulacra2 and butteraugli 3-norm disagree, if that helps to improve butteraugli
|
|
2022-09-26 10:58:24
|
orig: https://jon-cld.s3.amazonaws.com/test_images/reference/032.png
A: https://jon-cld.s3.amazonaws.com/test_images/032/mozjpeg-revert-q26.jpg
B: https://jon-cld.s3.amazonaws.com/test_images/032/avif-s6-q43.png
Butteraugli 3-norm says A>B
|
|
2022-09-26 11:00:23
|
orig: https://jon-cld.s3.amazonaws.com/test_images/reference/011.png
A: https://jon-cld.s3.amazonaws.com/test_images/011/mozjpeg-revert-q26.jpg
B: https://jon-cld.s3.amazonaws.com/test_images/011/jxl-e6-q16.png
|
|
2022-09-26 11:01:42
|
orig: https://jon-cld.s3.amazonaws.com/test_images/reference/030.png
A: https://jon-cld.s3.amazonaws.com/test_images/030/mozjpeg-revert-q30.jpg
B: https://jon-cld.s3.amazonaws.com/test_images/030/aurora-faster-psycho-visual=0-q45.png
|
|
2022-09-26 11:03:04
|
it's typically cases where A has worse banding but B looks otherwise worse, so to some extent this is a matter of taste, I suppose.
|
|
2022-09-26 11:05:38
|
Ah, here is a different case.
orig: https://jon-cld.s3.amazonaws.com/test_images/reference/022.png
A: https://jon-cld.s3.amazonaws.com/test_images/022/avif-s7-q37.png
B: https://jon-cld.s3.amazonaws.com/test_images/022/mozjpeg-q40.jpg
|
|
2022-09-26 11:06:28
|
btw chrome renders that untagged jpeg a bit differently (brighter) on my laptop than how it renders the pngs
|
|
2022-09-26 11:07:05
|
in safari I see no difference in color between the jpeg and the png, but in chrome the jpeg looks brighter
|
|
2022-09-26 11:08:25
|
<@604964375924834314> do you know what causes this? Is chrome perhaps interpreting untagged jpeg as rec709 instead of sRGB or something?
|
|
2022-09-26 11:10:18
|
Orig: https://jon-cld.s3.amazonaws.com/test_images/reference/003.png
A: https://jon-cld.s3.amazonaws.com/test_images/003/mozjpeg-revert-q24.jpg
B: https://jon-cld.s3.amazonaws.com/test_images/003/jxl-e6-q16.png
|
|
2022-09-26 11:10:50
|
these are all cases where Butteraugli 3-norm says A is better than B while ssimulacra2-nonegativeweights says B is better than A.
|
|
2022-09-26 11:16:17
|
Orig: https://jon-cld.s3.amazonaws.com/test_images/reference/001.png
A: https://jon-cld.s3.amazonaws.com/test_images/001/mozjpeg-revert-q28.jpg
B: https://jon-cld.s3.amazonaws.com/test_images/001/jxl-e6-q34.png
|
|
2022-09-26 11:21:42
|
Orig: https://jon-cld.s3.amazonaws.com/test_images/reference/002.png
A: https://jon-cld.s3.amazonaws.com/test_images/002/avif-s7-q41.png
B: https://jon-cld.s3.amazonaws.com/test_images/002/aurora-faster-psycho-visual=0-q43.png
|
|
2022-09-26 11:24:07
|
There are also a few cases where I agree with Butteraugli 3-norm, but most of the time (including in the above ones) I tend to agree with ssimulacra2
|
|
|
Jyrki Alakuijala
|
2022-09-26 11:38:59
|
it can be insightful to compare such things with a distance map
|
|
2022-09-26 11:40:50
|
also, I use pairs like this to calibrate butteraugli when 'far away' from the JND
|
|
2022-09-26 11:41:25
|
then I can just look at the image with the 'a's and see if the shapes of observable isodifference contours are close to butteraugli heat maps
|
|
2022-09-26 11:42:31
|
in different colors (and gray of course, too)
|
|
2022-09-26 11:43:24
|
green
|
|
2022-09-26 11:44:16
|
the idea was to roughly approximate the xy, yb and xb high-frequency planes (the three squares)
|
|
2022-09-26 11:44:36
|
and see how those planes match with what the model is seeing
|
|
|
spider-mario
|
|
_wb_
<@604964375924834314> do you know what causes this? Is chrome perhaps interpreting untagged jpeg as rec709 instead of sRGB or something?
|
|
2022-09-26 11:45:35
|
it's the PNGs that are "wrong", they have a gAMA chunk of 0.45455 but no sRGB chunk
|
|
2022-09-26 11:45:55
|
ImageMagick does that (https://github.com/ImageMagick/ImageMagick/issues/4375 )
|
|
2022-09-26 11:46:33
|
oh, right, they closed it as "completed" but I am not sure that they actually fixed it, I should look into it and maybe reopen
|
|
|
_wb_
|
2022-09-26 11:48:35
|
the thing is, in safari the png and the jpeg look the same, and they both look darker just like the png in chrome
|
|
|
spider-mario
|
2022-09-26 11:48:49
|
safari probably ignores the gAMA chunk
|
|
2022-09-26 11:49:05
|
oh, "just like", I misread
|
|
|
_wb_
|
2022-09-26 11:49:20
|
yes, it's only the jpeg that looks different
|
|
2022-09-26 11:53:03
|
so is safari displaying the jpeg incorrectly then?
|
|
|
spider-mario
|
2022-09-26 12:01:39
|
possibly, could it be that it's approximating the sRGB curve with a 2.2 gamma?
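For what it's worth, a quick sketch of how far a pure 2.2 gamma sits from the piecewise sRGB curve; a decoder approximating one with the other would shift mid-tones slightly:
```
# Compare the piecewise sRGB transfer curve with a plain 2.2 gamma.
import numpy as np

def srgb_to_linear(v):
    v = np.asarray(v, float)
    return np.where(v <= 0.04045, v / 12.92, ((v + 0.055) / 1.055) ** 2.4)

def gamma22_to_linear(v):
    return np.asarray(v, float) ** 2.2

v = np.linspace(0, 1, 11)
print(np.round(srgb_to_linear(v) - gamma22_to_linear(v), 4))
```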
|
|
|
_wb_
|
2022-09-30 07:44:15
|
https://sneyers.info/tradeoffs/
|
|
2022-10-01 02:23:46
|
Added some scatter plots called `disagreement_*`
|
|
2022-10-01 02:23:50
|
e.g. https://sneyers.info/tradeoffs/disagreement_PSNR-Y_vs_SSIMULACRA_2_noneg.html
|
|
2022-10-01 02:24:28
|
every point here is a pair of images A,B where one metric says A>>B and the other says A<<B
|
|
2022-10-01 02:24:51
|
if you click on it it will open A,orig,B in three tabs (you have to disable the popup blocker on chrome for this)
|
|
2022-10-01 02:28:21
|
I had to limit it to show only the 1000 biggest disagreements per original, otherwise the thing became too heavy
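A rough sketch of how such disagreement pairs could be mined, assuming a table with one row per distorted image and a score column per metric (column names are hypothetical):
```
# For every pair (A, B) of distorted versions of the same original, keep cases
# where metric1 prefers A while metric2 prefers B, ranked by how strongly the
# two metrics disagree. df columns (assumed): original, distorted, metric1, metric2.
import itertools
import pandas as pd

def disagreements(df, top=1000):
    rows = []
    for orig, group in df.groupby("original"):
        for a, b in itertools.combinations(group.itertuples(index=False), 2):
            d1 = a.metric1 - b.metric1
            d2 = a.metric2 - b.metric2
            if d1 * d2 < 0:  # opposite verdicts
                rows.append((orig, a.distorted, b.distorted, min(abs(d1), abs(d2))))
    out = pd.DataFrame(rows, columns=["original", "A", "B", "strength"])
    return out.nlargest(top, "strength")
```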
|
|
2022-10-01 02:29:19
|
Looking at the disagreements is quite fun and it's nice to get examples of why e.g. PSNR is a crap metric
|
|
|
Traneptora
|
2022-10-03 01:43:13
|
do we have any benchmarks for lossless JXL as an intra-video codec
|
|
2022-10-03 01:43:38
|
since you gain minimal from lossless inter-prediction, I'm wondering how it could perform compared to something like, say, ffv1
|
|
|
|
veluca
|
2022-10-03 12:08:21
|
I don't think we do, but AFAIU the jxl context model is pretty much an extension of the ffv1 one so I'd be surprised if it's not better
|
|
|
_wb_
|
2022-10-03 05:13:26
|
Well ffv1 has some advantages:
- adaptive chances (cabac)
- ctx model is persistent across frames
- no groups, so slight advantage for prediction (no poorly predicted first row and column per group)
|
|
2022-10-03 05:13:50
|
But jxl has a lot more advantages, not gonna list them
|
|
2022-10-04 10:21:32
|
https://github.com/Netflix/vmaf/issues/1102
|
|
|
BlueSwordM
|
2022-10-04 02:54:20
|
<@794205442175402004> Did you also test with VMAF_neg?
|
|
|
_wb_
|
2022-10-04 02:54:39
|
what's the command line for that?
|
|
|
BlueSwordM
|
|
_wb_
what's the command line for that?
|
|
2022-10-04 02:55:56
|
VMAF_neg is a different model, so just specifying a different model path to that model will work.
|
|
|
_wb_
|
2022-10-04 02:56:05
|
is that a different --model? built-in?
|
|
|
BlueSwordM
|
|
_wb_
is that a different --model? built-in?
|
|
2022-10-04 02:56:53
|
Are you using ffmpeg or the vmaf exec?
|
|
|
_wb_
|
2022-10-04 02:56:58
|
vmaf exec
|
|
|
BlueSwordM
|
|
_wb_
vmaf exec
|
|
2022-10-04 02:58:20
|
You can find all the cmd arguments here:
https://github.com/Netflix/vmaf/tree/master/libvmaf/tools
|
|
2022-10-04 02:58:49
|
```--model/-m $params: model parameters, colon ":" delimited `path=` path to model file `version=` built-in model version```
|
|
2022-10-04 02:59:59
|
vmaf neg will usually penalize processing operations such as gamma curve adjustments, contrast enhancement and sharpening.
|
|
|
_wb_
|
2022-10-04 03:01:43
|
so if I use `--model version=vmaf_v0.6.1neg` I do get different numbers
|
|
2022-10-04 03:03:31
|
orig vs orig: 97.424967
orig vs darker: 95.920890
orig vs jpg q60: 91.667490
orig vs darker jpg q60: 91.985217
orig vs jpg q75: 93.364518
orig vs darker jpg q75: 93.587509
|
|
2022-10-04 03:05:46
|
so it does appear to be more robust but still gives somewhat weird bonus points to images that are a bit too dark
|
|
2022-10-04 03:09:36
|
https://netflixtechblog.com/toward-a-better-quality-metric-for-the-video-community-7ed94e752a30 oh now I understand - default VMAF is attempting to be a metric that captures 'enhancement'?
|
|
2022-10-04 03:09:44
|
that explains a lot
|
|
2022-10-04 03:11:05
|
so it's not a fidelity metric but really an appeal metric where you can be better than the original by doing some sharpening, color adjustment and denoising
|
|
|
BlueSwordM
|
|
_wb_
so it does appear to be more robust but still gives somewhat weird bonus points to images that are a bit too dark
|
|
2022-10-04 03:56:54
|
I see.
|
|
2022-10-04 04:04:56
|
It does penalize the changes, but gamma curve changes should be taken into account imo.
|
|
|
_wb_
|
2022-10-10 03:29:38
|
Original: https://jon-cld.s3.amazonaws.com/test/images/006.png
Image A: https://jon-cld.s3.amazonaws.com/test/distorted/006/jxl-adc0-e6-q60.png
Image B: https://jon-cld.s3.amazonaws.com/test/distorted/006/mozjpeg-2x2-revert-q44.jpg
Almost all the metrics say B is better than A:
VMAF-NEG: A=87.757061, B=90.774064
Butteraugli 3-norm: A=1.819012, B=1.726316
SSIMULACRA: A=0.06641718, B=0.06335570
DSSIM: A=0.00634723, B=0.00520672
PSNR-Y: A=31.551152, B=31.572802
PSNR-HVS: A=37.272210, B=39.058198
SSIM: A=0.996295, B=0.996939
MS-SSIM: A=0.98866, B=0.991059
Only one metric says the opposite:
SSIMULACRA 2: A=62.92532865, B=56.85243698
What is your opinion? Which image do you prefer?
|
|
|
fab
|
2022-10-10 03:36:28
|
C
|
|
2022-10-10 03:36:31
|
C
|
|
|
_wb_
|
2022-10-10 03:37:35
|
obviously both are kind of low quality and of course everyone would prefer a higher quality image than this, but say you have to pick one
|
|
|
fab
|
2022-10-10 03:38:00
|
B with better flowers
|
|
2022-10-10 03:38:03
|
At right
|
|
2022-10-10 03:38:10
|
The one with yellow
|
|
2022-10-10 03:39:18
|
A seems less degraded
|
|
2022-10-10 03:39:23
|
Look at the roof
|
|
2022-10-10 03:39:50
|
Probably a
|
|
2022-10-10 03:42:17
|
On psnr y i agree with a
|
|
2022-10-10 03:43:42
|
B
|
|
2022-10-10 03:44:20
|
The grid at left looks better with b
|
|
2022-10-10 03:45:10
|
C Is higher quality
|
|
2022-10-10 03:45:23
|
I see c as higher quality
|
|
2022-10-10 03:47:31
|
|
|
2022-10-10 03:47:52
|
This Colour put cmyk at 4,0,12,43
|
|
2022-10-10 03:48:15
|
C hurts my eyes (for saturation)
|
|
|
BlueSwordM
|
|
_wb_
Original: https://jon-cld.s3.amazonaws.com/test/images/006.png
Image A: https://jon-cld.s3.amazonaws.com/test/distorted/006/jxl-adc0-e6-q60.png
Image B: https://jon-cld.s3.amazonaws.com/test/distorted/006/mozjpeg-2x2-revert-q44.jpg
Almost all the metrics say B is better than A:
VMAF-NEG: A=87.757061, B=90.774064
Butteraugli 3-norm: A=1.819012, B=1.726316
SSIMULACRA: A=0.06641718, B=0.06335570
DSSIM: A=0.00634723, B=0.00520672
PSNR-Y: A=31.551152, B=31.572802
PSNR-HVS: A=37.272210, B=39.058198
SSIM: A=0.996295, B=0.996939
MS-SSIM: A=0.98866, B=0.991059
Only one metric says the opposite:
SSIMULACRA 2: A=62.92532865, B=56.85243698
What is your opinion? Which image do you prefer?
|
|
2022-10-10 03:48:45
|
B definitely, but I can see why ssimu2 prefers A.
The image is sharper on the right, but there are a lot more artifacts, including ringing around edges and banding in the sky.
|
|
|
fab
|
2022-10-10 03:49:17
|
0.159 0,143 epf 3
|
|
|
fab
C hurts my eyes (for saturation)
|
|
2022-10-10 03:50:56
|
Of the Green
|
|
2022-10-10 03:52:01
|
I actually prefer the branches of trees at A
|
|
2022-10-10 03:52:31
|
For psnrhvs i agree on A
|
|
2022-10-10 03:58:12
|
Ok
|
|
2022-10-10 03:58:27
|
17:58:09
|
|
2022-10-10 03:58:35
|
17:58:26
|
|
2022-10-10 03:58:40
|
16:58:36
|
|
|
_wb_
|
|
BlueSwordM
B definitely, but I can see why ssimu2 prefers A.
The image is sharper on the right, but there are a lot more artifacts, including ringing around edges and banding in the sky.
|
|
2022-10-10 03:59:46
|
I agree that B is sharper, but to me B is unusable because of the glaring artifacts (mostly the banding in the sky) while A is low quality but perhaps just usable for the lower end of a 'web quality' image.
|
|
|
fab
|
|
_wb_
I agree that B is sharper, but to me B is unusable because of the glaring artifacts (mostly the banding in the sky) while A is low quality but perhaps just usable for the lower end of a 'web quality' image.
|
|
2022-10-10 04:00:53
|
To me i don't know why only Stefania scordio image look sharp on jxl
|
|
2022-10-10 04:01:10
|
If i do a normal Image of an animal
|
|
2022-10-10 04:01:16
|
Sometimes it misses nose
|
|
2022-10-10 04:01:32
|
I don't know if it is normal functioning
|
|
2022-10-10 04:01:50
|
Since October 2021 it has been this way
|
|
2022-10-10 04:02:15
|
It's not like it aims to do every cartoon perfectly
|
|
2022-10-10 04:02:20
|
Every Animals
|
|
2022-10-10 04:03:54
|
To me ssimulacra b is better
|
|
2022-10-10 04:07:36
|
Ssim c Is bad
|
|
|
BlueSwordM
|
|
_wb_
I agree that B is sharper, but to me B is unusable because of the glaring artifacts (mostly the banding in the sky) while A is low quality but perhaps just usable for the lower end of a 'web quality' image.
|
|
2022-10-10 04:08:52
|
I somewhat disagree. From 1.5H monitor distance, outside of the banding, the B image looks better.
|
|
|
fab
|
2022-10-10 04:09:20
|
To me the psnr y of b is too high <@321486891079696385> do you agree
|
|
2022-10-10 04:10:14
|
Butteraugli should be 1.8175
|
|
2022-10-10 04:11:26
|
B psnr y 3483514
|
|
2022-10-10 04:12:00
|
Based on that data calibrate
|
|
2022-10-10 04:12:10
|
I don't know I'm not engineer
|
|
|
_wb_
|
2022-10-10 04:12:13
|
https://jon-cld.s3.amazonaws.com/test/distorted/008/mozjpeg-2x2-revert-q20.jpg VMAF=81.398802
https://jon-cld.s3.amazonaws.com/test/distorted/008/jxl-adc0-e6-q30.png VMAF=79.429024
|
|
|
fab
|
2022-10-10 04:13:19
|
Idw
|
|
|
_wb_
https://jon-cld.s3.amazonaws.com/test/distorted/008/mozjpeg-2x2-revert-q20.jpg VMAF=81.398802
https://jon-cld.s3.amazonaws.com/test/distorted/008/jxl-adc0-e6-q30.png VMAF=79.429024
|
|
2022-10-10 04:14:03
|
The b is the result of all my points
|
|
2022-10-10 04:14:25
|
B psnr y 3483514
|
|
|
_wb_
|
2022-10-10 04:14:42
|
The thing with banding is that it's a very persistent artifact: even when looking from far away or in these downscaled discord previews you can still see it
|
|
|
fab
|
2022-10-10 04:35:28
|
A at center is too blurred wavy
|
|
2022-10-10 04:35:38
|
Out of focus
|
|
2022-10-10 04:52:20
|
|
|
2022-10-10 04:52:45
|
Are those values obtainable with this image
|
|
2022-10-10 04:53:05
|
What is the strangest parameter
|
|
|
_wb_
Original: https://jon-cld.s3.amazonaws.com/test/images/006.png
Image A: https://jon-cld.s3.amazonaws.com/test/distorted/006/jxl-adc0-e6-q60.png
Image B: https://jon-cld.s3.amazonaws.com/test/distorted/006/mozjpeg-2x2-revert-q44.jpg
Almost all the metrics say B is better than A:
VMAF-NEG: A=87.757061, B=90.774064
Butteraugli 3-norm: A=1.819012, B=1.726316
SSIMULACRA: A=0.06641718, B=0.06335570
DSSIM: A=0.00634723, B=0.00520672
PSNR-Y: A=31.551152, B=31.572802
PSNR-HVS: A=37.272210, B=39.058198
SSIM: A=0.996295, B=0.996939
MS-SSIM: A=0.98866, B=0.991059
Only one metric says the opposite:
SSIMULACRA 2: A=62.92532865, B=56.85243698
What is your opinion? Which image do you prefer?
|
|
2022-10-10 04:53:14
|
This image
|
|
2022-10-10 04:53:27
|
From the original source input to jxl
|
|
2022-10-10 04:55:41
|
Id say vmaf 85.461
|
|
2022-10-10 04:56:01
|
And mssim 0.94256
|
|
2022-10-10 04:56:18
|
Psnr 0.30246
|
|
2022-10-10 04:56:35
|
Butteraugli 0.1602513
|
|
2022-10-10 04:57:54
|
Ssimulacra2 0.9045065486
|
|
2022-10-10 04:59:02
|
Ssim 0.08816430877
|
|
2022-10-10 04:59:48
|
Psnr hvs 0.3573946
|
|
|
_wb_
|
2022-10-10 05:01:04
|
Fab you are making very little sense, maybe you can start a forum thread about this instead of spamming this channel?
|
|
|
fab
|
2022-10-10 05:02:17
|
You right
|
|
2022-10-10 05:02:37
|
Remember eyes focus on green
|
|
2022-10-10 05:04:48
|
To me I'd like 31.5019486 on that image of a notebook for psnr
|
|
2022-10-10 05:04:56
|
Maybe psnr is a bit strong or underlooked
|
|
2022-10-10 05:07:45
|
I'd also like further improvement from this
|
|
2022-10-10 05:09:04
|
To me this is too deringed enhancer
|
|
2022-10-10 05:09:12
|
Boh
|
|
2022-10-10 05:09:28
|
Sorry for sticker jon
|
|
2022-10-10 05:10:29
|
The ssimulacra of B is the optimum value to aim for
|
|
2022-10-10 05:12:47
|
Then i would like distance 2.3
|
|
2022-10-10 05:13:08
|
Then you calibrate for every distance
|
|
2022-10-10 05:13:16
|
And post image at d1
|
|
2022-10-10 05:30:01
|
I'd like to see an 18.9% better mssim than b
|
|
2022-10-10 05:30:51
|
That's the quality when I want to see the Image
|
|
2022-10-10 05:34:36
|
Ssim not less than 0.881500877 at d 2.3
|
|
2022-10-10 05:35:14
|
Like 0.0014 less
|
|
2022-10-10 05:35:22
|
That's the max I accept
|
|
2022-10-10 05:36:10
|
B i want 91.77 vmaf
|
|
2022-10-10 05:42:41
|
Id say to make 1,3 1,8 less smooth
|
|
2022-10-10 05:42:50
|
Two Channel
|
|
|
improver
|
2022-10-10 06:19:57
|
fab have u ever tried nicotine (legit no ill intended question)
|
|
|
fab
|
|
_wb_
Original: https://jon-cld.s3.amazonaws.com/test/images/006.png
Image A: https://jon-cld.s3.amazonaws.com/test/distorted/006/jxl-adc0-e6-q60.png
Image B: https://jon-cld.s3.amazonaws.com/test/distorted/006/mozjpeg-2x2-revert-q44.jpg
Almost all the metrics say B is better than A:
VMAF-NEG: A=87.757061, B=90.774064
Butteraugli 3-norm: A=1.819012, B=1.726316
SSIMULACRA: A=0.06641718, B=0.06335570
DSSIM: A=0.00634723, B=0.00520672
PSNR-Y: A=31.551152, B=31.572802
PSNR-HVS: A=37.272210, B=39.058198
SSIM: A=0.996295, B=0.996939
MS-SSIM: A=0.98866, B=0.991059
Only one metric says the opposite:
SSIMULACRA 2: A=62.92532865, B=56.85243698
What is your opinion? Which image do you prefer?
|
|
2022-10-10 06:52:13
|
Are new encodings available?
|
|
|
Traneptora
|
|
improver
fab have u ever tried nicotine (legit no ill intended question)
|
|
2022-10-10 07:06:24
|
????
|
|
|
Nova Aurora
|
2022-10-10 07:08:27
|
Good old fabian
|
|
|
_wb_
|
2022-10-10 07:13:12
|
Here is what various metrics think about the filesize savings compared to plain libjpeg; encoder settings are aligned by 10th percentile worst-case and the range is libjpeg q20 to q98. Yellow line is libaom s6 tune=ssim 4:4:4, red line is libjxl 0.7, blue line is current git libjxl.
Butteraugli 3-norm:
|
|
2022-10-10 07:14:01
|
DSSIM:
|
|
2022-10-10 07:14:51
|
CIEDE2000:
|
|
2022-10-10 07:16:40
|
MS-SSIM: (very strange, avif becomes worse than jpeg at libjpeg q>72)
|
|
2022-10-10 07:17:23
|
PSNR-HVS:
|
|
2022-10-10 07:18:07
|
PSNR-Y:
|
|
2022-10-10 07:19:29
|
SSIM:
|
|
2022-10-10 07:21:09
|
VMAF-NEG: (says both avif and jxl are worse than libjpeg-turbo when q>36)
|
|
2022-10-10 07:21:48
|
SSIMULACRA 1:
|
|
2022-10-10 07:22:24
|
SSIMULACRA 2:
|
|
2022-10-10 07:23:44
|
As you can see, how much you can save by using jxl or avif instead of jpeg depends a lot on which metric you ask.
|
|
2022-10-10 07:26:51
|
For some reason, VMAF(-NEG) really likes libjpeg-turbo. It claims that to reach a quality equivalent to libjpeg q80, you need to use libjxl q94 (which is 33% larger) or avif q15 (which is 16% larger).
|
|
2022-10-10 07:30:22
|
The only metrics (from the ones shown above) that make some amount of sense to me are <@532010383041363969>'s Butteraugli 3-norm, <@826537092669767691>'s DSSIM, my own SSIMULACRA (2 more than 1), and perhaps PSNR-Y (as a perceptually very crappy but at least not completely nonsensical metric). The others look just nuts to me.
|
|
2022-10-10 07:36:05
|
Whether my recent encode quality tweaks were actually an improvement or not depends on which metric you ask. For Butteraugli it's about the same at the high end and a 2% regression or so at the low end. For DSSIM it seems to be a 3-5% improvement across the spectrum. For SSIMULACRA 2 it's a 1-2% improvement at the high end, ~5% improvement at the low end.
|
|
2022-10-11 08:44:20
|
https://jon-cld.s3.amazonaws.com/test/disagreement_VMAF_vs_SSIMULACRA_2.html
|
|
|
Jyrki Alakuijala
|
|
_wb_
SSIMULACRA 2:
|
|
2022-10-11 11:12:29
|
The fact that it is worse on SSIMULACRA 1 and much better on SSIMULACRA 2 indicates how important it is to keep improving metrics
|
|
|
_wb_
|
2022-10-11 11:18:20
|
Yes, there is still work to be done. Looking at metric disagreements can be useful to find failure cases. I just discovered a strange case where ssimulacra2 is still behaving non-monotonically. It's on this image: https://jon-cld.s3.amazonaws.com/test/images/1028637.png
|
|
2022-10-11 11:20:37
|
the curves for jxl and avif look normal, but the ones for libjpeg and mozjpeg show weird and strong oscillations, which is unexpected and might indicate a bug
|
|
2022-10-11 11:20:51
|
(a bug in ssimulacra2 that is)
|
|
2022-10-11 11:22:16
|
other metrics also get some oscillation on this image, but it's not as severe
|
|
2022-10-11 11:22:34
|
|
|
|
Jyrki Alakuijala
|
2022-10-11 12:06:50
|
non-monotonic can be understandable -- consider that quantization is more or less lucky at times
|
|
|
_wb_
|
2022-10-11 12:07:00
|
yeah but not to that extent
|
|
2022-10-11 12:07:10
|
I found the root cause
|
|
|
Jyrki Alakuijala
|
2022-10-11 12:07:12
|
agreed, that needs more work
|
|
|
_wb_
|
2022-10-11 12:07:22
|
it's a numerical issue
|
|
|
Jyrki Alakuijala
|
2022-10-11 12:07:33
|
-100 is a poor score
|
|
|
_wb_
|
2022-10-11 12:07:52
|
caused by the image having lots of black, which caused situations of dividing two near-zero numbers
|
|
2022-10-11 12:08:27
|
a silly numerical stability problem in the computation of the ssim map
|
|
|
Jyrki Alakuijala
|
2022-10-11 12:08:51
|
if you do xyb mapping properly, then you should end up in uniform perceptual space where no such divisions need to be made
|
|
2022-10-11 12:09:05
|
it should allow for superposition after that
|
|
2022-10-11 12:09:30
|
(at least the biased log compression in butteraugli's gamma)
|
|
|
_wb_
|
2022-10-11 12:11:03
|
|
|
2022-10-11 12:11:20
|
it's in that division. I picked too small values for c1 and/or c2
|
|
|
fab
|
2022-10-11 12:11:40
|
I think the image makes it look like the original is sharp
|
|
2022-10-11 12:11:47
|
Is unnatural
|
|
2022-10-11 12:11:58
|
With countryside images
|
|
2022-10-11 12:12:43
|
I used the 08102022 commit
|
|
2022-10-11 12:12:51
|
It's quality improvements
|
|
2022-10-11 12:13:09
|
But not naturalness
|
|
|
_wb_
|
|
_wb_
it's in that division. I picked too small values for c1 and/or c2
|
|
2022-10-11 12:43:41
|
actually it's not really that. The real issue is that FastGaussian is a bit inaccurate, causing an image with only positive values to get slightly negative values after blurring. And that is obviously a problem when you then plug them into the SSIM formula.
|
|
|
improver
|
2022-10-11 01:00:37
|
needs SlowAccurateGaussian
|
|
|
Jyrki Alakuijala
|
2022-10-11 01:01:33
|
I don't like those c1 and c2, they are nonsense like all of SSIM, the brain is not computing these things
|
|
2022-10-11 01:01:57
|
the division is wrong imho
|
|
|
_wb_
|
2022-10-11 01:16:48
|
that SSIM formula is made so 1 means perfect (orig = distorted), but for black pixels the mu_x and mu_y are zero and for perfectly flat regions the sigmas are zero, so the c1 and c2 are just to force the numerator and denominator to be nonzero. it's not elegant but it kind of works
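roughly this shape, as a minimal numpy sketch of the standard SSIM map (not the exact ssimulacra2 code; the c1/c2 values here are just the usual SSIM defaults for a [0,1] range):
```
import numpy as np

def ssim_map(mu_x, mu_y, sigma_x2, sigma_y2, sigma_xy, c1=1e-4, c2=9e-4):
    # mu_* are blurred local means, sigma_* blurred local (co)variances.
    # c1 and c2 only exist to keep numerator and denominator away from zero
    # for black regions (mu ~ 0) and perfectly flat regions (sigma ~ 0).
    num = (2 * mu_x * mu_y + c1) * (2 * sigma_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (sigma_x2 + sigma_y2 + c2)
    return num / den  # 1.0 where orig == distorted locally
```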
|
|
2022-10-11 01:17:31
|
it does crucially require all pixel values to be >= 0 though, otherwise you get complete nonsense
|
|
2022-10-11 01:24:53
|
Adding some epsilon to Y solves the problem; then the FastGaussian inaccuracy doesn't cause values to become negative anymore
|
|
|
Jyrki Alakuijala
|
2022-10-11 01:42:05
|
what if you add a big epsilon like 1e6
|
|
2022-10-11 01:42:46
|
that of course flattens the dynamics -- you'd need to renormalize there
|
|
|
_wb_
|
2022-10-11 01:49:56
|
I'm adding 0.05, which is enough to counter the FastGaussian inaccuracy but it already changes scores a bit (like +3 or so). I'll retune the weights anyway because now with this bug out of the way maybe I can get a better fit
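in code terms the change is basically just an offset applied before computing the blurred statistics; a sketch with hypothetical names, where blur() stands in for whatever Gaussian approximation is used:
```
Y_EPS = 0.05  # the offset mentioned above

def blurred_stats(orig_y, dist_y, blur):
    # The approximate Gaussian can overshoot slightly below the input minimum,
    # so shift both channels up by Y_EPS before computing the SSIM statistics;
    # the blurred values then stay safely positive.
    x = orig_y + Y_EPS
    y = dist_y + Y_EPS
    mu_x, mu_y = blur(x), blur(y)
    sigma_x2 = blur(x * x) - mu_x ** 2
    sigma_y2 = blur(y * y) - mu_y ** 2
    sigma_xy = blur(x * y) - mu_x * mu_y
    return mu_x, mu_y, sigma_x2, sigma_y2, sigma_xy
```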
|
|
|
Jyrki Alakuijala
|
2022-10-11 01:52:34
|
try optimizing it for 0.05 and 1.0 (if 1.0 is the 80 nits or so)
|
|
|
Kornel
|
2022-10-11 03:14:28
|
When using various metric tools, are you giving them all the same normalized PNG files? Can surprising results come from their own image decoders? Bad gamma or color profile handling.
|
|
|
_wb_
|
2022-10-11 03:16:31
|
yeah this did cause issues because I was using ImageMagick to produce pngs from ppm, which gives them gAMA but no sRGB, with all the colorspace confusion that causes
|
|
2022-10-11 03:17:35
|
so I'm rerunning everything, now with pngs that have an explicit ICC profile that says sRGB and no gAMA so there is no way to get it wrong, I hope
|
|
2022-10-11 03:18:08
|
(i.e. tools that do properly handle colorspace will treat it as sRGB, and those that don't will hopefully also treat it as sRGB)
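the kind of tagging I mean, sketched here with Pillow (just an illustration, not my actual pipeline; paths are placeholders):
```
from PIL import Image, ImageCms

# Build an sRGB ICC profile to embed explicitly in the output.
srgb = ImageCms.ImageCmsProfile(ImageCms.createProfile("sRGB"))

img = Image.open("decoded.ppm")  # placeholder path
# Save as PNG with the ICC profile embedded (iCCP chunk). Pillow only writes a
# gAMA chunk when gamma info is present, which a plain ppm doesn't carry, so
# tools should have no excuse but to treat the result as sRGB.
img.save("decoded.png", icc_profile=srgb.tobytes())
```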
|
|
2022-10-11 03:21:02
|
the super confusing thing is that VMAF gives you bonus points if you get the colorspace wrong, so letting it compare a gAMA original with an sRGB decompressed image (both converted to y4m without any conversion besides the yuv matrix) results in higher scores than if you don't mess up the colorspaces
|
|
2022-10-11 03:21:51
|
see also: https://github.com/Netflix/vmaf/issues/1102
|
|
2022-10-11 03:22:06
|
anyway, that's apparently a feature, not a bug
|
|
2022-10-11 03:22:15
|
VMAF-NEG does not have this "feature", so I'm using that now
|
|
2022-10-11 03:22:31
|
but it's still a very bad metric
|
|
2022-10-11 03:25:53
|
it says mozjpeg -revert (so unoptimized libjpeg-turbo) is better than mozjpeg, avif and jxl at q>50
|
|
2022-10-11 03:33:37
|
And even at q<50, you can only save 10-20% over unoptimized libjpeg-turbo by using avif or jxl, according to vmaf-neg
|
|
2022-10-11 03:37:07
|
I don't know what went wrong there, but clearly something went wrong when they tuned vmaf. Perhaps they overfitted for the (video) encoders they used in their subjective testing?
|
|
|
BlueSwordM
|
|
_wb_
I don't know what went wrong there, but clearly something went wrong when they tuned vmaf. Perhaps they overfitted for the (video) encoders they used in their subjective testing?
|
|
2022-10-11 03:39:45
|
For the 1080p model: x264 CRF 22-24, 3H monitor viewing distance, appeal > fidelity, 4 metric results, ML, etc.
For the 4k model: X encoder, 1.5H monitor viewing distance, 4 metric results, ML, etc.
|
|
|
_wb_
|
2022-10-11 03:46:41
|
appeal > fidelity is OK, but how can they say this image looks OK? https://jon-cld.s3.amazonaws.com/test/distorted/861443/mozjpeg-2x2-revert-q14.jpg
|
|
2022-10-11 03:47:14
|
that's VMAF-neg 80.1
|
|
2022-10-11 03:48:06
|
https://jon-cld.s3.amazonaws.com/test/distorted/861443/jxl-225b6884-pdc0-e6-q20.png
|
|
2022-10-11 03:48:16
|
this is VMAF-neg 79.1
|
|
|
BlueSwordM
|
|
_wb_
https://jon-cld.s3.amazonaws.com/test/distorted/861443/jxl-225b6884-pdc0-e6-q20.png
|
|
2022-10-11 03:55:40
|
VMAF is not very sensitive to banding, which is why they made CAMBI separately.
What does normal VMAF give you?
|
|
|
_wb_
|
|
BlueSwordM
VMAF is not very sensitive to banding, which is why they made CAMBI separately.
What does normal VMAF give you?
|
|
2022-10-11 04:05:49
|
Default VMAF gives the first blocky mess a score of 83.702501 and the second image 80.415111
|
|
|
BlueSwordM
|
|
_wb_
Default VMAF gives the first blocky mess a score of 83.702501 and the second image 80.415111
|
|
2022-10-11 04:06:12
|
That's very interesting. You should publish your results in the VMAF issue report.
|
|
|
_wb_
|
2022-10-11 04:06:48
|
CAMBI for the first image is 0.002237
|
|
2022-10-11 04:06:59
|
CAMBI for the second image is 8.714630
|
|
2022-10-11 04:07:28
|
CAMBI is more-is-worse so it claims the first has almost no banding and the second has a ton of banding
|
|
2022-10-11 04:08:43
|
these are not isolated cases, I don't know what they are doing but it gets things very wrong
|
|
|
improver
|
2022-10-11 04:43:16
|
"second one is just regular artifacts around moon. first one is a fully baked aesthetic, i dig it" t. CAMBI
|
|
2022-10-11 04:44:20
|
it's not that there ain't bands on the second one, they're just kinda spread out
|
|
|
BlueSwordM
|
|
improver
it's not that there ain't bands on the second one, they're just kinda spread out
|
|
2022-10-11 04:48:06
|
It might be that CAMBI tries to remove dithering to see the actual banding.
|
|
2022-10-11 06:46:46
|
<@794205442175402004> What's the lowest quality that Cloudinary uses in the worst case scenario possible for automatic image coding? I would like to have a lowest quality anchor to show the quality improvements at the lowest end for various standards and encoders.
|
|
|
_wb_
|
2022-10-11 06:47:41
|
Well people can specify q_1 if they want to, but of course nobody does that.
|
|
2022-10-11 06:48:04
|
For the automatic quality settings, we have q_auto:low at the lowest end
|
|
|
BlueSwordM
|
|
_wb_
For the automatic quality settings, we have q_auto:low at the lowest end
|
|
2022-10-11 06:48:47
|
That would be mozjpeg Q30, right?
|
|
|
_wb_
|
|
BlueSwordM
|
2022-10-11 06:48:52
|
Or 50, I'm not sure.
|
|
|
_wb_
|
2022-10-11 06:49:11
|
It's not a fixed q setting, it is image dependent
|
|
|
BlueSwordM
|
2022-10-11 06:49:31
|
Oh I see.
|
|
|
_wb_
|
2022-10-11 06:51:25
|
it corresponds to the average quality you get with mozjpeg q62, more or less
|
|
2022-10-11 06:52:31
|
q_auto:eco corresponds to the average quality you get with mozjpeg q72 or so
q_auto:good corresponds to the average quality you get with mozjpeg q80 or so
q_auto:best corresponds to the average quality you get with mozjpeg q90 or so
|
|
|
BlueSwordM
|
2022-10-11 06:52:39
|
Well, I guess mozjpeg q50-q55 as the lowest quality anchor was a decent choice on my end then.
Thanks wb.
|
|
|
_wb_
|
2022-10-11 06:53:48
|
I estimate that <10% of our images delivered use q_auto:low, something like 35-40% each for :eco and :good, and 10-15% for :best
|
|
2022-10-11 07:37:58
|
many of the disagreements between vmaf-neg and ssimulacra2 are where there's horrible banding in one image: https://jon-cld.s3.amazonaws.com/test/disagreement_VMAF_vs_SSIMULACRA_2.html
ssimulacra2 hates banding with a passion, vmaf-neg doesn't seem to care about it at all
|
|
2022-10-11 07:38:33
|
https://jon-cld.s3.amazonaws.com/test/distorted/239581/jxl-225b6884-pdc0-e6-q12.png
https://jon-cld.s3.amazonaws.com/test/distorted/239581/mozjpeg-2x2-revert-q30.jpg
|
|
2022-10-11 07:39:42
|
ssimulacra2 says both are crappy (it gives the first 49, the second 46)
|
|
2022-10-11 07:40:11
|
vmaf says the first is crappy (67) but the second is quite good (89)
|
|
|
fab
|
2022-10-11 07:41:14
|
I'd say the first is quite good
|
|
|
_wb_
|
2022-10-11 07:41:30
|
nah, the macarons are very blurry in the first image
|
|
|
fab
|
2022-10-11 07:41:51
|
I don't know what maccheroni are
|
|
|
_wb_
|
2022-10-11 07:42:05
|
in the second image they're a bit better but that banding in the background is very bad
|
|
|
fab
|
2022-10-11 07:42:09
|
Biscuits?
|
|
|
_wb_
|
2022-10-11 07:42:26
|
https://en.wikipedia.org/wiki/Macaron
|
|
|
BlueSwordM
|
2022-10-11 07:46:23
|
<@794205442175402004> Something seems to be odd about this regarding Clang 15 vs Clang 14 performance in JXL that I just found from Phoronix:
https://www.phoronix.com/news/LLVM-Clang-15-Benchmarks
Somehow, cjxl e7 is over 25% faster with Clang 15 over Clang 14.
|
|
|
_wb_
|
2022-10-11 07:47:46
|
that's a surprisingly / suspiciously large difference
|
|
2022-10-11 07:48:44
|
unless clang devs have been specifically trying to do better on that particular benchmark, I find it a bit hard to believe
|
|
|
BlueSwordM
|
2022-10-11 07:50:28
|
Yeah. That's something rather strange, which is why I wanted to relay it here.
I'll check on my end if I get similar numbers, because if I do, then I believe a hand-written optimization could easily be added to take advantage of this speedup instead of leaving it to the compiler.
|
|
|
_wb_
|
2022-10-11 07:51:38
|
yes, please check if you can reproduce this
|
|
|
BlueSwordM
|
2022-10-11 08:04:05
|
Man, compiling compilers is quite the task for a CPU
|
|
|
Traneptora
|
|
_wb_
vmaf says the first is crappy (67) but the second is quite good (89)
|
|
2022-10-11 08:55:01
|
the 2nd one preserves much more detail in the macarons
|
|
2022-10-11 08:55:40
|
the blurriness is undesirable
|
|
2022-10-11 08:56:36
|
I wonder if it's possible to tune the jxl encoder so it prioritizes details over smoothness, which appears to be what mozjpeg does
and jxl prioritizes continuity over details
|
|
2022-10-11 08:56:44
|
like if you disabled EPF I wonder how that would affect the scores
|
|
|
_wb_
|
2022-10-11 09:05:04
|
note that that jxl is way lower quality and less bpp, you wouldn't use such a setting in practice
|
|
|
Traneptora
|
2022-10-11 09:15:36
|
oh wow that's q12
|
|
2022-10-11 09:15:39
|
that's ridiculously bad
|
|
2022-10-11 09:16:23
|
speaking of `-q` I thought we preferred to avoid that
|
|
2022-10-11 09:16:29
|
and use `-d` instead
|
|
|
_wb_
|
2022-10-11 09:18:23
|
I don't mind using either, as long as it's clear that it's not some kind of absolute notion of "quality percentage" that is the same in all encoders. Using -d helps to reduce that confusion, so it's pedagogically useful.
|
|
|
Jyrki Alakuijala
try optimizing it for 0.05 and 1.0 (if 1.0 is the 80 nits or so)
|
|
2022-10-12 07:59:41
|
Why a value as large as 1.0? That brings the range of Y to something like [1,2] instead of [0,1]. It will probably still work with proper renormalization/retuning, but it's much more than what's needed to solve the numerical inaccuracy issue.
|
|
2022-10-12 08:00:45
|
I used 0.05 for now, tuning hasn't fully converged yet but I'm currently here:
```
all: MAE: 5.144167334668228 Kendall: 0.708734 Pearson: 0.874347
validation: MAE: 4.760461265675695 K: 0.719678 P: 0.889105
```
|
|
2022-10-12 08:02:50
|
the numbers for the current (no-negative-weights) ssimulacra2 are something like this:
all: MAE: 5.27, Kendall: 0.69722, Pearson: 0.86514
validation: MAE: 4.85, Kendall: 0.71108, Pearson: 0.88503
|
|
2022-10-12 08:04:56
|
and the numbers for the first iteration of ssimulacra2 (with negative weights, which caused problems on images outside the tuning set) were like this:
all: MAE: 4.274, Kendall: 0.74185, Pearson: 0.90244
validation: MAE: 4.470, Kendall: 0.72163, Pearson: 0.89504
|
|
2022-10-12 08:06:54
|
(you can see that the first iteration did some overfitting, it has better results for training than for validation while I'm quite sure that by random chance, the validation set is slightly 'easier' than the training set for SSIM-based metrics; the existing ones all get slightly better scores on the validation set than on the training set)
|
|
2022-10-12 08:09:33
|
(also the numbers for the first iteration cannot be exactly compared to those of the second/current iterations since they are comparing to a different version of the subjective data; we did a bit more data cleanup since then, which caused the overall range of scores to expand slightly, making it of course harder to get a low MAE)
|
|
|
fab
|
2022-10-12 08:11:18
|
It's similar to the numbers I mentioned
|
|
2022-10-12 08:11:27
|
So I think it's good
|
|
2022-10-12 08:11:43
|
I forgot about normal ssimulacra
|
|
2022-10-12 08:12:16
|
What is MAE?
|
|
2022-10-12 08:14:36
|
It improves the sharpness of the t-shirt
|
|
|
_wb_
|
2022-10-12 08:15:21
|
mae is mean absolute error
|
|
2022-10-12 08:16:01
|
in this case between the metric score and the subjective mean opinion score
|
|
2022-10-12 08:17:07
|
so if the mean opinion score for an image is 80 (on a scale from 0 to 100), then ssimulacra2 gets within about +-5 of that on average
|
|
2022-10-12 08:18:12
|
which is not bad, the confidence intervals on those MOS scores are something like +- 3 anyway, so even a perfect metric would still have a MAE of 3 or so
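for reference, these are just the usual aggregate statistics between the metric scores and the MOS values; a minimal sketch (illustrative function name, and it assumes the metric scores are already mapped onto the MOS scale):
```
import numpy as np
from scipy import stats

def fit_report(metric_scores, mos):
    # metric_scores: per-image metric outputs, mapped onto the MOS scale
    # mos: subjective mean opinion scores for the same images
    m = np.asarray(metric_scores, dtype=float)
    s = np.asarray(mos, dtype=float)
    mae = float(np.mean(np.abs(m - s)))     # mean absolute error
    kendall, _ = stats.kendalltau(m, s)     # rank agreement
    pearson, _ = stats.pearsonr(m, s)       # linear correlation
    return mae, kendall, pearson
```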
|
|
2022-10-12 08:20:31
|
<@826537092669767691> I'm looking at cases where metrics disagree (obviously on most cases metrics will agree), e.g. here are disagreements between DSSIM and (the current version of, with a known bug that I'm fixing atm) SSIMULACRA 2: https://jon-cld.s3.amazonaws.com/test/disagreement_DSSIM_vs_SSIMULACRA_2.html
|
|
2022-10-12 08:21:21
|
this is very useful to find bugs or problematic edge cases
|
|
2022-10-12 09:01:49
|
Better perceptual metrics are crucial imo to improve the state of the art in lossy image compression. Otherwise we just end up making encoders that optimize for the wrong thing. I think that's the biggest thing that's wrong with AVIF: both in the bitstream design and now in the encoder optimization, they're looking at PSNR, SSIM and VMAF, and those are all bad metrics by themselves but even more so when you start optimizing for them; you can fool the metric and produce nice plots, but that doesn't mean the images are actually any good.
|
|
|
Jyrki Alakuijala
|
2022-10-12 09:06:54
|
VMAF got an Emmy recently, I think 2021
|
|
2022-10-12 09:42:49
|
it will not be easy to convince people -- who built VMAF as the most celebrated contribution of their whole careers -- not to base their decisions on it
|
|
|
Traneptora
and use `-d` instead
|
|
2022-10-12 09:48:22
|
I love -d instead of -q, and having -d 1.0 as default. People are more likely to go to lower quality without understanding what they are doing if the scale is not in multiples of JND (just noticeable differences)
|
|
|
Traneptora
I wonder if it's possible to tune the jxl encoder so it prioritizes details over smoothness, which appears to be what mozjpeg does
and jxl prioritizes continuity over details
|
|
2022-10-12 09:50:35
|
AC vs. DC quantization balance can be adjusted -- new flatter quantization matrices can be computed -- or just (adaptively) sharpening the image slightly before compressing
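the non-adaptive version of that last option is basically just an unsharp mask applied before encoding; a sketch, nothing libjxl-specific, with arbitrary radius/amount values:
```
import numpy as np
from scipy.ndimage import gaussian_filter

def presharpen(img, radius=1.0, amount=0.3):
    # Plain unsharp mask: add back a fraction of the high-frequency residual.
    # An adaptive variant would vary `amount` per region based on local detail.
    blurred = gaussian_filter(img, sigma=radius)
    return np.clip(img + amount * (img - blurred), 0.0, 1.0)
```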
|
|
|
fab
I don't know what maccheroni are
|
|
2022-10-12 09:53:44
|
in Zurich we call them luxemburgerli https://de.wikipedia.org/wiki/Luxemburgerli, a candidate name for the brotli-zopfli-guetzli etc. series
|
|
|
fab
|
|
Jyrki Alakuijala
AC vs. DC quantization balance can be adjusted -- new flatter quantization matrices can be computed -- or just (adaptively) sharpening the image slightly before compressing
|
|
2022-10-12 09:55:20
|
I'd say that finally, with the new update, the Stefania Scordio images look free from artifacts compared to October 2021, going from 321 kB to 109 kB
|
|
2022-10-12 09:55:31
|
But they look boring mentally
|
|
2022-10-12 09:55:45
|
Like visually good
|
|
2022-10-12 09:55:57
|
Good greens maybe
|
|
2022-10-12 09:56:17
|
Bpp is good: 0.581 bpp
|
|
2022-10-12 09:56:44
|
Super boring, it makes the day look longer
|
|
2022-10-12 09:56:47
|
For real
|
|
2022-10-12 09:57:07
|
They don't look like Memories
|
|
2022-10-12 09:57:40
|
So there is a need to progress
|
|
2022-10-12 09:58:14
|
I said some input
|
|