|
fab
|
2022-10-12 09:58:35
|
And Jon already computed a bpp in a first iteration
|
|
2022-10-12 10:00:52
|
You need to add things
|
|
|
Jyrki Alakuijala
|
2022-10-12 10:01:16
|
can you show an example of the improvement
|
|
|
fab
|
2022-10-12 10:03:12
|
PC is open
|
|
|
Jyrki Alakuijala
|
2022-10-12 10:07:18
|
Kornel and Jon: what is your thinking about why to divide by pixel sample means (+C1) in SSIM?
|
|
2022-10-12 10:08:36
|
in https://en.wikipedia.org/wiki/Structural_similarity
|
|
2022-10-12 10:09:13
|
why should variance differences be related to intensity differences if we are already in a gamma-compressed or otherwise psychovisually linear space?
|
|
2022-10-12 10:10:39
|
I cannot think of a neurophysiological process that would do anything like that in the retina; I consider it one big mistake in SSIM
|
|
|
fab
|
|
Jyrki Alakuijala
can you show an example of the improvement
|
|
2022-10-12 10:11:11
|
Ok
|
|
|
Jyrki Alakuijala
|
2022-10-12 10:11:45
|
it can be 'fixed' by having an arbitrarily large C1 to remove the effect of that division, but it would be more honest to just not call it SSIM at that stage
|
|
|
fab
|
2022-10-12 10:11:54
|
[image]
|
|
2022-10-12 10:12:17
|
New has lower bpp
|
|
2022-10-12 10:12:20
|
Bitrate
|
|
2022-10-12 10:12:33
|
But it's way, way better
|
|
2022-10-12 10:12:43
|
On par with October 2021 (obviously new heuristics at high distance at that time)
|
|
2022-10-12 10:12:56
|
This is the best improvement
|
|
2022-10-12 10:13:07
|
It's 0710jxl vs 1010jxl
|
|
|
Jyrki Alakuijala
|
2022-10-12 10:14:21
|
did we get worse since oct 2021?
|
|
2022-10-12 10:14:31
|
I hope we removed new heuristics
|
|
|
fab
|
|
Jyrki Alakuijala
did we get worse since oct 2021?
|
|
2022-10-12 10:14:39
|
A bit on this particular image
|
|
2022-10-12 10:14:54
|
This 1010 improves the artifacts
|
|
2022-10-12 10:15:06
|
As you can see there are way fewer
|
|
2022-10-12 10:15:15
|
On new.jxl.png
|
|
|
Jyrki Alakuijala
|
2022-10-12 10:15:17
|
that image looks like it is not high quality to start with -- looks like it was video compressed -- I don't think we should use this kind of test material to guide decisions
|
|
|
fab
|
2022-10-12 10:15:26
|
I know
|
|
|
Jyrki Alakuijala
|
2022-10-12 10:15:32
|
let's use photographs, not video
|
|
|
fab
|
2022-10-12 10:15:43
|
But it looks boring
|
|
2022-10-12 10:15:56
|
Like, new 1010 has fewer artifacts, less bpp
|
|
2022-10-12 10:16:05
|
But it's too boring to see
|
|
|
Jyrki Alakuijala
|
2022-10-12 10:16:43
|
can you give a zoom where it is boring?
|
|
|
fab
|
2022-10-12 10:16:52
|
The whole image
|
|
|
Jyrki Alakuijala
|
2022-10-12 10:16:55
|
I don't know how to quantify boring
|
|
|
fab
|
2022-10-12 10:17:01
|
It's not entertaining
|
|
2022-10-12 10:17:12
|
Good greens, as I said
|
|
2022-10-12 10:17:16
|
Good visually
|
|
|
Jyrki Alakuijala
|
2022-10-12 10:17:17
|
I agree that the image looks boring, but it is because of its blurriness in the 'original'
|
|
|
fab
|
2022-10-12 10:17:22
|
But bad mentally
|
|
2022-10-12 10:17:45
|
Hope you don't aim for strong compression
|
|
|
Jyrki Alakuijala
|
2022-10-12 10:18:04
|
what if you add photon noise, will it be better mentally?
|
|
|
fab
|
2022-10-12 10:18:47
|
I don't know
|
|
2022-10-12 10:19:01
|
I don't want to use noise
|
|
2022-10-12 10:20:32
|
Jxl is becoming the rav1e of webp
|
|
2022-10-12 10:20:41
|
Improving in quality
|
|
2022-10-12 10:20:53
|
But the red gets less appealing
|
|
2022-10-12 10:21:02
|
Channel
|
|
2022-10-12 10:24:04
|
Maybe you should add something
|
|
2022-10-12 10:26:19
|
What Jon is doing today is great
|
|
2022-10-12 10:40:30
|
I agree that with JPEG XL there is the possibility to deband, but it has to be avoided
|
|
|
Jyrki Alakuijala
|
|
fab
I don't want to use noise
|
|
2022-10-12 10:54:47
|
why not
|
|
|
fab
|
2022-10-12 10:55:10
|
Nothing
|
|
|
_wb_
|
|
Jyrki Alakuijala
Kornel and Jon: what is your thinking about why to divide by pixel sample means (+C1) in SSIM?
|
|
2022-10-12 11:25:40
|
it's `2 * avgref * avgdist / (avgref^2 + avgdist^2)`, which you can rewrite to this:
`1 - ((avgref - avgdist)^2 / (avgref^2 + avgdist^2))` or in other words the error is basically the squared difference divided by something close to twice the squared intensity. That is indeed strange, since if the intensities are already psychovisually linear, there is no reason to weigh the darks more than the brights.
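To see that those two forms are equal, write \mu_r for avgref and \mu_d for avgdist and expand the square:
```latex
\frac{2\mu_r\mu_d}{\mu_r^2+\mu_d^2}
  = \frac{(\mu_r^2+\mu_d^2)-(\mu_r-\mu_d)^2}{\mu_r^2+\mu_d^2}
  = 1-\frac{(\mu_r-\mu_d)^2}{\mu_r^2+\mu_d^2}
```
since (\mu_r-\mu_d)^2 = \mu_r^2 - 2\mu_r\mu_d + \mu_d^2.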
|
|
2022-10-12 11:26:17
|
Well I guess I can just change that formula to drop the division and see if it works better after tuning
|
|
2022-10-12 11:33:35
|
I will have an answer tomorrow or so :)
|
|
|
fab
|
2022-10-12 11:43:55
|
[attachment]
|
|
2022-10-12 11:44:05
|
Wb, look at this txt
|
|
|
_wb_
|
2022-10-12 11:44:52
|
Checking what the original SSIM paper has to say about this division by intensity, it looks like they indeed made a braino there...
|
|
|
spider-mario
|
2022-10-12 11:47:22
|
so the whole SSIM edifice is built on lies?
|
|
2022-10-12 11:47:38
|
(I'm being dramatic for effect)
|
|
|
_wb_
|
2022-10-12 11:48:06
|
Weber's law just means that you need a nonlinear transfer function to be perceptually uniform, but they are basically applying gamma correction twice: once because you use SSIM typically on nonlinear sample values, and then again by doing that division. Basically it means they use an effective transfer curve that is a lot steeper than the ~gamma 2.4 of sRGB, something more like gamma 4.
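One way to make the "twice" precise (a sketch, assuming small errors and a pure power law v = x^{1/2.4} standing in for the sRGB curve): per the rewrite above, the luminance penalty is essentially the squared relative error of the already-compressed value,
```latex
\left(\frac{\Delta v}{v}\right)^{2}\approx(\Delta\ln v)^{2}
  =\left(\frac{1}{2.4}\,\Delta\ln x\right)^{2},\qquad v=x^{1/2.4},
```
i.e. errors are effectively measured on log intensity, a curve steeper in the darks than the gamma alone, loosely in the ballpark of that "gamma 4".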
|
|
2022-10-12 11:48:43
|
Being steeper than sRGB is probably a good thing (XYB also does that), which is probably why this went unnoticed
|
|
2022-10-12 12:17:33
|
it would be nice if that part of the formula can just be dropped while getting as good or better results – it would make it slightly cheaper to compute
|
|
2022-10-12 12:20:18
|
conceptually, that division is dubious for intensities (it looks like they made a mistake and are effectively doing gamma correction twice), but it's even more dubious when applying SSIM to chroma channels, like in SSIMULACRA (both 1 and 2) and I guess also in DSSIM, right @Kornel?
|
|
|
Kornel
|
2022-10-12 12:21:36
|
Yes, it's dubious
|
|
|
_wb_
|
2022-10-12 12:22:10
|
since it basically means that it is more sensitive to errors in the greens and blues (low values of a and b) and less sensitive to errors in the reds and yellows (high values of a and b), and in the case of ssimulacra2 the same but for whatever low/high X and B mean
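A small numeric sketch of that sensitivity in C (the C1 constant here is arbitrary, for illustration only):
```c
#include <stdio.h>

/* SSIM-style "luminance" term for two local means a and b;
 * the division by a*a + b*b is the part under discussion */
static double ssim_lum(double a, double b) {
    const double C1 = 0.0001; /* arbitrary demo stabilizer */
    return (2.0 * a * b + C1) / (a * a + b * b + C1);
}

int main(void) {
    /* the same absolute error (0.05) at a low vs. a high channel mean */
    printf("low means  0.10 vs 0.15: %.4f\n", ssim_lum(0.10, 0.15));
    printf("high means 0.90 vs 0.95: %.4f\n", ssim_lum(0.90, 0.95));
    return 0;
}
```
The low pair scores about 0.92 while the high pair scores about 0.999: the identical 0.05 difference is penalized roughly 50x more at low values.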
|
|
|
Kornel
|
2022-10-12 12:22:27
|
I've noticed that error before. I thought I'd fixed it in DSSIM (I even put that in the readme)
|
|
2022-10-12 12:22:42
|
but recently I reviewed the code and… I'm not sure if it's right or not :)
|
|
|
_wb_
|
2022-10-12 12:27:25
|
you're scaling down in linear space, which is of course correct, but I wonder how big a difference it makes compared to scaling down in XYB. Of course the downscaled images will get too dark since Y uses gamma 3, but that happens to both original and distorted, and I would assume it can be compensated in the weighting of the different scales...
|
|
2022-10-12 12:29:15
|
in other words I'm wondering if I should downscale in linear RGB too and convert each scale to XYB instead of doing the XYB conversion only once (which is a bit faster but I don't care that much about speed if it makes a significant difference)
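For reference, "downscaling in linear RGB" means undoing the transfer curve before averaging; a minimal sketch in C, with a pure gamma 2.4 power law standing in for the actual sRGB function:
```c
#include <math.h>
#include <stdio.h>

static double to_linear(double v)   { return pow(v, 2.4); }
static double from_linear(double v) { return pow(v, 1.0 / 2.4); }

/* average a 2x2 block of gamma-encoded samples in linear light */
static double downscale2x2(double a, double b, double c, double d) {
    double lin = (to_linear(a) + to_linear(b) +
                  to_linear(c) + to_linear(d)) / 4.0;
    return from_linear(lin);
}

int main(void) {
    /* a 2x2 block of black/white stripes */
    printf("linear-light average:  %.3f\n", downscale2x2(0.0, 1.0, 0.0, 1.0));
    printf("naive encoded average: %.3f\n", (0.0 + 1.0 + 0.0 + 1.0) / 4.0);
    return 0;
}
```
The naive average of the encoded values (0.5) is visibly darker than the linear-light result (about 0.75), which is exactly the kind of asymmetry behind the black-on-white vs white-on-black example discussed later.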
|
|
|
Kornel
|
2022-10-12 12:31:10
|
I assume if you want to detect this error (encoders using the wrong gamma), then you should avoid repeating it :)
|
|
2022-10-12 12:32:40
|
I used to downscale in Lab, but that made DSSIM insensitive to chroma subsampling
|
|
|
_wb_
|
2022-10-12 12:38:31
|
I need to think a bit more about this.
|
|
2022-10-12 12:44:00
|
There are two main kinds of chroma subsampling artifacts in my experience:
- at the 1:1 scale, chroma obviously gets blurred causing small details to get lost if they're mostly in the chroma;
- textured reds and blues get duller since e.g. red-black-red-black 1-pixel stripes have Cr values that alternate between 127 and 0 and subsampling will turn it into a uniform 63, making the red darker and desaturated.
|
|
2022-10-12 12:46:46
|
the first one will be most noticeable in the 1:1 chroma channels themselves, not really in the zoomed-out scales.
the second one does remain visible in the zoomed-out scales and also causes luma artifacts (at least in the L of Lab or the Y of XYB).
|
|
2022-10-12 12:50:32
|
@Kornel "a/b channels of Lab are compared with lower spatial precision to simulate eyes' higher sensitivity to brightness than color changes." -> wouldn't that be an obstacle to detecting chroma subsampling issues? at least the ones where L is not affected by it (that might be relatively rare but still)
|
|
|
Kornel
|
2022-10-12 12:51:34
|
it's not lower resolution, but higher resolution blurred.
|
|
2022-10-12 12:51:45
|
but yeah, it's a clumsy trade-off
|
|
2022-10-12 12:52:02
|
I don't have a proper model for interaction between luma and chroma
|
|
|
_wb_
|
2022-10-12 12:56:27
|
what I like about XYB is that it's basically just LMS with gamma correction, i.e. it directly models the cones
|
|
2022-10-12 01:01:24
|
the "we see luma in higher res than chroma" thing is partly a lie: it's more like the S-cones are sparser and mostly outside the fovea (so blue-yellow chroma is very low res), while the L and M cones are dense in the fovea, and you can basically see luma using either L or M cones while for the red-green chroma your brain relies on the difference between L and M so that makes it a bit lower resolution than luma (since basically every L or M cone is one "luma pixel" while you need at least one of each to form a "red-green chroma pixel"), but still higher resolution than S
|
|
|
Traneptora
|
2022-10-12 01:12:04
|
ah, so this is the biological explanation for chroma subsampling being more noticeable in the Cr channel than in the Cb channel?
|
|
2022-10-12 01:13:20
|
with regard to 4:2:0 YCbCr
|
|
|
_wb_
|
2022-10-12 01:20:58
|
yes, though part of that is also because red/L has more impact on overall luma than blue/S
|
|
|
Jyrki Alakuijala
|
2022-10-12 01:35:49
|
I deviated from classic CIE LMS ideas by fitting the LMS responses that just 'worked the best' for the purpose of image compression
|
|
2022-10-12 01:36:17
|
those were a bit bizarre, because a grayscale image would be described by using all three channels
|
|
2022-10-12 01:36:48
|
Lode and Luca modified that system in a way that grayscale images are described by one channel
|
|
|
_wb_
Well I guess I can just change that formula to drop the division and see if it works better after tuning
|
|
2022-10-12 01:38:46
|
I'm very curious about this :) perhaps just optimizing C1 and C2 (as well as linear? repositioning of the resulting values) as part of the optimization could provide an answer
|
|
|
_wb_
|
2022-10-12 01:41:42
|
one idiosyncrasy that persisted in XYB is that each channel has its own scale and as a 3D space it is far from perceptually uniform (a change of 0.1 in X is a much bigger change than 0.1 in Y or B). That's no issue for image compression where you use different quantizers per channel anyway, and also not for a metric like ssimulacra2 because it has different weights per channel anyway, but still...
|
|
|
Jyrki Alakuijala
I'm very curious about this :) perhaps just optimizing C1 and C2 (as well as linear? repositioning of the resulting values) as part of the optimization could provide an answer
|
|
2022-10-12 01:44:32
|
well I can drop the C1 completely when removing that division. It's too early to tell if this is a good idea or not (I need a day or so of tuning before I can compare), but so far it looks promising (early training iterations are giving slightly better results than the same amount of iterations of tuning with the original formula)
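As a sketch of the two variants being compared (demo constants; the actual ssimulacra2 code may differ in details):
```c
static const double C1 = 0.0001, C2 = 0.0009; /* demo constants */

/* classic SSIM: the luminance term divides by the squared means;
 * mu1/mu2 are local means, s11/s22/s12 local (co)variances */
double ssim_classic(double mu1, double mu2,
                    double s11, double s22, double s12) {
    double l  = (2.0 * mu1 * mu2 + C1) / (mu1 * mu1 + mu2 * mu2 + C1);
    double cs = (2.0 * s12 + C2) / (s11 + s22 + C2);
    return l * cs;
}

/* modified: keep only the squared mean difference; with the
 * division gone, the C1 stabilizer can be dropped too */
double ssim_no_division(double mu1, double mu2,
                        double s11, double s22, double s12) {
    double l  = 1.0 - (mu1 - mu2) * (mu1 - mu2);
    double cs = (2.0 * s12 + C2) / (s11 + s22 + C2);
    return l * cs;
}
```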
|
|
|
Jyrki Alakuijala
|
2022-10-12 07:28:31
|
experimentation, curiosity and logic are more powerful than dogma :)
|
|
|
_wb_
|
2022-10-12 07:33:18
|
currently not beating the original version yet, but my tuning takes some time to converge. Tomorrow I'll know.
|
|
2022-10-12 07:33:50
|
after 26 iterations it's currently here:
```
train: 17611 MSE: 50.03070287514473 | vali: 4292 MAE: 5.128380825064601 K: 0.709664 P: 0.881838 | all: 21903 MAE: 5.460425338341202 Kendall: 0.697490 Pearson: 0.865998
```
|
|
2022-10-12 07:34:32
|
while the one with the original ssim formula was here after 26 iterations:
```
train: 17611 MSE: 53.666351158694 | vali: 4292 MAE: 5.273077294606953 K: 0.701941 P: 0.877796 | all: 21903 MAE: 5.6512782840159845 Kendall: 0.684267 Pearson: 0.856832
```
|
|
2022-10-12 07:35:22
|
so it's looking promising...
|
|
2022-10-12 07:36:01
|
if confirmed, this probably deserves a small paper to rectify that historical mistake that went into SSIM
|
|
2022-10-12 07:42:00
|
basically the formula derivation is assuming linear intensities but everyone always uses SSIM on already gamma compressed data (including the authors, who are talking about 8-bit images a few sentences earlier which to me kind of implies it's not linear)
|
|
|
BlueSwordM
|
|
BlueSwordM
Man, compiling compilers is quite the task for a CPU :)
|
|
2022-10-12 08:30:09
|
Looks like I wasn't able to build Clang 14 lmao.
|
|
2022-10-12 08:30:22
|
It looks like I'll have to build it with Clang 14 another way
|
|
|
_wb_
|
2022-10-13 06:42:40
|
After 122 iterations of tuning, it's here:
```
train: 17611 MSE: 45.73781138763714 | vali: 4292 MAE: 4.8122107903523 K: 0.716675 P: 0.886651 | all: 21903 MAE: 5.170079162450586 Kendall: 0.708310 Pearson: 0.873783
```
while the original formula was here after 122 iterations:
```
train: 17611 MSE: 46.868001558068066 | vali: 4292 MAE: 4.93915328458751 K: 0.716456 P: 0.887142 | all: 21903 MAE: 5.294815439616939 Kendall: 0.704895 Pearson: 0.872338
```
|
|
2022-10-13 06:53:03
|
so still looking promising, and I can tell already that if this division by average intensity in the formula is useful at all (which I suspect it isn't), the benefit it brings must be very small
|
|
2022-10-13 07:16:20
|
https://twitter.com/jonsneyers/status/1580457251248431104?s=20
|
|
|
fab
|
2022-10-13 08:26:58
|
Continue more iterations
|
|
|
_wb_
|
2022-10-13 08:30:06
|
sure, it's still improving so I'm not stopping yet.
```
train: 17611 MSE: 45.65768488257069 | vali: 4292 MAE: 4.798193148565712 K: 0.716859 P: 0.886754 | all: 21903 MAE: 5.1603016077332695 Kendall: 0.708444 Pearson: 0.873804
```
|
|
2022-10-13 08:32:12
|
as long as the MAE on the validation set keeps going down (or its Kendall/Pearson keep going up), there is no reason to stop.
|
|
|
fab
|
2022-10-13 08:37:23
|
Less than 200 though are ok.
|
|
2022-10-13 08:38:04
|
I think 300 are necessary
|
|
2022-10-13 08:38:18
|
But you have to be careful.
|
|
|
_wb_
|
2022-10-13 08:39:54
|
I am not too worried about overfitting now that I forced the weights of the subscores to be positive.
|
|
2022-10-13 08:41:46
|
PSNR-HVS says this: https://jon-cld.s3.amazonaws.com/test/distorted/016/mozjpeg-1x1-revert-q16.jpg is better than this: https://jon-cld.s3.amazonaws.com/test/distorted/016/jxl-225b6884-pdc1-e6-q50.png
|
|
2022-10-13 08:53:20
|
VMAF says this: https://jon-cld.s3.amazonaws.com/test/distorted/792079/mozjpeg-2x2-revert-q18.jpg is better than this: https://jon-cld.s3.amazonaws.com/test/distorted/792079/jxl-e6-q48.png
|
|
2022-10-13 08:54:16
|
and this: https://jon-cld.s3.amazonaws.com/test/distorted/1910225/mozjpeg-2x2-revert-q16.jpg is better than this: https://jon-cld.s3.amazonaws.com/test/distorted/1910225/jxl-225b6884-pdc1-e6-q38.png
|
|
|
fab
|
2022-10-13 08:54:49
|
Which encoder is it? Dev?
|
|
|
_wb_
|
2022-10-13 08:54:51
|
and this: https://jon-cld.s3.amazonaws.com/test/distorted/1943411/mozjpeg-2x2-revert-q22.jpg is better than this: https://jon-cld.s3.amazonaws.com/test/distorted/1943411/jxl-225b6884-pdc1-e6-q70.png
|
|
|
fab
|
2022-10-13 08:54:53
|
DEV
|
|
2022-10-13 08:54:59
|
?
|
|
|
_wb_
|
2022-10-13 08:55:31
|
and this: https://jon-cld.s3.amazonaws.com/test/distorted/1938351/mozjpeg-2x2-revert-q22.jpg is better than this: https://jon-cld.s3.amazonaws.com/test/distorted/1938351/jxl-225b6884-pdc1-e6-q56.png
|
|
2022-10-13 08:55:37
|
and so on, and so on
|
|
|
fab
|
2022-10-13 08:56:28
|
So the encoder is DEV?
|
|
|
_wb_
|
2022-10-13 08:56:42
|
?
|
|
2022-10-13 08:57:16
|
VMAF is systematically saying that bad blocky jpegs are better than smooth jxl images
|
|
|
fab
|
2022-10-13 08:58:06
|
Libjxl is dev version?
|
|
2022-10-13 08:58:20
|
Nightly version?
|
|
2022-10-13 08:58:43
|
225b6884
|
|
|
_wb_
|
2022-10-13 08:59:16
|
that's a recent git version, yes
|
|
|
fab
|
2022-10-13 08:59:33
|
Ah so not dev
|
|
|
_wb_
|
2022-10-13 09:00:00
|
VMAF says https://jon-cld.s3.amazonaws.com/test/distorted/1963557/mozjpeg-2x2-revert-q20.jpg is better than https://jon-cld.s3.amazonaws.com/test/distorted/1963557/jxl-e6-q54.png
|
|
|
fab
|
2022-10-13 09:00:11
|
After 4 days no dev version
|
|
|
_wb_
|
2022-10-13 09:00:23
|
this happens both for the 0.7 release of jxl (like in this example) and in the current git version of jxl
|
|
2022-10-13 09:01:33
|
curiously VMAF likes bad jpegs a lot, to the point that it is saying that mozjpeg, avif and jxl are all worse than bad jpegs
|
|
|
improver
|
2022-10-13 09:26:29
|
"yes i like them baked" t. VMAF
|
|
|
spider-mario
|
2022-10-13 09:28:45
|
"all this blocking really enhances the contrast of the image"
|
|
|
_wb_
|
2022-10-13 10:20:21
|
```
train: 17611 MSE: 45.59336752597455 | vali: 4292 MAE: 4.772055238533731 K: 0.718784 P: 0.888073 | all: 21903 MAE: 5.1490078350745 Kendall: 0.709031 Pearson: 0.874058
```
already slightly better Kendall correlation (on the full set) than what I could obtain with the original formula, so I think I can conclude that this division is not useful
|
|
2022-10-13 10:21:12
|
now let's see if it will converge to something better than with the original formula
|
|
|
WoofinaS
|
2022-10-13 02:03:31
|
Could you repost this in the av1 server or do you not mind me doing it?
|
|
|
_wb_
|
2022-10-13 02:03:43
|
i don't mind
|
|
|
Traneptora
|
|
_wb_
https://twitter.com/jonsneyers/status/1580457251248431104?s=20
|
|
2022-10-13 02:52:56
|
> Let me try to explain this one. Jyrki questioned this divisor in the SSIM equation, which effectively boils down to penalizing errors in dark regions more than in bright regions.
isn't that because errors in dark regions are more visible to humans?
|
|
|
_wb_
|
2022-10-13 02:53:28
|
they are, but you shouldn't correct for it twice
|
|
|
Traneptora
|
2022-10-13 02:53:38
|
ah, I see
|
|
|
_wb_
|
2022-10-13 08:57:21
|
currently here:
```
train: 17611 MSE: 45.4634956462169 | vali: 4292 MAE: 4.734498022711265 K: 0.718613 P: 0.888034 | all: 21903 MAE: 5.124381241221485 Kendall: 0.708990 Pearson: 0.873963
```
|
|
2022-10-13 08:58:06
|
slightly better MAE for both validation and training set than with the original formula. No big difference though.
|
|
2022-10-13 09:16:39
|
I'll let it tune another night
|
|
2022-10-14 07:36:12
|
this is where it went over the night:
```
train: 17611 MSE: 45.430967373087206 | vali: 4292 MAE: 4.717704574846586 K: 0.718613 P: 0.887978 | all: 21903 MAE: 5.115279915539398 Kendall: 0.708991 Pearson: 0.873891
```
|
|
2022-10-14 07:36:56
|
stopped tuning now, MAE is still slowly decreasing but Kendall/Pearson are starting to get worse
|
|
2022-10-14 07:39:52
|
I'm now going to try doing the downscale in linear RGB instead of in XYB, as @Kornel is doing in DSSIM. It makes sense, and it looks like it is needed to get an example like https://twitter.com/jonsneyers/status/1580462629851889666 right – after retuning with the corrected SSIM formula, I am still getting a better score for the black-on-white text than for the white-on-black text, and it seems to be caused by the downscaled scores being worse due to incorrect downscaling
|
|
2022-10-14 07:41:27
|
this makes it a bit slower since the XYB conversion now needs to be done on ~twice the pixels, but whatever, it's still pretty fast and that's the perceptually correct thing to do
|
|
2022-10-14 07:51:22
|
i'll initialize the tuning with the weights I just got (for a version that downscales in XYB), which should help to make it re-tune faster since that'll be a better guess than just starting with all weights equal to 1
|
|
2022-10-14 08:53:57
|
same reason as why PSNR is included, I guess
|
|
2022-10-14 08:54:11
|
it's not an endorsement
|
|
2022-10-14 08:56:27
|
also: even bad metrics can be useful sometimes to detect encoder bugs. PSNR is not useful as a perceptual metric, but e.g. if there's a nonmonotonicity according to PSNR (higher quality setting resulting in lower score), it could be a sign of a bug
|
|
2022-10-14 09:14:18
|
@Kornel here are some DSSIM plots
|
|
2022-10-14 09:14:39
|
[plots]
|
|
2022-10-14 09:15:08
|
Orange lines are libaom 4:4:4 tune=ssim at speeds 3 to 9
|
|
2022-10-14 09:15:34
|
white line is mozjpeg 4:2:0, baseline is mozjpeg -revert 4:2:0
|
|
2022-10-14 09:15:53
|
blueish line is current git cjxl -e 6
|
|
2022-10-14 09:16:43
|
the range is from mozjpeg q98 on the left to q30 on the right
|
|
2022-10-14 09:21:28
|
vertical axis is bytes saved when comparing avg bpp of an encode setting to the avg bpp of an 'equivalent' q setting of mozjpeg -revert, where 'equivalent' means "same average dssim score" in the first plot and "same p10 dssim score" in the second
|
|
2022-10-14 09:22:44
|
oops I made a braino, p10 is not a good idea if the metric is lower-is-better, I wanted to align by the p10 at the worst end, not at the best end
|
|
2022-10-14 09:25:03
|
fixing that, this is what the second plot should be
|
|
2022-10-14 09:34:54
|
so basically according to dssim, at speed 9 (which is about as fast as jxl e6), avif is worse than mozjpeg; at speed 8 it is slightly better on average but slightly worse in the low quality end at percentile 10, at speed 7 it starts to be clearly better, and you need to go to speed 3-4 to get close to jxl (though at q>80 there remains a gap). Those are 20-30 times slower than jxl e6 though.
|
|
2022-10-14 09:59:37
|
Just to illustrate how bad vmaf is as a metric, this is the plot for vmaf(-neg). It says that below q66 (vmaf 90) or so, mozjpeg, avif and jxl are at most 20% better than unoptimized jpeg, and at higher quality they're even worse by 30% or more
|
|
2022-10-14 10:04:23
|
This is what Butteraugli 3-norm says:
|
|
2022-10-14 10:06:38
|
PSNR-Y says this:
|
|
2022-10-14 10:07:39
|
MS-SSIM says this:
|
|
2022-10-14 10:22:09
|
@BlueSwordM can you give me a good avifenc command line that doesn't require custom compilation (works out of the box on a default avifenc `Version: 0.10.1 (aom [enc/dec]:3.4.0)`)? What is currently in the plots above is `-c aom -y 444 --min 0 --max 63 -s $speed -j 1 -a end-usage=q -a tune=ssim -a cq-level=$q` but I'm open to trying alternatives. Testing the slower speeds takes ages though, and they're not really deployable anyway, so preferably it should be something that works well at speed 6-7 or so.
|
|
2022-10-14 10:26:15
|
@Kornel I'll also benchmark `cavif-rs 1.3.5`, fyi. It's kind of refreshing that it doesn't come with numerous options besides quality and speed, I like that :)
|
|
|
WoofinaS
|
2022-10-14 11:11:54
|
`-a deltaq-mode=3` is a tune dedicated to avif encoding.
`-a sharpness=(2/3)` can be used to limit filtering and almost always lowers max distortion according to butter.
Almost all other settings hurt fidelity in one way or another, or flat out are not functional. Both are also stock.
|
|
2022-10-14 11:19:49
|
Rav1e, last time I checked, does not have as good all-intra performance as aomenc; however, that might be different now.
|
|
|
_wb_
|
2022-10-14 01:18:29
|
https://arxiv.org/pdf/2107.04510.pdf
|
|
|
WoofinaS
|
2022-10-14 01:28:30
|
Oh btw, has Cloudinary released their results for their encoder and metric analysis publicly?
|
|
|
_wb_
|
2022-10-14 01:36:33
|
not yet but we're planning to write papers and blogposts
|
|
2022-10-14 01:37:35
|
in the meantime, I do tend to share stuff on twitter and here, since papers and blogposts will take a while
|
|
|
Kornel
|
2022-10-14 03:16:03
|
I've tweaked rav1e settings for cavif-rs, and it's generally on par with aom.
|
|
2022-10-14 03:16:16
|
aom supports more features, so there are types of images where it excels.
|
|
2022-10-14 03:16:42
|
but rav1e is a bit faster overall, so if you're trying to balance speed/quality, they're close.
|
|
|
BlueSwordM
|
|
_wb_
@BlueSwordM can you give me a good avifenc command line that doesn't require custom compilation (works out of the box on a default avifenc `Version: 0.10.1 (aom [enc/dec]:3.4.0)`)? What is currently in the plots above is `-c aom -y 444 --min 0 --max 63 -s $speed -j 1 -a end-usage=q -a tune=ssim -a cq-level=$q` but I'm open to trying alternatives. Testing the slower speeds takes ages though, and they're not really deployable anyway, so preferably it should be something that works well at speed 6-7 or so.
|
|
2022-10-14 04:22:18
|
I'd advise not setting 4:4:4 manually, as avifenc will not chroma subsample unless needed:
`avifenc -s X -j X --min 0 --max 63 -a end-usage=q -a cq-level=XX -a color:sharpness=2 -a tune=ssim -a color:enable-chroma-deltaq=1 -a color:deltaq-mode=3/4`
|
|
2022-10-14 04:25:23
|
I'm surprised chroma deltaq hasn't become the default for 4:4:4 content yet, or that DQ3 hasn't been enabled by default yet.
|
|
|
WoofinaS
|
2022-10-14 04:28:24
|
Rav1e might have similar appeal to aom, but ironically, and opposite to its inter performance, it has fidelity issues.
I was never able to get good d scores with rav1e at acceptable file sizes.
Rav1e also *isn't threaded*, so in practice it's slower by quite a margin.
|
|
|
BlueSwordM
I'd advise not setting 4:4:4 manually, as avifenc will not chroma subsample unless needed:
`avifenc -s X -j X --min 0 --max 63 -a end-usage=q -a cq-level=XX -a color:sharpness=2 -a tune=ssim -a color:enable-chroma-deltaq=1 -a color:deltaq-mode=3/4`
|
|
2022-10-14 04:34:20
|
I'm also not sure why you removed ssim, as it's way better than the default psnr even for "drawn content" where people claim it to be worse.
|
|
|
_wb_
|
|
BlueSwordM
I'd advise not setting 4:4:4 manually, as avifenc will not chroma subsample unless needed:
`avifenc -s X -j X --min 0 --max 63 -a end-usage=q -a cq-level=XX -a color:sharpness=2 -a tune=ssim -a color:enable-chroma-deltaq=1 -a color:deltaq-mode=3/4`
|
|
2022-10-14 04:39:24
|
I only use png input so then it doesn't make a difference whether I set -y 444 or not, right?
|
|
|
BlueSwordM
|
|
WoofinaS
I'm also not sure why you removed ssim, as it's way better than the default psnr even for "drawn content" where people claim it to be worse.
|
|
2022-10-14 04:41:51
|
Because it's default now?
|
|
|
_wb_
I only use png input so then it doesn't make a difference whether I set -y 444 or not, right?
|
|
2022-10-14 04:42:06
|
Indeed.
|
|
|
WoofinaS
I'm also not sure why you removed ssim, as it's way better than the default psnr even for "drawn content" where people claim it to be worse.
|
|
2022-10-14 04:42:23
|
Oh right, he's on "stable" 0.10.0.
|
|
|
_wb_
|
|
WoofinaS
Rav1e might have similar appeal to aom, but ironically, and opposite to its inter performance, it has fidelity issues.
I was never able to get good d scores with rav1e at acceptable file sizes.
Rav1e also *isn't threaded*, so in practice it's slower by quite a margin.
|
|
2022-10-14 04:43:53
|
multithreading is useful for an end-user encoding a single image on a beefy laptop or desktop, but for deployments like at Cloudinary we usually avoid multithreading: it doesn't scale perfectly, so while it might be good for latency, for overall throughput it's better to use a single core per image if you have to encode large amounts of images
|
|
|
WoofinaS
|
2022-10-14 04:44:01
|
He said stock my guy.
|
|
|
Traneptora
|
2022-10-14 04:45:20
|
yea, it's generally easier to batch process stuff in a single core per image and let the operating system handle the parallelization
|
|
|
BlueSwordM
|
|
WoofinaS
He said stock my guy.
|
|
2022-10-14 04:45:37
|
That's not the issue there. In git/0.11.0 RC, the SSIM-based post-RD tune is the default.
|
|
|
Traneptora
|
2022-10-14 04:45:39
|
since it scales up much better
|
|
|
WoofinaS
|
|
_wb_
multithreading is useful for an end-user encoding a single image on a beefy laptop or desktop, but for deployments like at Cloudinary we usually avoid multithreading: it doesn't scale perfectly, so while it might be good for latency, for overall throughput it's better to use a single core per image if you have to encode large amounts of images
|
|
2022-10-14 04:46:06
|
Being able to saturate 2 threads is still important; however, rav1e struggles to even do that.
Running an encoder per thread can actually be slower
|
|
|
BlueSwordM
That's not the issue there. In git/0.11.0 RC, the SSIM-based post-RD tune is the default.
|
|
2022-10-14 04:46:36
|
Ah okay
|
|
|
_wb_
|
2022-10-14 07:56:19
|
I'm now tuning an updated ssimulacra2 with the following changes:
- added 0.05 to Y to circumvent the FastGaussian inaccuracies
- modified SSIM formula to drop the double gamma correction
- downscaling in linear RGB instead of XYB
- using 2-norm and 8-norm instead of 1-norm and 4-norm: I saw that many of the weights for 1-norms were tuned to 0 so that norm is not very useful; I hope 2,8 will be a better combination
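For reference, the norms here are p-norms over the per-pixel error maps; a minimal sketch (not the actual ssimulacra2 code):
```c
#include <math.h>
#include <stddef.h>
#include <stdio.h>

/* normalized p-norm of an error map: higher p weights the worst
 * errors more (p -> infinity approaches the max), so an 8-norm is
 * much closer to a worst-case measure than a plain mean (1-norm) */
static double p_norm(const double *err, size_t n, double p) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += pow(fabs(err[i]), p);
    return pow(sum / (double)n, 1.0 / p);
}

int main(void) {
    double err[4] = {0.1, 0.1, 0.1, 0.9};       /* one bad outlier */
    printf("1-norm: %.3f\n", p_norm(err, 4, 1.0)); /* ~0.30 */
    printf("8-norm: %.3f\n", p_norm(err, 4, 8.0)); /* ~0.76, near the max */
    return 0;
}
```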
|
|
2022-10-15 08:11:06
|
ok, no, looks like it's not
|
|
|
Jyrki Alakuijala
|
2022-10-17 06:54:08
|
add all 1,2,4,8,16 and optimize their weights?
|
|
|
_wb_
|
2022-10-17 07:29:47
|
Yeah I might do that at some point, it blows up the number of weights to tune though. I think 1 and 4 norm is a good set if you have only 2 of them, and I doubt if having more than 2 norms is going to add much.
|
|
2022-10-17 07:31:18
|
I had some trouble tuning a linearly downscaling variant, wasn't getting very good results until I adjusted the objective function to optimize more for MSE and Kendall and less for Pearson
|
|
|
Jyrki Alakuijala
|
2022-10-17 07:43:40
|
I keep all weights the same for each aggregation level
|
|
|
_wb_
|
2022-10-17 09:34:18
|
Tuning causes e.g. weights for B to be low or zero at 1:1 and 1:2 and higher at 1:16 and 1:32, while for X and Y it's the other way around
|
|
|
Jyrki Alakuijala
|
2022-10-17 09:52:25
|
interesting
|
|
|
_wb_
|
2022-10-18 07:55:56
|
I cannot get to quite as good correlation with subjective data when doing SSIM correctly and when downscaling correctly (in linear instead of in XYB). But I suspect that it might still be a good idea. I think the better fit with subjective data may partially be because it could easily "see" which image was a jxl and which image was not (by looking at 1:8 error when downscaling in XYB, which is naturally going to be lower for jxl since that's its DC), and possibly it to some extent used that to infer things that allowed it to get a better fit to our data, without any real perceptual foundation. Doing downscaling in linear space is perceptually correct and doesn't correspond to what any codec internally does (they all obviously work in a gamma-compressed space), so it's more "fair" and less likely to 'overfit'.
|
|
2022-10-18 07:59:54
|
I can still get a MAE of under 5 on the validation set, so the difference is not huge, just a bit more error and a bit lower Kendall/Spearman/Pearson correlation than what I could get previously (which was already slightly worse than what I could get with the negative weights, but that was just overfitting in a way that generalizes very poorly)
|
|
2022-10-18 10:50:50
|
These are the weights it converged to:
```
X scale 1:1, 1-norm: ssim: 0.0 | ringing: 0.0 | blur: 0.0
X scale 1:1, 4-norm: ssim: 1.0035479352512353 | ringing: 0.00011322061110474735 | blur: 0.00040442991823685936
X scale 1:2, 1-norm: ssim: 0.0018953834105783773 | ringing: 0.0 | blur: 0.0
X scale 1:2, 4-norm: ssim: 8.982542997575905 | ringing: 0.9899785796045556 | blur: 0.0
X scale 1:4, 1-norm: ssim: 0.9748315131207942 | ringing: 0.9581575169937973 | blur: 0.0
X scale 1:4, 4-norm: ssim: 0.5133611777952946 | ringing: 1.0423189317331243 | blur: 0.000308010928520841
X scale 1:8, 1-norm: ssim: 12.149584966240063 | ringing: 0.9565577248115467 | blur: 0.0
X scale 1:8, 4-norm: ssim: 1.0406668123136824 | ringing: 81.51139046057362 | blur: 0.30593391895330946
X scale 1:16, 1-norm: ssim: 1.0752214433626779 | ringing: 1.1039042369464611 | blur: 0.0
X scale 1:16, 4-norm: ssim: 1.021911638819618 | ringing: 1.1141823296855722 | blur: 0.9730845751441705
X scale 1:32, 1-norm: ssim: 0.0 | ringing: 0.0 | blur: 0.0
X scale 1:32, 4-norm: ssim: 0.9833918426095505 | ringing: 0.7920385137059867 | blur: 0.9710740411514053
```
|
|
2022-10-18 10:51:01
|
```
Y scale 1:1, 1-norm: ssim: 0.0 | ringing: 0.0 | blur: 0.0
Y scale 1:1, 4-norm: ssim: 0.5387077903152638 | ringing: 0.0 | blur: 3.4036945601155804
Y scale 1:2, 1-norm: ssim: 0.0 | ringing: 0.0 | blur: 0.0
Y scale 1:2, 4-norm: ssim: 2.337569295661117 | ringing: 0.0 | blur: 5.707946510901609
Y scale 1:4, 1-norm: ssim: 37.83086423878157 | ringing: 0.0 | blur: 0.0
Y scale 1:4, 4-norm: ssim: 3.8258200594305185 | ringing: 0.0 | blur: 0.0
Y scale 1:8, 1-norm: ssim: 24.073659674271497 | ringing: 0.0 | blur: 0.0
Y scale 1:8, 4-norm: ssim: 13.181871265286068 | ringing: 0.0 | blur: 0.0
Y scale 1:16, 1-norm: ssim: 0.0 | ringing: 0.0 | blur: 0.0
Y scale 1:16, 4-norm: ssim: 10.00750121262895 | ringing: 0.0 | blur: 0.0
Y scale 1:32, 1-norm: ssim: 0.0 | ringing: 0.0 | blur: 0.0
Y scale 1:32, 4-norm: ssim: 52.51428385603891 | ringing: 0.0 | blur: 0.0
```
|
|
2022-10-18 10:51:06
|
```
B scale 1:1, 1-norm: ssim: 0.0 | ringing: 0.0 | blur: 0.0
B scale 1:1, 4-norm: ssim: 0.0 | ringing: 0.0 | blur: 0.0
B scale 1:2, 1-norm: ssim: 0.0 | ringing: 0.0 | blur: 0.0
B scale 1:2, 4-norm: ssim: 0.0 | ringing: 0.0 | blur: 0.0
B scale 1:4, 1-norm: ssim: 0.0 | ringing: 0.9946464267894417 | blur: 0.0
B scale 1:4, 4-norm: ssim: 0.0 | ringing: 0.0006040447715934816 | blur: 0.0
B scale 1:8, 1-norm: ssim: 0.0 | ringing: 0.9945171491374072 | blur: 0.0
B scale 1:8, 4-norm: ssim: 2.8260043809454376 | ringing: 1.0052642766534516 | blur: 8.201441997546244e-05
B scale 1:16, 1-norm: ssim: 12.154041855876695 | ringing: 32.292928706201266 | blur: 0.992837130387521
B scale 1:16, 4-norm: ssim: 0.0 | ringing: 30.71925517844603 | blur: 0.00012309907022278743
B scale 1:32, 1-norm: ssim: 0.0 | ringing: 0.9826260237051734 | blur: 0.0
B scale 1:32, 4-norm: ssim: 0.0 | ringing: 0.9980928367837651 | blur: 0.012142430067163312
```
|
|
2022-10-18 10:52:29
|
so it decided that scales 1:1 and 1:2 don't matter at all for B, only when zooming out B becomes important. That does make sense, but I'm still a bit surprised that it doesn't even want to have some low weights there
|
|
|
Jyrki Alakuijala
|
2022-10-18 10:52:46
|
downscaling in linear is not necessarily a good idea
|
|
2022-10-18 10:53:26
|
when I started with guetzli and pik, I was full of enthusiasm to do everything in linear -- trying out linear DCTs
|
|
2022-10-18 10:53:50
|
pretty much nothing worked and the experience wasn't better
|
|
|
_wb_
|
2022-10-18 10:53:57
|
yeah no I don't think it works for compression purposes
|
|
|
Jyrki Alakuijala
|
2022-10-18 10:54:08
|
in some computer graphics uses it works
|
|
|
_wb_
|
2022-10-18 10:54:14
|
for a metric though, it does make sense
|
|
|
Jyrki Alakuijala
|
2022-10-18 10:54:20
|
but not necessarily in downscaling
|
|
|
_wb_
|
2022-10-18 10:56:16
|
it does make the difference in that example of black-on-white vs white-on-black, when doing nonlinear downscaling it says the black-on-white one is better while when doing linear downscaling it says the opposite (and agrees with butteraugli and dssim and my own eyes)
|
|
2022-10-18 11:02:42
|
also somewhat surprising is that it doesn't use the ringing term for Y, only for X and B – though of course ringing and blur are to some extent already included in ssim, which has large weights for Y
|
|
2022-10-18 11:06:50
|
anyway, I think it's useful if a metric doesn't use the exact same colorspace as the codecs it is testing – DSSIM working in Lab space makes it unlikely that it gives an advantage to any specific codec, since none of them work in Lab space
|
|
2022-10-18 11:08:16
|
this is probably one of the biggest problems with VMAF, PSNR-Y, PSNR-HVS, etc: they all work in YCbCr like most of the codecs do, which inherently gives an advantage to the codecs that work in YCbCr and a disadvantage to those that don't.
|
|
2022-10-18 11:10:22
|
so that's another good reason to downscale in linear RGB instead of in XYB, so at least on the zoomed out scales (which probably contribute the most to the overall score), it doesn't directly correspond to what jxl is using
|
|
2022-10-18 02:09:11
|
this is ssimulacra2 as it is now on git:
|
|
2022-10-18 02:09:58
|
and this is what happens when I do the changes to fix SSIM and downscaling in linear space:
|
|
2022-10-18 02:10:31
|
blue line is current git libjxl, orange are the various libaom speed settings, white is mozjpeg, green is webp m6
|
|
2022-10-18 02:11:32
|
this is just for a few images, to get an idea (it takes time to recompute stuff for all images)
|
|
2022-10-18 02:12:41
|
overall not a huge difference in what it says, in both cases it says jxl > avif > mozjpeg ~= webp
|
|
2022-10-18 02:13:57
|
just somewhat smaller gap between avif and jxl, as I expected
|
|
2022-10-18 02:15:35
|
looking at the disagreements between those two variants, the disagreements are mostly minor (things like one saying A=62, B=64 while the other says A=64, B=62), but I do tend to agree with what the linear scaling variant says
|
|
|
BlueSwordM
I'd advise not setting 4:4:4 manually, as avifenc will not chroma subsample unless needed:
`avifenc -s X -j X --min 0 --max 63 -a end-usage=q -a cq-level=XX -a color:sharpness=2 -a tune=ssim -a color:enable-chroma-deltaq=1 -a color:deltaq-mode=3/4`
|
|
2022-10-18 02:20:05
|
I tried sharpness=2 deltaq-mode=3, but comparing at speed 6, none of the metrics seem to really like it (only butteraugli says it's a few percent better at the high quality end, the others say it's slightly worse)
|
|
2022-10-18 02:20:20
|
of course the metrics could very well be wrong about this
|
|
|
WoofinaS
|
2022-10-18 02:20:47
|
Might be a fault of cpu 6 pruning then as it's generally a fair bit better at slower presets.
|
|
|
_wb_
|
2022-10-18 02:27:43
|
I will try slower speeds too, but speed 6 is already on the slow end for practical deployment...
|
|
|
WoofinaS
|
2022-10-18 02:28:52
|
I can very much see it being applicable for things like Netflix thumbnail previews.
|
|
|
_wb_
|
2022-10-18 03:38:51
|
Sure, for the top 0.001% images that get viewed millions or billions of times, it makes sense to use speed 0. But for 99% of images it doesn't really make sense to spend that amount of cpu just to save a little bandwidth.
|
|
|
BlueSwordM
|
|
_wb_
I tried sharpness=2 deltaq-mode=3, but comparing at speed 6, none of the metrics seem to really like it (only butteraugli says it's a few percent better at the high quality end, the others say it's slightly worse)
|
|
2022-10-19 03:38:34
|
Ohhh right, I forgot you're on mainline aomenc. Silly me, I forgot that aomenc devs made `--sharpness=X` useless above 1 back in June 2021 (and made the RD multiplier tuning useless through it).
Anyway, this should work better since RD SB now exists:
`avifenc -s X -j X --min 0 --max 63 -a end-usage=q -a cq-level=XX -a tune=ssim -a color:enable-chroma-deltaq=1 -a color:deltaq-mode=3/4`
|
|
|
_wb_
|
2022-10-19 03:30:49
|
I just had a random idea: instead of working with SSIM = (intensity error) * (contrast / structure error), why not separate those two terms, compute norms of both of them, and tune weights for that. That means more weights to learn, but it might work better...
|
|
2022-10-19 03:31:40
|
(or worse, we'll see)
|
|
|
BlueSwordM
Ohhh right, I forgot you're on mainline aomenc. Silly me, I forgot that aomenc devs made `--sharpness=X` useless above 1 back in June 2021 (and made the RD multiplier tuning useless through it).
Anyway, this should work better since RD SB now exists:
`avifenc -s X -j X --min 0 --max 63 -a end-usage=q -a cq-level=XX -a tune=ssim -a color:enable-chroma-deltaq=1 -a color:deltaq-mode=3/4`
|
|
2022-10-19 08:54:35
|
I haven't looked at any actual image, but according to the metrics, `-a color:enable-chroma-deltaq=1 -a color:deltaq-mode=3` is not an overall improvement at speed 6. SSIM and MS-SSIM say it's good at the higher end (above 1.3 bpp), but all other metrics (Butteraugli, DSSIM, VMAF-NEG, PSNR-HVS) say it's worse on average. For the same cq-level setting, the quality with deltaq-mode=3 is significantly better according to all metrics, but the bpp also grows, so the curve looks worse than without the deltaq-mode=3. The curves look better than when adding `--sharpness=2` too though, you were right about that.
|
|
2022-10-19 08:55:54
|
does deltaq-mode=4 do something very different, or is it similar to =3 but 'more'?
|
|
|
BlueSwordM
|
|
_wb_
does deltaq-mode=4 do something very different, or is it similar to =3 but 'more'?
|
|
2022-10-19 08:56:05
|
It does something a bit different, yes.
|
|
2022-10-19 08:56:17
|
I'm actually surprised, especially since DQ3/DQ4 and a couple of other all-intra patches have actually been verified with butteraugli to perform better.
Note that I only said a couple, and only in all-intra.
|
|
|
_wb_
|
2022-10-19 08:56:40
|
it could be image-dependent
|
|
2022-10-19 08:57:30
|
I'm testing on the 250 images we used in our subjective study plus the 49 images of the daala testset
|
|
2022-10-19 09:00:33
|
this is avg bpp vs avg BA 3-norm on the 299 images, for cq=1..63 in steps of 2. Orange line at the bottom is with just tune=ssim, middle line is dq3 added, top line is dq3+sharpness2 added.
|
|
2022-10-19 09:05:18
|
btw you may have noticed that I make my plots in darkmode now; this is because I was hurting my eyes from all the switching between comparing images at max display brightness and looking at plots :)
|
|
2022-10-24 07:37:37
|
I am scratching my head a bit here. I made a variant of ssimulacra2 that computes 180 different subscores:
- 6 scales (1:1 to 1:32)
- 3 components (X,Y,B)
- 1-norm and 4-norm of ringing-map and blur-map
- 1,2,4 norm of the squared error map
- 1,2,4 norm of the ssim map
and then let it tune to the subjective data I have
|
|
2022-10-24 07:38:15
|
of all those 180 subscores, it ended up ignoring most of them (their weights converged to zero), and it only ended up using these:
```
X scale 1:1, 4-norm: error: 161148.74838407728
Y scale 1:1, 4-norm: blur: 0.2633687687930647
Y scale 1:1, 4-norm: ssim: 2.6431663619107493
Y scale 1:2, 4-norm: ssim: 0.9498500183656345
Y scale 1:4, 1-norm: ssim: 61.37707585349767
Y scale 1:4, 2-norm: ssim: 9.257658472175498
Y scale 1:8, 1-norm: ssim: 81.10181445428385
Y scale 1:8, 2-norm: ssim: 29.706924623161246
B scale 1:1, 4-norm: error: 407.8120233025815
B scale 1:2, 4-norm: error: 1087.4658682942747
B scale 1:16, 1-norm: ringing: 6.096135168120978
B scale 1:16, 4-norm: ringing: 14.100261857992544
B scale 1:16, 1-norm: ssim: 14.130922986455808
```
|
|
2022-10-24 07:44:12
|
now I wonder if that's just how the human visual system works, or if my tuning process is getting stuck in some local optimum
|
|
2022-10-25 10:50:23
|
https://jon-cld.s3.amazonaws.com/test/chall_of_fshame_SSIMULACRA_2_modelD.html
|
|
2022-10-25 10:51:47
|
I wrote that script to find bugs/issues in ssimulacra2 but it kind of became a demo page showing how bad psnr/ssim/vmaf are
|
|
2022-10-27 01:17:26
|
is there a precompiled Windows binary of butteraugli somewhere?
|
|
|
Jyrki Alakuijala
|
2022-10-31 11:18:16
|
I don't know
|
|
|
BlueSwordM
|
|
_wb_
is there a precompiled Windows binary of butteraugli somewhere?
|
|
2022-10-31 11:33:18
|
There was a website at one point, but it seems like they stopped hosting libjxl libs.
|
|
2022-10-31 11:33:30
|
Anyway, I'll ask the website owner and then reply back with the DDL link.
|
|
|
Pigophone
|
2022-11-01 01:15:32
|
@_wb_ butteraugli is available on vcpkg, I compiled it just now with `vcpkg install butteraugli --triplet=x64-windows`.
installing vcpkg is fairly easy if you'd rather compile it yourself
|
|
|
|
afed
|
2022-11-02 08:00:49
|
what about this metric?
https://github.com/richzhang/PerceptualSimilarity
|
|
|
_wb_
|
2022-11-02 08:26:13
|
I tried it but correlation with subjective results was a bit disappointing for the quality range we are interested in.
|
|
|
Traneptora
|
2022-11-02 10:50:47
|
I'm looking at your hall of shame and it looks to me like metrics overvalue sharpness
|
|
2022-11-02 10:51:03
|
[image]
|
|
2022-11-02 10:51:10
|
[image]
|
|
2022-11-02 10:51:27
|
first one is sharper, but it distorts the colors in a way that looks pretty gross
|
|
2022-11-02 10:51:37
|
legacy JPEG tends to do this, it is sharp but distorted
|
|
|
improver
|
2022-11-02 10:58:11
|
I can see more details in the sharper one though
|
|
2022-11-02 10:58:51
|
and less distorted shapes
|
|
|
_wb_
|
2022-11-03 07:56:48
|
In the end most of the distorted images there are just way too low quality and it is a bit of a 'pick your poison' situation
|
|
2022-11-03 08:01:30
|
But I think we made a good decision in the jxl encoder to ensure that DC (and low freq AC) gets enough bits and in the jxl spec to have good anti-banding tools. Most metrics don't seem to be bothered much by banding, yet this is an artifact that ruins an image even from far away or after downscaling.
|
|
|
diskorduser
|
2022-11-03 03:42:14
|
Is there any benchmark on decoding power consumption? Jxl and avif. IMO it's important for battery powered devices like laptops and smartphones
|
|
|
_wb_
|
2022-11-03 03:59:31
|
Not afaik, but should be roughly comparable I think.
|
|
2022-11-03 04:00:52
|
For still images on the web, probably decode power consumption is not that important compared to extra time spent on transfer and with the screen on while waiting for an image to load
|
|
|
|
veluca
|
2022-11-03 06:48:19
|
FWIW, probably AVIF decoding power consumption will be lower, those FP units are *expensive*
|
|
|
_wb_
|
2022-11-03 07:18:31
|
Yeah we should probably at some point see if decode can be done completely with the arithmetic and buffers in (mostly) int16...
|
|
2022-11-03 07:21:26
|
(some intermediate results in int32 for idct but most other stuff in int16)
|
|
|
Traneptora
|
|
veluca
FWIW, probably AVIF decoding power consumption will be lower, those FP units are *expensive*
|
|
2022-11-03 07:42:43
|
you can in theory do it in fixed point, can't you?
|
|
|
|
veluca
|
2022-11-03 07:43:30
|
"in theory" is the keyword here ๐
|
|
|
Traneptora
|
2022-11-03 07:43:46
|
dunno whatever happened to Lynne's decoder ideas
|
|
|
_wb_
|
|
veluca
"in theory" is the keyword here ๐
|
|
2022-11-03 07:49:43
|
Is there anything besides idct that would be tricky to do in fixedpoint? You already did xyb2rgb, right?
|
|
2022-11-03 07:50:36
|
Gaborish, epf, blending of frames etc, that should all not be too hard to do in int16 instead of float, right?
|
|
2022-11-03 07:52:48
|
I mean, it's not like we really depend on the exponent part of floats or something like that, we're mostly just using them as int24_t
|
|
|
Traneptora
|
2022-11-03 07:55:08
|
given that X, Y, B are typically between 0 and 1 that makes sense to me
|
|
2022-11-03 07:55:26
|
floats give a bit more precision in the lower order bits but if it's within tolerance I suppose that doesn't matter
|
|
|
_wb_
|
2022-11-03 07:58:06
|
X is typically very close to zero, its range is like -0.03 to 0.03 or something like that
|
|
2022-11-03 07:58:48
|
But I don't think we need subnormals or anything like that
|
|
|
Traneptora
|
2022-11-03 07:59:13
|
why is X so close to zero, is that because L and M are typically much closer to each other than either are to S?
|
|
|
_wb_
|
2022-11-03 08:00:53
|
Yeah L and M are pretty similar so you can't really get abs(L-M) to be large
|
|
|
|
veluca
|
2022-11-03 08:01:11
|
EPF has divisions :O
|
|
|
Traneptora
|
2022-11-03 08:01:24
|
I wonder if you can use Rationals
|
|
|
|
veluca
|
2022-11-03 08:01:41
|
Otherwise yeah, it should be fine, but I do wonder about the range of things
|
|
|
_wb_
|
2022-11-03 08:02:16
|
Is integer division that bad? Can it be approximated?
|
|
|
|
veluca
|
2022-11-03 08:02:28
|
Is it that bad? YES
|
|
2022-11-03 08:02:39
|
Can it be approximated? Probably
|
|
|
_wb_
|
2022-11-03 08:03:43
|
Yeah the ranges will need to be figured out
|
|
|
|
veluca
|
2022-11-03 08:03:47
|
to give an idea, compared to SIMD multiplies, I think 16-bit divisions could easily be 100x slower
|
|
|
_wb_
|
2022-11-03 08:04:24
|
What's the range of the divisors?
|
|
|
|
veluca
|
2022-11-03 08:04:40
|
pretty much anything unfortunately
|
|
2022-11-03 08:05:25
|
the most reasonable thing IMO is to do an initial approximation using only the highest set bit, and then do 2 Newton steps
|
|
2022-11-03 08:05:53
|
still probably ~20 cycles for a vector, but much better than ~500
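A scalar sketch of that trick in C (illustrative only; the real SIMD code would vectorize it and use a better seed):
```c
#include <stdint.h>
#include <stdio.h>

/* ~ (1 << 16) / d in Q16 fixed point, for 1 <= d < (1 << 17);
 * __builtin_clz is the GCC/Clang count-leading-zeros builtin */
static uint32_t recip_q16(uint32_t d) {
    int shift = 31 - __builtin_clz(d);      /* highest set bit: d ~ 2^shift */
    int64_t y = (int64_t)1 << (16 - shift); /* seed: 1 / 2^shift in Q16 */
    for (int i = 0; i < 2; i++)             /* Newton: y <- y * (2 - d*y) */
        y = (y * ((int64_t)(2 << 16) - (int64_t)d * y)) >> 16;
    return (uint32_t)y;
}

int main(void) {
    printf("recip_q16(3)  = %u (exact %u)\n", recip_q16(3),  (1u << 16) / 3);
    printf("recip_q16(10) = %u (exact %u)\n", recip_q16(10), (1u << 16) / 10);
    return 0;
}
```
The crude one-bit seed can be quite poor when d sits just below a power of two, so production code would refine the seed (e.g. with a small lookup table) to make two steps enough across the whole range.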
|
|
|
_wb_
|
2022-11-03 08:09:15
|
Anyway jxl without epf is still kind of nice, so if epf cannot be made fast, it's mostly a problem for the lower qualities, not for the higher qualities where epf is probably not even used by default
|
|
|
Traneptora
|
|
veluca
Otherwise yeah, it should be fine, but I do wonder about the range of things
|
|
2022-11-03 08:23:49
|
why not have something like
```c
#include <stdint.h>

struct Rational {
    int32_t numer, denom; /* value = numer / denom */
};
```
|
|
2022-11-03 08:24:15
|
addition would be slower but division would be just as fast as multiplication
|
|
|
|
veluca
|
2022-11-03 08:24:16
|
you need a division to do a weighted average
|
|
2022-11-03 08:24:30
|
carrying around the rational number doesn't help you that much
|
|
|
Traneptora
|
2022-11-03 08:24:33
|
Yea, you'd just keep track of rational numbers as an ordered pair of integers
|
|
2022-11-03 08:24:43
|
and you wouldn't actually do the division until the final stage
|
|
|
|
veluca
|
2022-11-03 08:24:57
|
eh, probably not worth it
|
|
2022-11-03 08:25:14
|
actually, definitely not worth it
|
|
|
Traneptora
|
2022-11-03 08:25:18
|
how so?
|
|
|
|
veluca
|
2022-11-03 08:25:25
|
EPF is 3 divisions (+eps) per pixel at worst
|
|
|
Traneptora
|
2022-11-03 08:25:47
|
I'd have to check how EPF works
|
|
|
|
veluca
|
2022-11-03 08:25:54
|
(good luck xD)
|
|
2022-11-03 08:26:19
|
but long story short, for each pixel it computes a vector of weights of neighbours and then replaces each neighbour with the weighted avg
|
|
2022-11-03 08:26:44
|
so you need one reciprocal per EPF iteration (of which there are at most 3)
|
|
2022-11-03 08:27:15
|
it's not per-channel, so this is not worse than dividing in the end to compute channel values
|
|
|
Traneptora
|
2022-11-03 08:28:46
|
how is it not per channel?
|
|
2022-11-03 08:29:00
|
how would you even do that "globally"
|
|
2022-11-03 08:30:06
|
> replaces each neighbour with the weighted avg
so it divides by 8 for everything except the edge cases, right?
|
|
2022-11-03 08:30:59
|
which is just a right shift
|
|
|
|
veluca
|
2022-11-03 08:40:09
|
no no, every pixel gets its own weight (a float in [0, 1])
|
|
2022-11-03 08:40:20
|
but the weight is per-pixel, not per-pixel-per-channel
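Roughly this shape, as a sketch of the structure (not the libjxl implementation):
```c
#include <stddef.h>
#include <stdio.h>

/* one EPF-style step for one output pixel: w[k] is the shared
 * per-pixel weight of neighbour k, px[c][k] the samples of
 * channel c; a single reciprocal of the weight sum serves all
 * three channels */
static void weighted_avg3(const float *w, const float *px[3],
                          size_t n, float out[3]) {
    float wsum = 0.0f;
    for (size_t k = 0; k < n; k++)
        wsum += w[k];
    const float inv = 1.0f / wsum; /* the one division per pixel */
    for (int c = 0; c < 3; c++) {
        float acc = 0.0f;
        for (size_t k = 0; k < n; k++)
            acc += w[k] * px[c][k];
        out[c] = acc * inv;
    }
}

int main(void) {
    const float w[3] = {0.5f, 1.0f, 0.5f};
    const float x[3] = {1, 2, 3}, y[3] = {4, 5, 6}, b[3] = {7, 8, 9};
    const float *px[3] = {x, y, b};
    float out[3];
    weighted_avg3(w, px, 3, out);
    printf("%.2f %.2f %.2f\n", out[0], out[1], out[2]);
    return 0;
}
```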
|
|
|
Traneptora
|
2022-11-03 08:41:29
|
o, I'll have to read the spec
|
|
|
jox
|
2022-11-15 10:07:11
|
I just want to share a small test I did with cjxl. I personally can't see any difference between these two photos and the size reduction is impressive! Can anyone spot any visual differences?
|
|
|
Traneptora
|
2022-11-15 10:12:06
|
d1 is really a sweet spot for jxl
|
|
|
_wb_
|
2022-11-15 10:12:58
|
I can see minor differences at 2x zoom, nothing problematic though at first sight
|
|
|
Traneptora
|
2022-11-15 10:13:52
|
butteraugli is tuned for standard viewing distance
|
|
2022-11-15 10:14:12
|
d1 will be mostly unnoticed at 1x zoom
|
|
|
jox
|
|
_wb_
I can see minor differences at 2x zoom, nothing problematic though at first sight
|
|
2022-11-15 10:21:31
|
Oh really, I can't even see any differences at 3x zoom. Maybe my eye is not trained to find the artifacts. What should I look for? I am genuinely curious to know what changes with different quality settings.
|
|
|
lonjil
|
2022-11-15 10:52:40
|
"zoom" doesn't actually tell you viewing distance. Someone sitting closer to a bigger screen will see more detail even at 1x zoom.
|
|
2022-11-15 10:55:21
|
at 100% zoom, I believe I see the original having more noise and the jxl one being slightly smoother.
|
|
2022-11-15 10:57:46
|
yes, the original definitely has some high frequency noise that becomes nearly imperceptible as you go further from the screen
|
|
|
monad
|
2022-11-16 12:12:29
|
"viewing distance measured in pixels"
|
|
|
w
|
2022-11-16 03:11:56
|
ssimulacra2 87.45555778
butteraugli 0.589679
|
|
|
Traneptora
|
2022-11-16 03:21:30
|
probably a bug
|
|
|
_wb_
|
2022-11-16 03:26:36
|
well those two images are slightly different:
```
$ compare -metric psnr orig.png dist.png null:
62.422
$ compare -verbose -metric pae orig.png dist.png null:
orig.png PNG 1920x1080 1920x1080+0+0 8-bit sRGB 80108B 0.020u 0:00.032
dist.png PNG 1920x1080 1920x1080+0+0 8-bit sRGB 341622B 0.030u 0:00.027
Image: orig.png
Channel distortion: PAE
red: 771 (0.0117647)
green: 514 (0.00784314)
blue: 771 (0.0117647)
all: 771 (0.0117647)
```
|
|
|
w
|
2022-11-16 03:27:35
|
I just wonder about the scores...
|
|
|
_wb_
|
2022-11-16 03:27:37
|
not in any visible way, but then again those ssimulacra2/butteraugli scores are in the "visually lossless" range too
|
|
2022-11-16 03:28:29
|
basically ssimulacra2 above 85 you can consider visually lossless, butteraugli maxnorm below 1 too
|
|
|
w
|
2022-11-16 03:29:04
|
it's ±3 RGB, I would give it like a 100
|
|
|
DuxVitae
|
2022-11-17 11:06:06
|
I noticed that for the following image the file size for modular lossy and modular lossless is far from what I expected (lossy 6x larger than lossless):
```
C:\Program Files\libjxl>cjxl C:\maze.png C:\maze_lossy.jxl -e 9 -m 1 -d 1
JPEG XL encoder v0.8.0 54bf3b2 [AVX2,SSE4,SSSE3,Unknown]
Read 994x994 image, 12554 bytes, 394.7 MP/s
Encoding [Modular, d1.000, effort: 9],
Compressed to 24000 bytes (0.194 bpp).
994 x 994, 0.51 MP/s [0.51, 0.51], 1 reps, 6 threads.

C:\Program Files\libjxl>cjxl C:\maze.png C:\maze.jxl -e 9 -m 1 -d 0
JPEG XL encoder v0.8.0 54bf3b2 [AVX2,SSE4,SSSE3,Unknown]
Read 994x994 image, 12554 bytes, 413.1 MP/s
Encoding [Modular, lossless, effort: 9],
Compressed to 3336 bytes (0.027 bpp).
994 x 994, 0.36 MP/s [0.36, 0.36], 1 reps, 6 threads.
```
|
|
|
_wb_
|
2022-11-17 11:48:56
|
This is a typical image that is hard for dct and easy for lossless. We should add a heuristic to detect such cases and encode losslessly if that's smaller than lossy.
|
|
|
spider-mario
|
2022-11-17 01:32:01
|
thatโs lossy modular, though, not vardct, would we expect that there as well?
|
|
|
_wb_
|
2022-11-17 02:12:55
|
Lossy modular also uses xyb (no issue here) and a frequency transform that will turn this low entropy image into a higher entropy thing
|
|
|
Demiurge
|
2022-11-17 04:09:00
|
What kind of transform?
|
|
|
_wb_
|
2022-11-17 04:09:59
|
Squeeze, which is a reversible modified Haar transform
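The core of it is a reversible average/difference (lifting) step; a minimal sketch below, noting that the real Squeeze additionally applies a smooth "tendency" correction to the difference:
```c
#include <stdio.h>

/* forward: pair (a, b) -> (avg, diff); exactly invertible in
 * integers because avg is reconstructed from b and diff
 * (assumes arithmetic right shift for negative diff) */
static void lift_fwd(int a, int b, int *avg, int *diff) {
    *diff = a - b;
    *avg  = b + (*diff >> 1); /* floor((a + b) / 2) */
}

static void lift_inv(int avg, int diff, int *a, int *b) {
    *b = avg - (diff >> 1);
    *a = *b + diff;
}

int main(void) {
    int avg, diff, a, b;
    lift_fwd(7, 4, &avg, &diff);
    lift_inv(avg, diff, &a, &b);
    printf("avg=%d diff=%d -> a=%d b=%d\n", avg, diff, a, b);
    return 0;
}
```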
|
|
|
Demiurge
|
2022-11-17 04:10:00
|
This maze image looks very chaotic in an extremely orderly way
|
|
|
DZgas ะ
|
|
DuxVitae
I noticed that for the following image the file size for modular lossy and modular lossless is far from what I expected (lossy 6x larger than lossless):
```
C:\Program Files\libjxl>cjxl C:\maze.png C:\maze_lossy.jxl -e 9 -m 1 -d 1
JPEG XL encoder v0.8.0 54bf3b2 [AVX2,SSE4,SSSE3,Unknown]
Read 994x994 image, 12554 bytes, 394.7 MP/s
Encoding [Modular, d1.000, effort: 9],
Compressed to 24000 bytes (0.194 bpp).
994 x 994, 0.51 MP/s [0.51, 0.51], 1 reps, 6 threads.

C:\Program Files\libjxl>cjxl C:\maze.png C:\maze.jxl -e 9 -m 1 -d 0
JPEG XL encoder v0.8.0 54bf3b2 [AVX2,SSE4,SSSE3,Unknown]
Read 994x994 image, 12554 bytes, 413.1 MP/s
Encoding [Modular, lossless, effort: 9],
Compressed to 3336 bytes (0.027 bpp).
994 x 994, 0.36 MP/s [0.36, 0.36], 1 reps, 6 threads.
```
|
|
2022-11-17 09:37:31
|
Oh, very familiar.
|
|
|
|
afed
|
2022-11-21 05:50:21
|
strange benchmarks
https://www.lossless-benchmarks.com/
|
|
|
Jim
|
2022-11-21 05:53:13
|
Now do avif
|
|
|
_wb_
|
2022-11-21 06:22:40
|
Strange that jxl is a single point. There is e1 to e9 in libjxl, and then there's fjxl with various effort settings too
|
|
|
DZgas ะ
|
|
afed
strange benchmarks
https://www.lossless-benchmarks.com/
|
|
2022-11-21 08:59:16
|
>not encode params
>QOI best
|
|
|
afed
strange benchmarks <:PepeGlasses:878298516965982308>
https://www.lossless-benchmarks.com/
|
|
2022-11-21 09:03:25
|
I can assume that this measures the actual time per "unit" of computing time; this is essentially *the same as the tests between x86 and ARM with "energy" costs for the same tasks (excluding total power draw and raw speed)*
|
|
|
|
afed
|
2022-11-22 12:27:55
|
```ZPNG "lossless" (the original, upstream ZPNG) is very competitive with and sometimes strictly better than QOIR. It occupies a similar spot in the design space to QOIR: simple implementation, reasonable compression ratio, very fast encode / decode speeds. "Sometimes strictly better" means that, on some photographic-heavy subsets of the image test suite (see the full benchmarks), ZPNG outperforms QOIR on all three columns (compression ratio, encoding speed and decoding speed) simultaneously. ZPNG is in some sense simpler than QOIR (ZPNG is around 700 lines of C++ code plus a zstd dependency) but in another sense more complicated (because of the zstd dependency)```
https://github.com/nigeltao/qoir
|
|
2022-11-22 12:28:13
|
```
QOIR_Lossless 1.000 RelCmpRatio 1.000 RelEncSpeed 1.000 RelDecSpeed (1)
JXL_Lossless/f 0.860 RelCmpRatio 0.630 RelEncSpeed 0.120 RelDecSpeed (2)
JXL_Lossless/l3 0.725 RelCmpRatio 0.032 RelEncSpeed 0.022 RelDecSpeed
JXL_Lossless/l7 0.613 RelCmpRatio 0.003 RelEncSpeed 0.017 RelDecSpeed
PNG/fpng 1.234 RelCmpRatio 1.138 RelEncSpeed 0.536 RelDecSpeed (1)
PNG/fpnge 1.108 RelCmpRatio 1.851 RelEncSpeed n/a RelDecSpeed (1)
PNG/libpng 0.960 RelCmpRatio 0.033 RelEncSpeed 0.203 RelDecSpeed
PNG/stb 1.354 RelCmpRatio 0.045 RelEncSpeed 0.186 RelDecSpeed (1)
PNG/wuffs 0.946 RelCmpRatio n/a RelEncSpeed 0.509 RelDecSpeed (1), (3)
QOI 1.118 RelCmpRatio 0.870 RelEncSpeed 0.700 RelDecSpeed (1)
WebP_Lossless 0.654 RelCmpRatio 0.015 RelEncSpeed 0.325 RelDecSpeed
ZPNG_Lossless 0.864 RelCmpRatio 0.747 RelEncSpeed 0.927 RelDecSpeed (4)
ZPNG_NofilLossl 1.330 RelCmpRatio 0.843 RelEncSpeed 1.168 RelDecSpeed (4)```
|
|
2022-11-22 12:29:05
|
ZPNG as I understand it is basically PNG with some modified and added filters and compressed with Zstd
|
|
2022-11-22 01:28:27
|
so the biggest improvement comes from zstd, because it is highly optimized, making zpng faster even than ~~fpnge~~ fast jxl, while still having good compression?
but there are some issues with this comparison: gcc is used, often with no extra optimization flags; with clang all *jxl things are much faster, especially with native compilation for the cpu
and for ~~fpnge~~ fast jxl with gcc and libgomp, multithreading is not working correctly (at least for me), sometimes 2x slower (although this is a single-threaded comparison, but still)
|
|
|
|
veluca
|
2022-11-22 01:32:40
|
fpnge is ~2x faster than zpng
|
|
|
|
afed
|
2022-11-22 01:42:08
|
in single-threaded mode?
and it looks like the latest zstd library is used in the comparison, not the one in the original repo
|
|
|
|
veluca
|
2022-11-22 01:43:30
|
I mean, in that table it's ~2x faster
|
|
2022-11-22 01:44:01
|
so single-threaded I'd say
|
|
|
|
afed
|
2022-11-22 01:46:23
|
ah, yeah, I meant fast_lossless jxl
|
|
|
|
veluca
|
2022-11-22 01:47:48
|
should still be faster tbh, probably depends on compilation options
|
|
2022-11-22 01:48:02
|
how did you compile?
|
|
|
|
afed
|
2022-11-22 01:51:39
|
```CXX="${CXX-g++}"``` or clang
with the default script, and with "-march=native -mtune=native"
|
|
2022-11-22 01:59:28
|
although there is also "-march=native", it's still the gcc compiler
<https://github.com/nigeltao/qoir/blob/main/script/run_full_benchmarks.sh>
|
|
|
|
veluca
|
2022-11-22 02:01:22
|
there are special flags to enable intrinsics both in fjxl and in fpnge
|
|
2022-11-22 02:01:54
|
FASTLL_ENABLE_AVX2_INTRINSICS=1
|
|
2022-11-22 02:01:58
|
ok, yeah, that's set
|
|
|
|
afed
|
2022-11-22 02:03:25
|
but not for fpnge?
```echo 'Compiling out/jxl_adapter.o'
$CXX -c -march=native \
-DFASTLL_ENABLE_AVX2_INTRINSICS=1 \
-I../libjxl/build/lib/include \
-I../libjxl/lib/include \
$CXXFLAGS -Wno-unknown-pragmas adapter/jxl_adapter.cpp \
$LDFLAGS -o out/jxl_adapter.o
echo 'Compiling out/png_fpnge_adapter.o'
$CXX -c -march=native \
$CXXFLAGS adapter/png_fpnge_adapter.cpp \
$LDFLAGS -o out/png_fpnge_adapter.o```
|
|
|
|
veluca
|
2022-11-22 02:04:56
|
apparently for fpnge I made it unconditional-if-supported
|
|
|
|
afed
|
2022-11-22 02:09:02
|
looks like there was a speed bump; seems like gcc could be the cause (not reaching full real performance)?
<https://github.com/nigeltao/qoir/commit/c2fec8401776bb0bf58106f5315552b36aab4b08>
|
|
|
|
veluca
|
2022-11-22 02:11:26
|
could be
|
|
2022-11-22 02:11:31
|
also depends on the effort level
|
|
|
|
afed
|
2022-11-22 07:28:28
|
fpnge -r500 (first clang, second gcc, other flags unchanged)
```210.555 MP/s 16.822 bits/pixel
172.744 MP/s 16.822 bits/pixel
-
645.121 MP/s 0.429 bits/pixel
612.370 MP/s 0.429 bits/pixel
-
199.539 MP/s 18.010 bits/pixel
165.603 MP/s 18.010 bits/pixel```
|
|
2022-11-22 07:36:34
|
-r1000
```200.624 MP/s 17.370 bits/pixel
168.969 MP/s 17.370 bits/pixel
-
283.050 MP/s 9.981 bits/pixel
253.117 MP/s 9.981 bits/pixel```
|
|
|
|
veluca
|
2022-11-22 08:25:41
|
yup, I wrote it testing with clang so that's not entirely surprising
|
|
2022-11-22 08:26:49
|
although it's surprising that changing the # of repetitions would change the bpp...
|
|
|
_wb_
|
2022-11-22 08:27:40
|
Probably different images
|
|
|
|
afed
|
2022-11-22 08:28:30
|
yeah, these are different images, I just want to make sure it's consistently repeatable
|
|
|
_wb_
|
2022-11-22 08:30:20
|
I assume gcc is doing a bit less autovec or something? Or what's causing the difference?
|
|
|
|
afed
|
2022-11-22 08:32:47
|
and for fast_lossless_jxl there is something strange about threading with gcc (or libgomp vs libomp); it seems to run fewer threads than with clang
|
|
|
|
veluca
|
2022-11-22 08:39:35
|
could also be less optimal register allocator or the like, hard to say
|
|
|
|
afed
|
2022-11-23 12:49:33
|
`Add LZ4PNG to full benchmarks`
<https://github.com/nigeltao/qoir>
```QOIR_Lossless 1.000 RelCmpRatio 1.000 RelEncSpeed 1.000 RelDecSpeed (1)
JXL_Lossless/f 0.860 RelCmpRatio 0.630 RelEncSpeed 0.120 RelDecSpeed (2)
JXL_Lossless/l3 0.725 RelCmpRatio 0.032 RelEncSpeed 0.022 RelDecSpeed
JXL_Lossless/l7 0.613 RelCmpRatio 0.003 RelEncSpeed 0.017 RelDecSpeed
LZ4PNG_Lossless 1.403 RelCmpRatio 1.038 RelEncSpeed 1.300 RelDecSpeed (3)
LZ4PNG_NofilLsl 1.642 RelCmpRatio 1.312 RelEncSpeed 2.286 RelDecSpeed (3)
PNG/fpng 1.234 RelCmpRatio 1.138 RelEncSpeed 0.536 RelDecSpeed (1)
PNG/fpnge 1.108 RelCmpRatio 1.851 RelEncSpeed n/a RelDecSpeed (1)
PNG/libpng 0.960 RelCmpRatio 0.033 RelEncSpeed 0.203 RelDecSpeed
PNG/stb 1.354 RelCmpRatio 0.045 RelEncSpeed 0.186 RelDecSpeed (1)
PNG/wuffs 0.946 RelCmpRatio n/a RelEncSpeed 0.509 RelDecSpeed (1), (4)
QOI 1.118 RelCmpRatio 0.870 RelEncSpeed 0.700 RelDecSpeed (1)
WebP_Lossless 0.654 RelCmpRatio 0.015 RelEncSpeed 0.325 RelDecSpeed
ZPNG_Lossless 0.864 RelCmpRatio 0.747 RelEncSpeed 0.927 RelDecSpeed (3)
ZPNG_NofilLsl 1.330 RelCmpRatio 0.843 RelEncSpeed 1.168 RelDecSpeed (3)```
yeah, zstd seems to do the main job, lz4 is faster, but compression suffers a lot
|
|
|
_wb_
|
2022-11-23 01:24:42
|
you could also try png.br, which is something you could actually use on the web right now (i.e. a png that only uses the uncompressed fallback mode of DEFLATE, and then gets sent with brotli transfer-encoding)
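(To make that concrete: a sketch of DEFLATE's stored-block framing, the "uncompressed fallback mode". A real png.br tool would still wrap this in zlib/IDAT with the Adler-32 checksum; brotli on the transfer layer then does the actual compression.)
```
#include <algorithm>
#include <cstdint>
#include <vector>

// Wrap raw bytes in "stored" (uncompressed) DEFLATE blocks. Per block:
// one header byte (BFINAL in bit 0, BTYPE=00 in bits 1-2, padding to
// the byte boundary), then LEN and NLEN = ~LEN as 16-bit little-endian,
// then LEN literal bytes. The payload stays byte-identical, so the
// PNG is trivially valid while brotli does the real compression.
std::vector<uint8_t> DeflateStored(const std::vector<uint8_t>& raw) {
  std::vector<uint8_t> out;
  size_t pos = 0;
  do {
    uint16_t len =
        static_cast<uint16_t>(std::min<size_t>(raw.size() - pos, 65535));
    bool final_block = (pos + len == raw.size());
    out.push_back(final_block ? 0x01 : 0x00);  // BFINAL + BTYPE=00
    out.push_back(len & 0xFF);                 // LEN, little-endian
    out.push_back(len >> 8);
    out.push_back(~len & 0xFF);                // NLEN, one's complement
    out.push_back((~len >> 8) & 0xFF);
    out.insert(out.end(), raw.begin() + pos, raw.begin() + pos + len);
    pos += len;
  } while (pos < raw.size());
  return out;
}
```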
|
|
|
|
afed
|
2022-11-23 01:46:56
|
yeah, but it has limited support (transfer mostly) and is difficult to use
and zpng has faster and more effective filtering
<https://github.com/catid/Zpng>
```This library is similar to PNG in that the image is first filtered, and then submitted to a data compressor. The filtering step is a bit simpler and faster but somehow more effective than the one used in PNG. The data compressor used is Zstd, which makes it significantly faster than PNG to compress and decompress.
Filtering:
(1) Reversible color channel transformation. (2) Split each color channel into a separate color plane. (3) Subtract each color value from the one to its left.```
though, if fpnge had a filter-only mode, that would be usable for some mixes, even with brotli
or fast jxl which would work like zpng but use brotli instead of zstd, or even gpu brotli <:FeelsAmazingMan:808826295768449054>
|
|
|
_wb_
|
2022-11-23 01:56:53
|
That Zpng filtering can be represented in jxl's modular: it's just a particular fixed RCT (jxl has many, and the Zpng one is probably not the most effective one) and using W as a fixed predictor. In modular jxl, things are always planar.
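(A sketch of step (3) of that chain on one already-split plane; "fixed W predictor" just means each sample is predicted from its left neighbor and only the residual is coded.)
```
#include <cstdint>
#include <vector>

// Zpng-style step (3) on one already-split plane: replace each sample
// by its difference from the left neighbor (a fixed W predictor, in
// jxl-modular terms). Done right-to-left so the originals are intact
// when subtracted; wraps mod 256, so it is exactly reversible. The
// residuals cluster near zero, which is what the entropy stage (zstd
// in Zpng, MA-tree coding in modular jxl) then exploits.
void SubtractLeft(std::vector<uint8_t>& plane, size_t width) {
  if (width == 0) return;
  for (size_t row = 0; row + width <= plane.size(); row += width)
    for (size_t x = width - 1; x > 0; --x)
      plane[row + x] -= plane[row + x - 1];
}

// Inverse: prefix-sum each row to undo the filtering.
void AddLeft(std::vector<uint8_t>& plane, size_t width) {
  if (width == 0) return;
  for (size_t row = 0; row + width <= plane.size(); row += width)
    for (size_t x = 1; x < width; ++x)
      plane[row + x] += plane[row + x - 1];
}
```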
|
|
|
|
afed
|
2022-11-23 02:19:06
|
i mean, if properly picked from simple components, it can also be very fast with decent efficiency, like here:
```JXL_Lossless/f 0.860 RelCmpRatio 0.630 RelEncSpeed 0.120 RelDecSpeed (2)
ZPNG_Lossless 0.864 RelCmpRatio 0.747 RelEncSpeed 0.927 RelDecSpeed (3)```
though it may only be on this test set and also using clang may change the results
|
|
|
afed
yeah, but it has limited support (transfer mostly) and is difficult to use
and zpng has faster and more effective filtering
<https://github.com/catid/Zpng>
```This library is similar to PNG in that the image is first filtered, and then submitted to a data compressor. The filtering step is a bit simpler and faster but somehow more effective than the one used in PNG. The data compressor used is Zstd, which makes it significantly faster than PNG to compress and decompress.
Filtering:
(1) Reversible color channel transformation. (2) Split each color channel into a separate color plane. (3) Subtract each color value from the one to its left.```
though, if fpnge had a filter-only mode, that would be usable for some mixes, even with brotli
or fast jxl which would work like zpng but use brotli instead of zstd, or even gpu brotli <:FeelsAmazingMan:808826295768449054>
|
|
2022-11-23 02:36:45
|
however, just filtering without knowing what compression method will be used afterwards is probably not optimal
|
|
|
|
veluca
|
2022-11-23 03:24:47
|
I don't find fast lossless en/decoders for a new format to be that interesting, it's not *that* hard to make something that is very fast and very dense if you don't have constraints on the bitstream
|
|
2022-11-23 03:26:05
|
(and still fits in a couple thousand lines without external libs, say)
|
|
|
|
afed
|
2022-11-23 03:45:38
|
yeah, highly specialized formats, especially ones that don't scale well to anything else, are maybe only interesting for research or experiments
i meant that among the existing formats, other approaches can also be taken without breaking the specs (which has already been done, but I'm referring to yet other ways)
|
|
2022-11-23 04:17:09
|
<https://github.com/nigeltao/qoir>
sad that run_benchmarks.sh does not compile in my environment (I am not on linux); maybe someone can test this with clang instead of gcc?
(or gcc vs clang, so we can compare compiler impact)
<https://github.com/nigeltao/qoir/blob/main/run_benchmarks.sh#L3>
|
|
2022-11-23 08:15:08
|
found something (and even clang 14 is worse for some reason)
<https://www.phoronix.com/review/aocc4-gcc-clang/3>
|
|
2022-11-23 08:15:46
|
|
|
2022-11-23 08:16:20
|
|
|
|
BlueSwordM
|
|
afed
found something (and even clang 14 is worse for some reason)
<https://www.phoronix.com/review/aocc4-gcc-clang/3>
|
|
2022-11-23 08:20:18
|
We've talked about this in the past, but I still haven't managed to do any profiling since I can't seem to install Clang 14 on my system.
|
|
|
|
afed
|
2022-11-23 08:22:24
|
though maybe because this is a very new cpu
|
|
|
BlueSwordM
|
|
afed
though maybe because this is a very new cpu
|
|
2022-11-23 08:30:16
|
I don't think that's why.
I also myself noticed a massive speed increase when I rebuilt libjxl when my distro updated from Clang 14 to Clang 15.
|
|
|
|
veluca
|
2022-11-23 10:13:34
|
those numbers are weird...
|
|
|
pshufb
|
2022-11-24 06:04:25
|
I wouldn't be surprised if the difference went away with PGO
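(For anyone wanting to try: with clang that is roughly building with `-fprofile-instr-generate`, running a representative workload, merging the raw profiles with `llvm-profdata merge`, then rebuilding with `-fprofile-instr-use=`; gcc's equivalent is `-fprofile-generate` followed by `-fprofile-use`.)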
|
|
|
|
afed
|
2022-11-27 01:34:26
|
<:DogWhat:806133035786829875>
|
|
2022-11-27 01:36:29
|
|
|
|
|
veluca
|
2022-11-27 01:41:11
|
what's so surprising?
|
|
|
|
afed
|
2022-11-27 01:42:36
|
clang vs gcc performance (it's all single-threaded, same conditions, compiler is the only difference)
|
|
|
|
veluca
|
2022-11-27 01:46:05
|
I'm not too surprised
|
|
|
|
afed
|
2022-11-27 01:46:47
|
i expected some difference, but not that much
|
|
|
yurume
|
2022-11-27 02:12:14
|
within my expectation, too
|
|
2022-11-27 02:12:33
|
autovectorization and low-level constructs are very sensitive to the compiler
|
|
|
TheBigBadBoy - ๐ธ๐
|
2022-11-28 04:30:55
|
I did not know that clang produces faster binaries
What were the compilation flags pls?
Also, could it be the default compilation flags not being the same between gcc and clang?
|
|
|
diskorduser
|
|
afed
<:DogWhat:806133035786829875>
|
|
2022-11-28 05:53:41
|
Does profile-guided compilation improve encoding speed?
|
|
|
|
afed
|
2022-11-28 11:58:22
|
perhaps, but I don't think pgo will be commonly used in benchmarks because it requires some preparation to compile (it's not just adding an extra flag)
|
|
|
Jim
|
|
TheBigBadBoy - ๐ธ๐
I did not know that clang produces faster binaries
What were the compilation flags pls?
Also, could it be the default compilation flags not being the same between gcc and clang?
|
|
2022-11-29 11:40:50
|
Clang probably focuses more on optimizations which generally take more time to compile (clang is usually slower at compiling), but produce faster-running code. It's not likely due to flags, just how much effort they put into detecting common patterns and optimizing the output.
|
|
|
yurume
|
2022-11-29 11:46:56
|
I've personally seen cases where GCC produces better code than clang as well; I agree clang tries to be more aggressive in general, but individual cases can vary a lot.
|
|
|
sklwmp
|
|
Jim
Clang probably focuses more on optimizations which generally take more time to compile (clang is usually slower at compiling), but produce faster-running code. It's not likely due to flags, just how much effort they put into detecting common patterns and optimizing the output.
|
|
2022-11-29 02:32:49
|
I always heard that Clang was faster at compiling than GCC, but produced slower binaries. I guess things have changed in the meantime?
|
|
|
Fraetor
|
2022-12-01 09:08:03
|
For our heavy compute stuff at work we either use GCC or the Cray compiler. TBF, that might be because we have a lot of Fortran, which I don't know how well LLVM deals with.
|
|
|
|
afed
|
2022-12-01 09:15:54
|
8-bit encoding will be slower after this change?
`8 bit paths are not SIMDfied yet`
<https://github.com/libjxl/libjxl/pull/1938>
|
|
|
_wb_
|
2022-12-01 09:16:58
|
no
|
|
2022-12-01 09:18:13
|
yeah that's > 8 bit
|
|
2022-12-01 09:18:31
|
which in markdown renders like a quote
|
|
2022-12-01 09:18:36
|
`> 8 bit`
|
|
2022-12-01 09:19:49
|
8-12 bit will be the same as before, 13-14 is as fast as 9-12, and 15-16 will be a bit slower
|
|
|
|
afed
|
2022-12-01 09:29:33
|
ah, I see, good. I wonder whether the simplest animation (just multiple images in one file, without inter-frame optimization) would be easy to add to fpnge or fjxl (or is it not so easy to implement)?
sometimes i need something very fast like this
|
|
|
|
veluca
|
2022-12-01 11:51:48
|
shouldn't be too hard
|
|
2022-12-01 11:53:46
|
will I do that? who knows
|
|
|
|
afed
|
2022-12-05 05:26:18
|
`Add WebP_Lossy2 to full benchmarks`
<https://github.com/nigeltao/qoir/commit/5671f584dcf84ddb71e28da6fa60225abe915e43>
a very strange lossy modification <:WTF:805391680538148936>
```WebP_Lossy 0.084 RelCmpRatio 0.065 RelEncSpeed 0.453 RelDecSpeed
WebP_Lossy2 0.443 RelCmpRatio 0.015 RelEncSpeed 0.435 RelDecSpeed```
`(5), the Lossy2 suffix, means that the images are encoded losslessly (even though e.g. WebP does have its own lossy format) but after applying QOIR's lossiness=2 quantization, reducing each pixel from 8 to 6 bits per channel.`
|
|
|
Traneptora
|
2022-12-05 05:27:30
|
keep in mind that webp has a lossless-prefilter option
|
|
2022-12-05 05:27:38
|
where images are prefiltered and then compressed losslessly
|
|
|
Jyrki Alakuijala
|
2022-12-06 12:52:34
|
near-lossless webp is nice
|
|
|
daniilmaks
|
|
Traneptora
where images are prefiltered and then compressed losslessly
|
|
2022-12-07 12:06:07
|
so like lossyWAV?
|
|
|
Traneptora
|
2022-12-07 03:40:40
|
idk what that is
|
|
|
|
Quikee
|
2022-12-07 04:55:50
|
it lossily filters a wav audio file so it can be compressed better with a lossless compressor like FLAC
|
|
|
daniilmaks
|
2022-12-07 05:23:22
|
it's for various lossless audio algorithms, including flac, yes.
|
|
|
Demiurge
|
2022-12-09 05:43:39
|
Yeah, that's very cool. And the exact same idea can be applied to images in the same way.
|
|
2022-12-09 05:45:20
|
Removing the least significant bits, dithering the result using a noise-shaping algorithm, doing it based on the psychoperceptual idea of activity masking: all of that applies to images just as much as it applies to sound
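(A toy version of that idea for images, assuming QOIR-style 8-to-6-bit quantization as in the lossiness=2 example above, with simple 1-D error diffusion standing in for a real noise shaper:)
```
#include <cstdint>
#include <vector>

// Drop the 2 least significant bits of each sample (8 -> 6 bits, like
// QOIR's lossiness=2) while diffusing the rounding error into the next
// sample, so the quantization noise is shaped instead of banding.
// A real shaper would weight the error psychovisually (activity
// masking) and diffuse in 2-D; this is just the core mechanism.
std::vector<uint8_t> QuantizeDithered(const std::vector<uint8_t>& in) {
  std::vector<uint8_t> out(in.size());
  int err = 0;  // rounding error carried to the next sample
  for (size_t i = 0; i < in.size(); ++i) {
    int want = int(in[i]) + err;
    want = want < 0 ? 0 : (want > 255 ? 255 : want);
    int q = (want >> 2) << 2;  // keep the top 6 bits
    err = want - q;
    out[i] = static_cast<uint8_t>(q);
  }
  return out;
}
```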
|
|
|
|
afed
|
2022-12-15 03:55:22
|
inf MP/s <:monkaMega:809252622900789269>
`2289 x 1288, geomean: inf MP/s [305.81, 413.46], 200 reps, 0 threads`
|
|
|
_wb_
|
2022-12-15 03:57:41
|
Lol we may need to do that computation with doubles instead of floats, I suppose
|
|
|
|
veluca
|
2022-12-15 04:01:47
|
or taking some logs first
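(The failure mode: the geomean is presumably accumulated as a running product of per-rep MP/s values, and 200 values around 400 MP/s multiply out to roughly 10^520, past float and even double range, hence "inf". Summing logs is the standard fix; a sketch, not the actual cjxl code:)
```
#include <cmath>
#include <vector>

// Geometric mean via logs: the running product of hundreds of MP/s
// values overflows even a double, but the sum of their logs stays
// tiny, and exp() of the average brings the result back safely.
double Geomean(const std::vector<double>& speeds) {
  if (speeds.empty()) return 0.0;
  double log_sum = 0.0;
  for (double s : speeds) log_sum += std::log(s);
  return std::exp(log_sum / static_cast<double>(speeds.size()));
}
```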
|
|
|
|
afed
|
2022-12-15 04:02:27
|
```Single-threaded:
fjxl 0 gcc 12.2.0
69.384 MP/s
9.987 bits/pixel
fjxl 0 clang 15.0.5
141.867 MP/s
9.987 bits/pixel
fjxl 1 gcc 12.2.0
70.452 MP/s
9.381 bits/pixel
fjxl 1 clang 15.0.5
146.944 MP/s
9.381 bits/pixel
fjxl 2 gcc 12.2.0
69.940 MP/s
9.381 bits/pixel
fjxl 2 clang 15.0.5
147.396 MP/s
9.381 bits/pixel
cjxl 1 clang 15.0.5
3840 x 2160, geomean: 161.78 MP/s [96.17, 169.82], 200 reps, 0 threads.```
|
|
|
|
veluca
|
2022-12-15 04:03:20
|
seems about right
|
|
|
|
afed
|
2022-12-15 04:08:28
|
1=1 thread for fjxl?
and min/max MP/s as in cjxl would be useful
|
|
|
|
veluca
|
2022-12-15 04:12:06
|
feel free to send a PR
|
|
2022-12-15 04:12:12
|
should be easy
|
|
|
|
afed
|
2022-12-15 04:14:01
|
though using fast_lossless in cjxl is easier anyway
8 threads
```fjxl 0 gcc 12.2.0
258.911 MP/s
9.987 bits/pixel
fjxl 0 clang 15.0.5
520.788 MP/s
9.987 bits/pixel
fjxl 1 gcc 12.2.0
258.484 MP/s
9.381 bits/pixel
fjxl 1 clang 15.0.5
516.224 MP/s
9.381 bits/pixel
fjxl 2 gcc 12.2.0
250.453 MP/s
9.381 bits/pixel
fjxl 2 clang 15.0.5
507.487 MP/s
9.381 bits/pixel
cjxl 1 clang 15.0.5
3840 x 2160, geomean: inf MP/s [380.74, 687.32], 200 reps, 8 threads.```
|
|
2022-12-15 04:22:20
|
animation also works
|
|
|
|
veluca
|
2022-12-15 04:25:39
|
really?
|
|
2022-12-15 04:25:51
|
does it also decode to multiple frames? xD
|
|
|
|
afed
|
2022-12-15 04:31:55
|
It seems so
|
|
|
|
veluca
|
2022-12-15 04:35:09
|
consider me very surprised
|
|
|
|
afed
|
2022-12-15 04:43:00
|
but animated png and webp are not working (only gif works); it seems to be a cjxl issue
though, png with alpha works at higher efforts
|
|
2022-12-15 04:45:35
|
for pngs (also without alpha) there are no errors and no output
|
|
|
_wb_
|
2022-12-15 04:49:15
|
took me a while to understand that the above is an animated png
|
|
2022-12-15 04:51:50
|
"Graceful degradation" is kind of annoying
|
|
|
|
afed
|
2022-12-15 04:55:40
|
yeah, discord doesn't support webp and png animation
but I did not find a gif that did not work; maybe it's something related to the number of colors?
|
|
|
_wb_
|
2022-12-15 04:59:01
|
hm, I get a segfault
|
|
2022-12-15 04:59:38
|
Thread 1 "cjxl" received signal SIGSEGV, Segmentation fault.
JxlEncoderStruct::RefillOutputByteQueue (this=this@entry=0x5555555e4ca0) at /home/jon/dev/libjxl/lib/jxl/encode.cc:492
492 duration = input_frame->option_values.header.duration;
|
|
|
|
veluca
|
2022-12-15 09:18:14
|
I am absolutely not surprised
|
|
|
sklwmp
|
2022-12-17 05:04:20
|
The AVIF team's new data is weird...
|
|
|
BlueSwordM
|
|
sklwmp
The AVIF team's new data is weird...
|
|
2022-12-17 05:22:19
|
Not weird.
Straight up wrong <:YEP:808828808127971399>
|
|
2022-12-17 05:32:14
|
<@557099078337560596> <@179701849576833024> They're playing dirty:
`MediaTek MT8173C `
They know what they're doing, especially when using an old version of libjxl inside of Chrome <:YEP:808828808127971399>
|
|
|
_wb_
|
2022-12-17 08:09:50
|
It's also kind of unfair to only look at the fastest possible avif files (8-bit 4:2:0 with an rgb colorspace that has a specialized path) while ignoring the more general case (say 10-bit 4:4:4 with a non-specialized colorspace).
|
|
|
|
veluca
|
2022-12-17 08:32:46
|
I assume that's a CPU with not too good float decoding?
|
|
|
_wb_
|
2022-12-17 08:47:02
|
The data below was run on an ASUS Chromebook C202XA with a Mediatek MT8173c, running Chrome Version 110.0.5447.0 (Official Build) dev (32-bit).
|
|
2022-12-17 08:48:23
|
Why does it say 32-bit if that cpu is supposed to be arm64?
|
|
2022-12-17 08:49:10
|
I suppose that means they're using an armv7 build?
|
|
|
|
veluca
|
2022-12-17 08:50:18
|
Absolutely possible
|
|
2022-12-17 08:50:30
|
Most android phones actually stick to armv7
|
|
2022-12-17 08:50:52
|
In a couple of years it may change
|
|
|
sklwmp
|
2022-12-17 09:34:47
|
Yeah, especially with Google going 64-bit only for the Pixel 7.
|
|
|
|
veluca
|
2022-12-17 10:19:30
|
and I think we haven't benchmarked armv7 decoding performance in a *long* time (if ever)
|
|
2022-12-17 10:19:53
|
I can 100% believe it being a lot slower than aarch64 on the same CPU
|
|
|
_wb_
|
2022-12-17 10:34:08
|
Why would you use a 32-bit build of Chrome on a 64-bit Chromebook? That just seems like a bad idea. Am I missing something?
|
|
2022-12-17 10:55:56
|
I really wonder what is going on with those decode speeds at https://sneyers.info/browserspeedtest/index2.html
|
|
2022-12-17 10:56:24
|
can someone report the numbers they're getting? on my phone, 4:4:4 avif makes it crash chrome
|
|
2022-12-17 10:56:36
|
on my laptop, I get these numbers:
011.jxl: Decode speed: 41.72 MP/s | Fetch: 223.90ms | 100 decodes: 785.40ms
011-fd1.jxl: Decode speed: 45.05 MP/s | Fetch: 180.90ms | 100 decodes: 727.40ms
011-fd2.jxl: Decode speed: 47.10 MP/s | Fetch: 190.90ms | 100 decodes: 695.70ms
011-fd3.jxl: Decode speed: 48.77 MP/s | Fetch: 184.90ms | 100 decodes: 671.90ms
011-8bit420.avif: Decode speed: 86.30 MP/s | Fetch: 200.00ms | 100 decodes: 379.70ms
011-8bit444.avif: Decode speed: 2.91 MP/s | Fetch: 209.00ms | 100 decodes: 11261.10ms
011-10bit.avif: Decode speed: 2.86 MP/s | Fetch: 208.10ms | 100 decodes: 11455.70ms
011-12bit.avif: Decode speed: 2.85 MP/s | Fetch: 215.00ms | 100 decodes: 11507.10ms
011.webp: Decode speed: 122.09 MP/s | Fetch: 186.30ms | 100 decodes: 268.40ms
011.jpg: Decode speed: 112.07 MP/s | Fetch: 196.50ms | 100 decodes: 292.40ms
|
|
2022-12-17 11:01:09
|
The images are just sRGB; in the previous test, the images happened to have a Rec709 ICC color profile, so much of the decode time may have been spent in skcms or something, I dunno. I figured using just sRGB would be fairer, but now 8-bit 4:2:0 avif looks much better and any other avif looks MUCH worse.
|
|
2022-12-17 11:02:11
|
Can someone else reproduce that huge gap between 4:2:0 avif and 4:4:4 avif?
|
|
|
Eugene Vert
|
|
_wb_
I really wonder what is going on with those decode speeds at https://sneyers.info/browserspeedtest/index2.html
|
|
2022-12-17 11:02:57
|
I'm getting `Uncaught (in promise) DOMException: An attempt was made to use an object that is not, or is no longer, usable` on jxl buttons in firefox-dev
|
|
|
_wb_
|
2022-12-17 11:03:08
|
this looks like a performance bug in the chrome integration of avif, tbh
|
|
|
Eugene Vert
I'm getting `Uncaught (in promise) DOMException: An attempt was made to use an object that is not, or is no longer, usable` on jxl buttons in firefox-dev
|
|
2022-12-17 11:03:46
|
are you using a browser that can decode jxl? the current chrome canary no longer can, unfortunately
|
|
|
Eugene Vert
|
2022-12-17 11:04:38
|
Ah, that's the non-wasm version
|
|
|
|
paperboyo
|
2022-12-17 11:04:38
|
Pixel 6 Pro, Chrome Beta 109.0.5414.44:
```
011.jpg: Decode speed: 61.65 MP/s | Fetch: 1048.90ms | 100 decodes: 531.50ms
011.jpg.jxl: Decode speed: 27.12 MP/s | Fetch: 1238.30ms | 100 decodes: 1208.40ms
011.jxl: Decode speed: 15.12 MP/s | Fetch: 19.10ms | 100 decodes: 2166.70ms
011-fd1.jxl: Decode speed: 15.43 MP/s | Fetch: 1224.50ms | 100 decodes: 2124.00ms
011-fd2.jxl: Decode speed: 17.45 MP/s | Fetch: 1898.20ms | 100 decodes: 1878.20ms
011-fd3.jxl: Decode speed: 19.10 MP/s | Fetch: 2004.30ms | 100 decodes: 1715.70ms
011-8bit420.avif: Decode speed: 38.04 MP/s | Fetch: 15.20ms | 100 decodes: 861.40ms
011-8bit444.avif: Decode speed: 1.83 MP/s | Fetch: 2404.60ms | 100 decodes: 17917.80ms
011-10bit.avif: Decode speed: 1.80 MP/s | Fetch: 2536.10ms | 100 decodes: 18231.50ms
011-12bit.avif: Decode speed: 1.79 MP/s | Fetch: 2573.20ms | 100 decodes: 18338.50ms
011.webp: Decode speed: 56.22 MP/s | Fetch: 2970.80ms | 100 decodes: 582.90ms
```
|
|
|
_wb_
|
2022-12-17 11:06:29
|
ouch, with those numbers avif is basically 4:2:0-only in practice, that's a huge performance difference
|
|
2022-12-17 11:07:19
|
i'm getting a webp déjà vu
|
|
|
|
veluca
|
|
_wb_
Why would you use a 32-bit build of Chrome on a 64-bit Chromebook? That just seems like a bad idea. Am I missing something?
|
|
2022-12-17 11:07:41
|
memory usage I guess? IDK
|
|
|
paperboyo
Pixel 6 Pro, Chrome Beta 109.0.5414.44:
```
011.jpg: Decode speed: 61.65 MP/s | Fetch: 1048.90ms | 100 decodes: 531.50ms
011.jpg.jxl: Decode speed: 27.12 MP/s | Fetch: 1238.30ms | 100 decodes: 1208.40ms
011.jxl: Decode speed: 15.12 MP/s | Fetch: 19.10ms | 100 decodes: 2166.70ms
011-fd1.jxl: Decode speed: 15.43 MP/s | Fetch: 1224.50ms | 100 decodes: 2124.00ms
011-fd2.jxl: Decode speed: 17.45 MP/s | Fetch: 1898.20ms | 100 decodes: 1878.20ms
011-fd3.jxl: Decode speed: 19.10 MP/s | Fetch: 2004.30ms | 100 decodes: 1715.70ms
011-8bit420.avif: Decode speed: 38.04 MP/s | Fetch: 15.20ms | 100 decodes: 861.40ms
011-8bit444.avif: Decode speed: 1.83 MP/s | Fetch: 2404.60ms | 100 decodes: 17917.80ms
011-10bit.avif: Decode speed: 1.80 MP/s | Fetch: 2536.10ms | 100 decodes: 18231.50ms
011-12bit.avif: Decode speed: 1.79 MP/s | Fetch: 2573.20ms | 100 decodes: 18338.50ms
011.webp: Decode speed: 56.22 MP/s | Fetch: 2970.80ms | 100 decodes: 582.90ms
```
|
|
2022-12-17 11:07:55
|
that seems like a bug
|
|
|
_wb_
|
2022-12-17 11:07:55
|
this must be a bug though, no way there is an inherent decode speed difference that large
|
|
|
|
veluca
|
2022-12-17 11:08:18
|
that, or somebody didn't bother SIMDfying the 444 path yet; that's the only other explanation I can think of
|
|
|
_wb_
|
2022-12-17 11:08:49
|
well it's what happens in current chrome stable and in chrome canary on both x64 and arm64, it seems
|
|
|
|
veluca
|
2022-12-17 11:08:53
|
but on the Samsung A51 I didn't see such a big difference
|
|
|
_wb_
|
2022-12-17 11:09:09
|
I also didn't see that big of a difference on a different image
|
|
2022-12-17 11:09:21
|
maybe it's because this one doesn't have an icc profile?
|
|
|
|
veluca
|
2022-12-17 11:09:54
|
mhhh it also seems to happen on my laptop
|
|
2022-12-17 11:09:58
|
this is *odd*
|
|
2022-12-17 11:10:53
|
I feel like I should be taking a profile to figure out what the heck is happening
|
|
|
_wb_
|
2022-12-17 11:11:50
|
just using avifdec I don't see much of a difference in decode speed (444 just slightly slower, as expected), so this must be something specific to the chrome integration
|
|
2022-12-17 11:14:26
|
having different code paths for different bit depths / chroma subsampling modes in the chrome integration is error-prone, and this is another illustration of "we get it for free since we have av1 anyway" not being fully true...
|
|
|
Sauerstoffdioxid
|
2022-12-17 11:15:51
|
On a different note, I tried getting some benchmark numbers by throwing 500 `img` elements onto a page (resized to 1x1 so they actually all display and decode at the same time). Current Chromium stable:
s.jpg: Fetching: 3.70 | Generating: 274.70 | Loading: 479.30
s.webp: Fetching: 4.90 | Generating: 285.40 | Loading: 885.20
s.jxl: Fetching: 4.90 | Generating: 270.40 | Loading: 3340.90
s8.avif: Fetching: 4.50 | Generating: 265.20 | Loading: 2261.60
s10.avif: Fetching: 5.10 | Generating: 316.90 | Loading: 3500.40
s12.avif: Fetching: 3.70 | Generating: 277.70 | Loading: 3349.70
so basically, as long as you don't force single threading or the ImageDecoder API, JXL performs just as well as AVIF.
|
|
|
_wb_
|
2022-12-17 11:19:15
|
does it decode the image 500 times if it's always the same image? that looks like a missed opportunity to optimize, since pages where the same image is being used multiple times are probably not that rare...
|
|
|
Sauerstoffdioxid
|
2022-12-17 11:23:37
|
Yeah, normally it caches, but I worked around that for this test
|
|
|
HLBG007
|
2022-12-17 11:32:43
|
Hello, one graphic in your article is wrong. Here's my version
|
|
|
|
jjido
|
2022-12-17 02:06:06
|
Trolling?
|
|
|
_wb_
|
2022-12-17 02:48:38
|
I wish avifdec had an option like djxl's --num_reps. Then we could do some more accurate timing of the actual decode time, and not just how optimized the current chrome integration is
|
|
|
Demiurge
|
2022-12-18 12:18:46
|
Hmm, for some reason putting a bunch of JPEGs in a .7z archive crushes them further than JXL recompression?
|
|
2022-12-18 12:19:09
|
As long as there's not just 1
|
|
2022-12-18 12:19:38
|
I didn't realize LZMA can crush JPEGs so effectively
|
|
2022-12-18 12:20:06
|
That's kinda funny that such a generic method is more effective than a specialized method
|
|
|
|
ayumi
|
2022-12-18 12:22:01
|
If you are using "solid" compression it will compress multiple files as one "block", which could explain why you need more than one file to get this effect.
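(That is 7-Zip's solid mode, `-ms=on`: with many JPEGs in one solid block, LZMA can presumably match redundancy across files, e.g. repeated Huffman/quantization tables or metadata blobs, which per-file JPEG recompression never sees; with a single JPEG there is no cross-file redundancy to find.)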
|
|
|
Demiurge
|
2022-12-18 12:22:06
|
I'm curious if anyone else has had a similar experience
|
|