JPEG XL

benchmarks

fab
2022-10-12 09:58:35
And Jon already computed a bpp in a first iteration
2022-10-12 10:00:52
You need to add things
Jyrki Alakuijala
2022-10-12 10:01:16
can you show an example of the improvement
fab
2022-10-12 10:03:12
PC is open
Jyrki Alakuijala
2022-10-12 10:07:18
Kornel and Jon: what is your thinking about why to divide by pixel sample means (+C1) in SSIM?
2022-10-12 10:08:36
in https://en.wikipedia.org/wiki/Structural_similarity
2022-10-12 10:09:13
why should variance differences be related to intensity differences if we are already in a gamma-compressed or otherwise perceptually linear space
2022-10-12 10:10:39
I cannot think of a neurophysiological process that would do anything like that in the retina, I consider it one big mistake in SSIM
fab
Jyrki Alakuijala can you show an example of the improvement
2022-10-12 10:11:11
Ok
Jyrki Alakuijala
2022-10-12 10:11:45
it can be 'fixed' by having arbitrarily large C1 to remove the effect of that division, but it would be more honest to just not call it SSIM at that stage
fab
2022-10-12 10:11:54
[images]
2022-10-12 10:12:17
New has lower bpp
2022-10-12 10:12:20
Bitrate
2022-10-12 10:12:33
But is way way better
2022-10-12 10:12:43
On par with October 2021 (obviously new heuristics at high distance at that time)
2022-10-12 10:12:56
This is the best improvement
2022-10-12 10:13:07
It's 0710jxl vs 1010jxl
Jyrki Alakuijala
2022-10-12 10:14:21
did we get worse since oct 2021?
2022-10-12 10:14:31
I hope we removed new heuristics
fab
Jyrki Alakuijala did we get worse since oct 2021?
2022-10-12 10:14:39
A bit on this particular image
2022-10-12 10:14:54
This 1010 improves the artifacts
2022-10-12 10:15:06
As you can see there are way fewer
2022-10-12 10:15:15
On new.jxl.png
Jyrki Alakuijala
2022-10-12 10:15:17
that image looks like it is not high quality to start with -- looks like it was video compressed -- I don't think we should use this kind of test material to guide decisions
fab
2022-10-12 10:15:26
I know
Jyrki Alakuijala
2022-10-12 10:15:32
let's use photographs, not video
fab
2022-10-12 10:15:43
But it looks boring
2022-10-12 10:15:56
Like new 1010 has fewer artifacts, less bpp
2022-10-12 10:16:05
But it's too boring to see
Jyrki Alakuijala
2022-10-12 10:16:43
can you give a zoom where it is boring?
fab
2022-10-12 10:16:52
The whole image
Jyrki Alakuijala
2022-10-12 10:16:55
I don't know how to quantify boring
fab
2022-10-12 10:17:01
It's not entertaining
2022-10-12 10:17:12
Good greens, as I said
2022-10-12 10:17:16
Good visually
Jyrki Alakuijala
2022-10-12 10:17:17
I agree that the image looks boring, but it is because of its blurriness in the 'original'
fab
2022-10-12 10:17:22
But bad mentally
2022-10-12 10:17:45
Hope you don't aim for strong compression
Jyrki Alakuijala
2022-10-12 10:18:04
what if you add photon noise, will it be better mentally?
fab
2022-10-12 10:18:47
I don't know
2022-10-12 10:19:01
I don't want to use noise
2022-10-12 10:20:32
Jxl is becoming the rav1e of webp
2022-10-12 10:20:41
Improving in quality
2022-10-12 10:20:53
But the red channel gets less appealing
2022-10-12 10:24:04
Maybe you should add something
2022-10-12 10:26:19
What jon is doing today is great
2022-10-12 10:40:30
I agree that with JPEG XL there is the possibility to deband, but it has to be avoided
Jyrki Alakuijala
fab I don't want to use noise
2022-10-12 10:54:47
why not
fab
2022-10-12 10:55:10
Nothing
_wb_
Jyrki Alakuijala Kornel and Jon: what is your thinking about why to divide by pixel sample means (+C1) in SSIM?
2022-10-12 11:25:40
it's `2 * avgref * avgdist / (avgref^2 + avgdist^2)`, which you can rewrite to this: `1 - ((avgref - avgdist)^2 / (avgref^2 + avgdist^2))` or in other words the error is basically the squared difference divided by something close to twice the squared intensity. That is indeed strange, since if the intensities are already psychovisually linear, there is no reason to weigh the darks more than the brights.
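A quick numeric check of that rewrite (a sketch, taking C1 = 0; the identity is exact algebra, since (a^2 + b^2) - (a - b)^2 = 2ab):
```python
import random

# check: 2ab/(a^2+b^2) == 1 - (a-b)^2/(a^2+b^2), the rewrite above with C1 = 0
for _ in range(1000):
    a = random.uniform(0.01, 1.0)  # window mean of the reference
    b = random.uniform(0.01, 1.0)  # window mean of the distorted image
    lhs = 2 * a * b / (a * a + b * b)
    rhs = 1 - (a - b) ** 2 / (a * a + b * b)
    assert abs(lhs - rhs) < 1e-12
```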
2022-10-12 11:26:17
Well I guess I can just change that formula to drop the division and see if it works better after tuning
2022-10-12 11:33:35
I will have an answer tomorrow or so 🙂
fab
2022-10-12 11:43:55
[attachment]
2022-10-12 11:44:05
Wb, look at this txt
_wb_
2022-10-12 11:44:52
Checking what the original SSIM paper has to say about this division by intensity, it looks like they indeed made a braino there...
spider-mario
2022-10-12 11:47:22
so the whole SSIM edifice is built on lies?
2022-10-12 11:47:38
(I'm being dramatic for effect)
_wb_
2022-10-12 11:48:06
Weber's law just means that you need a nonlinear transfer function to be perceptually uniform, but they are basically applying gamma correction twice: once because you use SSIM typically on nonlinear sample values, and then again by doing that division. Basically it means they use an effective transfer curve that is a lot steeper than the ~gamma 2.4 of sRGB, something more like gamma 4.
2022-10-12 11:48:43
Being steeper than sRGB is probably a good thing (XYB also does that), which is probably why this went unnoticed
2022-10-12 12:17:33
it would be nice if that part of the formula could just be dropped while getting as good or better results; it would make it slightly cheaper to compute
2022-10-12 12:20:18
conceptually, that division is dubious for intensities (it looks like they made a mistake and are effectively doing gamma correction twice), but it's even more dubious when applying SSIM to chroma channels, like in SSIMULACRA (both 1 and 2) and I guess also in DSSIM, right <@826537092669767691> ?
Kornel
2022-10-12 12:21:36
Yes, it's dubious
_wb_
2022-10-12 12:22:10
since it basically means that it is more sensitive to errors in the greens and blues (low values of a and b) and less sensitive to errors in the reds and yellows (high values of a and b), and in case of ssimulacra2 the same but for whatever low/high X and B means
Kornel
2022-10-12 12:22:27
I've noticed that error before. I thought I've fixed it in DSSIM (I even put that in the readme)
2022-10-12 12:22:42
but recently I've reviewed the code and... I'm not sure if it's right or not 🙂
_wb_
2022-10-12 12:27:25
you're scaling down in linear space which is of course correct but I wonder how big a difference it makes compared to scaling down in XYB. Of course the downscaled images will get too dark since Y uses gamma 3, but that happens to both original and distorted and I would assume it can be compensated in the weighing of the different scales...
2022-10-12 12:29:15
in other words I'm wondering if I should downscale in linear RGB too and convert each scale to XYB instead of doing the XYB conversion only once (which is a bit faster but I don't care that much about speed if it makes a significant difference)
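A minimal sketch of the multi-scale pipeline being considered here, assuming a simple 2x box downscale and a `linear_to_xyb` stand-in for the actual XYB conversion (both are simplifications, not libjxl code):
```python
import numpy as np

def srgb_to_linear(s):
    # sRGB EOTF, s in [0, 1]
    return np.where(s <= 0.04045, s / 12.92, ((s + 0.055) / 1.055) ** 2.4)

def box_downscale_2x(img):
    # average 2x2 blocks; assumes even dimensions for brevity
    h, w, c = img.shape
    return img.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def multiscale_xyb(srgb, n_scales, linear_to_xyb):
    # downscale in *linear* RGB; convert each scale to XYB afterwards
    lin = srgb_to_linear(np.asarray(srgb, dtype=np.float64))
    scales = []
    for _ in range(n_scales):
        scales.append(linear_to_xyb(lin))
        lin = box_downscale_2x(lin)
    return scales
```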
Kornel
2022-10-12 12:31:10
I assume if you want to detect this error - encoders using wrong gamma - then you should avoid repeating it 🙂
2022-10-12 12:32:40
I used to downscale in Lab, but that made DSSIM insensitive to chroma subsampling
_wb_
2022-10-12 12:38:31
I need to think a bit more about this.
2022-10-12 12:44:00
There are two main kinds of chroma subsampling artifacts in my experience:
- at the 1:1 scale, chroma obviously gets blurred, causing small details to get lost if they're mostly in the chroma;
- textured reds and blues get duller since e.g. red-black-red-black 1-pixel stripes have Cr values that alternate between 127 and 0 and subsampling will turn it into a uniform 63, making the red darker and desaturated.
2022-10-12 12:46:46
the first one will be most noticeable in the 1:1 chroma channels themselves, not really in the zoomed-out scales. the second one does remain visible in the zoomed-out scales and also causes luma artifacts (at least in the L of Lab or the Y of XYB).
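The second kind of artifact can be reproduced with a few lines of arithmetic; a sketch using signed full-range Cr and rough BT.601-style constants (the exact numbers depend on the YCbCr matrix in use):
```python
import numpy as np

# 1-pixel red/black stripes: signed Cr alternates ~127 (red) and 0 (black)
cr = np.array([127.0, 0.0, 127.0, 0.0, 127.0, 0.0])

# 2x subsampling averages neighbouring pairs
cr_sub = cr.reshape(-1, 2).mean(axis=1)    # -> [63.5, 63.5, 63.5]

# reconstruct the red channel of a "red" pixel: R = Y + 1.402 * Cr
y_red = 76.0                               # approximate luma of pure red
print(y_red + 1.402 * 127.0)               # ~254: saturated red
print(y_red + 1.402 * cr_sub[0])           # ~165: darker, desaturated red
```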
2022-10-12 12:50:32
<@826537092669767691> "a/b channels of Lab are compared with lower spatial precision to simulate eyes' higher sensitivity to brightness than color changes." -> wouldn't that be an obstacle to detecting chroma subsampling issues? at least the ones where L is not affected by it (that might be relatively rare but still)
Kornel
2022-10-12 12:51:34
it's not lower resolution, but higher resolution blurred.
2022-10-12 12:51:45
but yeah, it's a clumsy trade-off
2022-10-12 12:52:02
I don't have a proper model for interaction between luma and chroma
_wb_
2022-10-12 12:56:27
what I like about XYB is that it's basically just LMS with gamma correction, i.e. it directly models the cones
2022-10-12 01:01:24
the "we see luma in higher res than chroma" thing is partly a lie: it's more like the S-cones are sparser and mostly outside the fovea (so blue-yellow chroma is very low res), while the L and M cones are dense in the fovea, and you can basically see luma using either L or M cones while for the red-green chroma your brain relies on the difference between L and M so that makes it a bit lower resolution than luma (since basically every L or M cone is one "luma pixel" while you need at least one of each to form a "red-green chroma pixel"), but still higher resolution than S
Traneptora
2022-10-12 01:12:04
ah, so this is the biological explanation for chroma subsampling being more noticeable in the Cr channel than in the Cb channel?
2022-10-12 01:13:20
with regard to 4:2:0 YCbCr
_wb_
2022-10-12 01:20:58
yes, though part of that is also because red/L has more impact on overall luma than blue/S
Jyrki Alakuijala
2022-10-12 01:35:49
I deviated from classic CIE LMS ideas by fitting the LMS responses that just 'worked the best' for the purpose of image compression
2022-10-12 01:36:17
those were a bit bizarre, because a grayscale image would be described by using all three channels
2022-10-12 01:36:48
Lode and Luca modified that system in a way that grayscale images are described by one channel
_wb_ Well I guess I can just change that formula to drop the division and see if it works better after tuning
2022-10-12 01:38:46
I'm very curious about this 🙂 perhaps just optimizing C1 and C2 (as well as linear? repositioning of the resulting values) as part of the optimization could provide an answer
_wb_
2022-10-12 01:41:42
one idiosyncrasy that persisted in XYB is that each channel has its own scale and as a 3D space it is far from perceptually uniform (a change of 0.1 in X is a much bigger change than 0.1 in Y or B). That's no issue for image compression where you use different quantizers per channel anyway, and also not for a metric like ssimulacra2 because it has different weights per channel anyway, but still...
Jyrki Alakuijala I'm very curious about this 🙂 perhaps just optimizing C1 and C2 (as well as linear? repositioning of the resulting values) as part of the optimization could provide an answer
2022-10-12 01:44:32
well I can drop the C1 completely when removing that division. It's too early to tell if this is a good idea or not (I need a day or so of tuning before I can compare), but so far it looks promising (early training iterations are giving slightly better results than the same amount of iterations of tuning with the original formula)
Jyrki Alakuijala
2022-10-12 07:28:31
experimentation, curiosity and logic are more powerful than dogma 😄
_wb_
2022-10-12 07:33:18
currently not beating the original version yet, but my tuning takes some time to converge. Tomorrow I'll know.
2022-10-12 07:33:50
after 26 iterations it's currently here: ``` train: 17611 MSE: 50.03070287514473 | vali: 4292 MAE: 5.128380825064601 K: 0.709664 P: 0.881838 | all: 21903 MAE: 5.460425338341202 Kendall: 0.697490 Pearson: 0.865998 ```
2022-10-12 07:34:32
while the one with the original ssim formula was here after 26 iterations: ``` train: 17611 MSE: 53.666351158694 | vali: 4292 MAE: 5.273077294606953 K: 0.701941 P: 0.877796 | all: 21903 MAE: 5.6512782840159845 Kendall: 0.684267 Pearson: 0.856832 ```
2022-10-12 07:35:22
so it's looking promising...
2022-10-12 07:36:01
if confirmed, this probably deserves a small paper to rectify that historical mistake that went into SSIM
2022-10-12 07:42:00
basically the formula derivation is assuming linear intensities but everyone always uses SSIM on already gamma compressed data (including the authors, who are talking about 8-bit images a few sentences earlier which to me kind of implies it's not linear)
BlueSwordM
BlueSwordM Man, compiling compilers is quite the task for a CPU 🙂
2022-10-12 08:30:09
Looks like I wasn't able to build Clang 14 lmao.
2022-10-12 08:30:22
It looks like I'll have to build it with Clang 14 another way 🤔
_wb_
2022-10-13 06:42:40
After 122 iterations of tuning, it's here: ``` train: 17611 MSE: 45.73781138763714 | vali: 4292 MAE: 4.8122107903523 K: 0.716675 P: 0.886651 | all: 21903 MAE: 5.170079162450586 Kendall: 0.708310 Pearson: 0.873783 ``` while the original formula was here after 122 iterations: ``` train: 17611 MSE: 46.868001558068066 | vali: 4292 MAE: 4.93915328458751 K: 0.716456 P: 0.887142 | all: 21903 MAE: 5.294815439616939 Kendall: 0.704895 Pearson: 0.872338 ```
2022-10-13 06:53:03
so still looking promising, and I can tell already that if this division by average intensity in the formula is useful at all (which I suspect it isn't), the benefit it brings must be very small
2022-10-13 07:16:20
https://twitter.com/jonsneyers/status/1580457251248431104?s=20
fab
2022-10-13 08:26:58
Continue more iterations
_wb_
2022-10-13 08:30:06
sure, it's still improving so I'm not stopping yet. ``` train: 17611 MSE: 45.65768488257069 | vali: 4292 MAE: 4.798193148565712 K: 0.716859 P: 0.886754 | all: 21903 MAE: 5.1603016077332695 Kendall: 0.708444 Pearson: 0.873804 ```
2022-10-13 08:32:12
as long as the MAE on the validation set keeps going down (or its Kendall/Pearson keep going up), there is no reason to stop.
fab
2022-10-13 08:37:23
Less than 200 though are ok.
2022-10-13 08:38:04
I think 300 are necessary
2022-10-13 08:38:18
But you have to be careful.
_wb_
2022-10-13 08:39:54
I am not too worried about overfitting now that I forced the weights of the subscores to be positive.
2022-10-13 08:41:46
PSNR-HVS says this: https://jon-cld.s3.amazonaws.com/test/distorted/016/mozjpeg-1x1-revert-q16.jpg is better than this: https://jon-cld.s3.amazonaws.com/test/distorted/016/jxl-225b6884-pdc1-e6-q50.png
2022-10-13 08:53:20
VMAF says this: https://jon-cld.s3.amazonaws.com/test/distorted/792079/mozjpeg-2x2-revert-q18.jpg is better than this: https://jon-cld.s3.amazonaws.com/test/distorted/792079/jxl-e6-q48.png
2022-10-13 08:54:16
and this: https://jon-cld.s3.amazonaws.com/test/distorted/1910225/mozjpeg-2x2-revert-q16.jpg is better than this: https://jon-cld.s3.amazonaws.com/test/distorted/1910225/jxl-225b6884-pdc1-e6-q38.png
fab
2022-10-13 08:54:49
Which encoder is it dev?
_wb_
2022-10-13 08:54:51
and this: https://jon-cld.s3.amazonaws.com/test/distorted/1943411/mozjpeg-2x2-revert-q22.jpg is better than this: https://jon-cld.s3.amazonaws.com/test/distorted/1943411/jxl-225b6884-pdc1-e6-q70.png
fab
2022-10-13 08:54:53
DEV
2022-10-13 08:54:59
?
_wb_
2022-10-13 08:55:31
and this: https://jon-cld.s3.amazonaws.com/test/distorted/1938351/mozjpeg-2x2-revert-q22.jpg is better than this: https://jon-cld.s3.amazonaws.com/test/distorted/1938351/jxl-225b6884-pdc1-e6-q56.png
2022-10-13 08:55:37
and so on, and so on
fab
2022-10-13 08:56:28
So the encoder is DEV?
_wb_
2022-10-13 08:56:42
?
2022-10-13 08:57:16
VMAF is systematically saying that bad blocky jpegs are better than smooth jxl images
fab
2022-10-13 08:58:06
Libjxl is dev version?
2022-10-13 08:58:20
Nightly version?
2022-10-13 08:58:43
225b6884
_wb_
2022-10-13 08:59:16
that's a recent git version, yes
fab
2022-10-13 08:59:33
Ah so not dev
_wb_
2022-10-13 09:00:00
VMAF says https://jon-cld.s3.amazonaws.com/test/distorted/1963557/mozjpeg-2x2-revert-q20.jpg is better than https://jon-cld.s3.amazonaws.com/test/distorted/1963557/jxl-e6-q54.png
fab
2022-10-13 09:00:11
After 4 days no dev version
_wb_
2022-10-13 09:00:23
this happens both for the 0.7 release of jxl (like in this example) and in the current git version of jxl
2022-10-13 09:01:33
curiously VMAF likes bad jpegs a lot, to the point that it is saying that mozjpeg, avif and jxl are all worse than bad jpegs
improver
2022-10-13 09:26:29
"yes i like them baked" t. VMAF
spider-mario
2022-10-13 09:28:45
"all this blocking really enhances the contrast of the image"
_wb_
2022-10-13 10:20:21
``` train: 17611 MSE: 45.59336752597455 | vali: 4292 MAE: 4.772055238533731 K: 0.718784 P: 0.888073 | all: 21903 MAE: 5.1490078350745 Kendall: 0.709031 Pearson: 0.874058 ``` already slightly better Kendall correlation (on the full set) than what I could obtain with the original formula, so I think I can conclude that this division is not useful
2022-10-13 10:21:12
now let's see if it will converge to something better than with the original formula
WoofinaS
2022-10-13 02:03:31
Could you repost this in the av1 server or do you not mind me doing it?
_wb_
2022-10-13 02:03:43
i don't mind
Traneptora
_wb_ https://twitter.com/jonsneyers/status/1580457251248431104?s=20
2022-10-13 02:52:56
> Let me try to explain this one. Jyrki questioned this divisor in the SSIM equation, which effectively boils down to penalizing errors in dark regions more than in bright regions. isn't that because errors in dark regions are more visible to humans?
_wb_
2022-10-13 02:53:28
they are, but you shouldn't correct for it twice
Traneptora
2022-10-13 02:53:38
ah, I see
_wb_
2022-10-13 08:57:21
currently here: ``` train: 17611 MSE: 45.4634956462169 | vali: 4292 MAE: 4.734498022711265 K: 0.718613 P: 0.888034 | all: 21903 MAE: 5.124381241221485 Kendall: 0.708990 Pearson: 0.873963 ```
2022-10-13 08:58:06
slightly better MAE for both validation and training set than with the original formula. No big difference though.
2022-10-13 09:16:39
I'll let it tune another night
2022-10-14 07:36:12
this is where it went over the night: ``` train: 17611 MSE: 45.430967373087206 | vali: 4292 MAE: 4.717704574846586 K: 0.718613 P: 0.887978 | all: 21903 MAE: 5.115279915539398 Kendall: 0.708991 Pearson: 0.873891 ```
2022-10-14 07:36:56
stopped tuning now, MAE is still slowly decreasing but Kendall/Pearson are starting to get worse
2022-10-14 07:39:52
I'm now going to try doing the downscale in linear RGB instead of in XYB, as <@826537092669767691> is doing in DSSIM. It makes sense, and it looks like it is needed to get an example like https://twitter.com/jonsneyers/status/1580462629851889666 right: after retuning with the corrected SSIM formula, I am still getting a better score for the black-on-white text than for the white-on-black text, and it seems to be caused by the downscaled scores being worse due to incorrect downscaling
2022-10-14 07:41:27
this makes it a bit slower since the XYB conversion now needs to be done on ~twice the pixels, but whatever, it's still pretty fast and that's the perceptually correct thing to do
2022-10-14 07:51:22
i'll initialize the tuning with the weights I just got (for a version that downscales in XYB), which should help to make it re-tune faster since that'll be a better guess than just starting with all weights equal to 1
2022-10-14 08:53:57
same reason as why PSNR is included, I guess
2022-10-14 08:54:11
it's not an endorsement
2022-10-14 08:56:27
also: even bad metrics can be useful sometimes to detect encoder bugs. PSNR is not useful as a perceptual metric, but e.g. if there's a nonmonotonicity according to PSNR (higher quality setting resulting in lower score), it could be a sign of a bug
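A sketch of that kind of monotonicity check, with hypothetical `encode` and `psnr` helpers standing in for a real encoder wrapper and metric:
```python
def check_monotonic(encode, psnr, ref, qualities):
    # Sanity check: a higher quality setting should not produce a lower
    # PSNR; a violation can point at an encoder bug.
    prev_q, prev_score = None, None
    for q in sorted(qualities):
        score = psnr(ref, encode(ref, quality=q))
        if prev_score is not None and score < prev_score:
            print(f"non-monotonic: q={prev_q} -> {prev_score:.2f} dB, "
                  f"q={q} -> {score:.2f} dB")
        prev_q, prev_score = q, score
```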
2022-10-14 09:14:18
<@826537092669767691> here are some DSSIM plots
2022-10-14 09:14:39
[plot]
2022-10-14 09:15:08
Orange lines are libaom 4:4:4 tune=ssim at speeds 3 to 9
2022-10-14 09:15:34
white line is mozjpeg 4:2:0, baseline is mozjpeg -revert 4:2:0
2022-10-14 09:15:53
blueish line is current git cjxl -e 6
2022-10-14 09:16:43
the range is from mozjpeg q98 on the left to q30 on the right
2022-10-14 09:21:28
vertical axis is bytes saved when comparing avg bpp of an encode setting to the avg bpp of an 'equivalent' q setting of mozjpeg -revert, where 'equivalent' means "same average dssim score" in the first plot and "same p10 dssim score" in the second
2022-10-14 09:22:44
oops I made a braino, p10 is not a good idea if the metric is lower-is-better, I wanted to align by the p10 at the worst end, not at the best end
2022-10-14 09:25:03
fixing that, this is what the second plot should be
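A rough sketch of how such a "bytes saved at matched quality" number can be computed, as I read the description above (assumes lower-is-better scores and simple linear interpolation of the baseline curve):
```python
import numpy as np

def bytes_saved_fraction(bpp_enc, score_enc, bpp_base, score_base):
    # For each (bpp, score) point of an encoder, interpolate the bpp at
    # which the baseline reaches the same score, then express the saving
    # as a fraction of that baseline bpp. Assumes lower-is-better scores.
    score_base = np.asarray(score_base, dtype=float)
    bpp_base = np.asarray(bpp_base, dtype=float)
    order = np.argsort(score_base)          # np.interp needs increasing x
    matched_bpp = np.interp(score_enc, score_base[order], bpp_base[order])
    return (matched_bpp - np.asarray(bpp_enc)) / matched_bpp
```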
2022-10-14 09:34:54
so basically according to dssim, at speed 9 (which is about as fast as jxl e6), avif is worse than mozjpeg; at speed 8 it is slightly better on average but slightly worse in the low quality end at percentile 10, at speed 7 it starts to be clearly better, and you need to go to speed 3-4 to get close to jxl (though at q>80 there remains a gap). Those are 20-30 times slower than jxl e6 though.
2022-10-14 09:59:37
Just to illustrate how bad vmaf is as a metric, this is the plot for vmaf(-neg). It says that below q66 (vmaf 90) or so, mozjpeg, avif and jxl are at most 20% better than unoptimized jpeg, and at higher quality they're even worse by 30% or more
2022-10-14 10:04:23
This is what Butteraugli 3-norm says:
2022-10-14 10:06:38
PSNR-Y says this:
2022-10-14 10:07:39
MS-SSIM says this:
2022-10-14 10:22:09
<@321486891079696385> can you give me a good avifenc command line that doesn't require custom compilation (works out of the box on a default avifenc `Version: 0.10.1 (aom [enc/dec]:3.4.0)`) ? What is currently in the plots above is `-c aom -y 444 --min 0 --max 63 -s $speed -j 1 -a end-usage=q -a tune=ssim -a cq-level=$q` but I'm open to trying alternatives. Testing the slower speeds takes ages though, and they're not really deployable anyway, so preferably it should be something that works well at speed 6-7 or so.
2022-10-14 10:26:15
<@826537092669767691> i'll also benchmark `cavif-rs 1.3.5`, fyi. It's kind of refreshing that it doesn't come with numerous options besides quality and speed, I like that 🙂
WoofinaS
2022-10-14 11:11:54
`-a deltaq-mode=3` is a tune dedicated to avif encoding. `-a sharpness=(2/3)` can be used to limit filtering and almost always lowers max distortion according to butter. Almost all other settings hurt fidelity in one way or another/flat out are not functional. Both are also stock.
2022-10-14 11:19:49
Rav1e last time I checked does not have as good all-intra performance as aomenc, however that might be different now.
_wb_
2022-10-14 01:18:29
https://arxiv.org/pdf/2107.04510.pdf
WoofinaS
2022-10-14 01:28:30
Oh btw, has Cloudinary released their results for their encoder and metric analysis publicly?
_wb_
2022-10-14 01:36:33
not yet but we're planning to write papers and blogposts
2022-10-14 01:37:35
in the meantime, I do tend to share stuff on twitter and here, since papers and blogposts will take a while
Kornel
2022-10-14 03:16:03
I've tweaked rav1e settings for cavif-rs, and it's generally on par with aom.
2022-10-14 03:16:16
aom supports more features, so there are types of images where it excels.
2022-10-14 03:16:42
but rav1e is a bit faster overall, so if you're trying to balance speed/quality, they're close.
BlueSwordM
_wb_ <@321486891079696385> can you give me a good avifenc command line that doesn't require custom compilation (works out of the box on a default avifenc `Version: 0.10.1 (aom [enc/dec]:3.4.0)`) ? What is currently in the plots above is `-c aom -y 444 --min 0 --max 63 -s $speed -j 1 -a end-usage=q -a tune=ssim -a cq-level=$q` but I'm open to trying alternatives. Testing the slower speeds takes ages though, and they're not really deployable anyway, so preferably it should be something that works well at speed 6-7 or so.
2022-10-14 04:22:18
I'd advise not setting 4:4:4 manually, as avifenc will not chroma subsample unless needed: `avifenc -s X -j X --min 0 --max 63 -a end-usage=q -a cq-level=XX -a color:sharpness=2 -a tune=ssim -a color:enable-chroma-deltaq=1 -a color:deltaq-mode=3/4`
2022-10-14 04:25:23
I'm surprised chroma deltaq hasn't become default for 4:4:4 content yet, or why DQ3 hasn't been enabled by default yet.
WoofinaS
2022-10-14 04:28:24
Rav1e might have similar appeal to aom, but ironically (and opposite to its inter performance) it has fidelity issues. I was never able to get good d scores with rav1e at acceptable file sizes. Rav1e also *isn't threaded*, so in practice it's slower by quite a margin.
BlueSwordM I'd advise not setting 4:4:4 manually, as avifenc will not chroma subsample unless needed: `avifenc -s X -j X --min 0 --max 63 -a end-usage=q -a cq-level=XX -a color:sharpness=2 -a tune=ssim -a color:enable-chroma-deltaq=1 -a color:deltaq-mode=3/4`
2022-10-14 04:34:20
I'm also not sure why you removed ssim as it's way better than the default psnr even for "drawn content" where people claim it to be worse.
_wb_
BlueSwordM I'd advise not setting 4:4:4 manually, as avifenc will not chroma subsample unless needed: `avifenc -s X -j X --min 0 --max 63 -a end-usage=q -a cq-level=XX -a color:sharpness=2 -a tune=ssim -a color:enable-chroma-deltaq=1 -a color:deltaq-mode=3/4`
2022-10-14 04:39:24
I only use png input so then it doesn't make a difference whether I set -y 444 or not, right?
BlueSwordM
WoofinaS I'm also not sure why you removed ssim as it's way better than the default psnr even for "drawn content" where people claim it to be worse.
2022-10-14 04:41:51
Because it's default now?
_wb_ I only use png input so then it doesn't make a difference whether I set -y 444 or not, right?
2022-10-14 04:42:06
Indeed.
WoofinaS I'm also not sure why you removed ssim as it's way better than the default psnr even for "drawn content" where people claim it to be worse.
2022-10-14 04:42:23
Oh right, he's on "stable" 0.10.0.
_wb_
WoofinaS Rav1e might have similar appeal as aom but ironically and actually oppositely compared to inter performance it has fidelity issues. I was never able to get good d scores with rav1e at acceptable file sizes. Rav1e also *isn't threaded* so in practice it's slower by quite the margin.
2022-10-14 04:43:53
multithreading is useful for an end-user encoding a single image on a beefy laptop or desktop, but for deployments like in Cloudinary, we usually avoid multithreading since it usually doesn't scale perfectly so while it might be good for latency, for overall throughput it's better to do single core per image if you have to encode large amounts of images
WoofinaS
2022-10-14 04:44:01
He said stock my guy.
Traneptora
2022-10-14 04:45:20
yea, it's generally easier to batch process stuff in a single core per image and let the operating system handle the parallelization
BlueSwordM
WoofinaS He said stock my guy.
2022-10-14 04:45:37
That's not the issue there. In git/0.11.0 RC, the SSIM based post RD tune is the default.
Traneptora
2022-10-14 04:45:39
since it scales up much better
WoofinaS
_wb_ multithreading is useful for an end-user encoding a single image on a beefy laptop or desktop, but for deployments like in Cloudinary, we usually avoid multithreading since it usually doesn't scale perfectly so while it might be good for latency, for overall throughput it's better to do single core per image if you have to encode large amounts of images
2022-10-14 04:46:06
Being able to saturate 2 threads is still important, however rav1e struggles to even do that. Running an encoder per thread can actually be slower
BlueSwordM That's not the issue there. In git/0.11.0 RC, the SSIM based post RD tune is the default.
2022-10-14 04:46:36
Ah okay
fab
2022-10-14 06:18:51
[attachment]
_wb_
2022-10-14 07:56:19
i'm now tuning an updated ssimulacra2 with the following changes:
- added 0.05 to Y to circumvent the FastGaussian inaccuracies
- modified SSIM formula to drop the double gamma correction
- downscaling in linear RGB instead of XYB
- using 2-norm and 8-norm instead of 1-norm and 4-norm: I saw that many of the weights for 1-norms were tuned to 0 so that norm is not very useful; I hope 2,8 will be a better combination
2022-10-15 08:11:06
ok no looks like it's not
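For reference, the norms being swapped here are just p-norm pooling of the per-pixel error maps; a minimal sketch:
```python
import numpy as np

def pnorm_pool(error_map, p):
    # p-norm pooling of an error map: p=1 is the mean absolute error,
    # larger p puts more weight on the worst regions (p->inf is the max)
    e = np.abs(np.asarray(error_map, dtype=np.float64))
    return float((e ** p).mean() ** (1.0 / p))
```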
Jyrki Alakuijala
2022-10-17 06:54:08
add all 1,2,4,8,16 and optimize their weights?
_wb_
2022-10-17 07:29:47
Yeah I might do that at some point, it blows up the number of weights to tune though. I think 1 and 4 norm is a good set if you have only 2 of them, and I doubt if having more than 2 norms is going to add much.
2022-10-17 07:31:18
I had some trouble tuning a linearly downscaling variant, wasn't getting very good results until I adjusted the objective function to optimize more for MSE and Kendall and less for Pearson
Jyrki Alakuijala
2022-10-17 07:43:40
I keep all weights the same for each aggregation level
_wb_
2022-10-17 09:34:18
Tuning causes e.g. weights for B to be low or zero at 1:1 and 1:2 and higher at 1:16 and 1:32, while for X and Y it's the other way around
Jyrki Alakuijala
2022-10-17 09:52:25
interesting
_wb_
2022-10-18 07:55:56
I cannot get to quite as good correlation with subjective data when doing SSIM correctly and when downscaling correctly (in linear instead of in XYB). But I suspect that it might still be a good idea. I think the better fit with subjective data may partially be because it could easily "see" which image was a jxl and which image was not (by looking at 1:8 error when downscaling in XYB, which is naturally going to be lower for jxl since that's its DC), and possibly it to some extent used that to infer things that allowed it to get a better fit to our data, without any real perceptual foundation. Doing downscaling in linear space is perceptually correct and doesn't correspond to what any codec internally does (they all obviously work in a gamma-compressed space), so it's more "fair" and less likely to 'overfit'.
2022-10-18 07:59:54
I can still get a MAE of under 5 on the validation set, so the difference is not huge, just a bit more error and a bit lower Kendall/Spearman/Pearson correlation than what I could get previously (which was already slightly worse than what I could get with the negative weights, but that was just overfitting in a way that generalizes very poorly)
2022-10-18 10:50:50
These are the weights it converged to:
```
X scale 1:1,  1-norm: ssim: 0.0 | ringing: 0.0 | blur: 0.0
X scale 1:1,  4-norm: ssim: 1.0035479352512353 | ringing: 0.00011322061110474735 | blur: 0.00040442991823685936
X scale 1:2,  1-norm: ssim: 0.0018953834105783773 | ringing: 0.0 | blur: 0.0
X scale 1:2,  4-norm: ssim: 8.982542997575905 | ringing: 0.9899785796045556 | blur: 0.0
X scale 1:4,  1-norm: ssim: 0.9748315131207942 | ringing: 0.9581575169937973 | blur: 0.0
X scale 1:4,  4-norm: ssim: 0.5133611777952946 | ringing: 1.0423189317331243 | blur: 0.000308010928520841
X scale 1:8,  1-norm: ssim: 12.149584966240063 | ringing: 0.9565577248115467 | blur: 0.0
X scale 1:8,  4-norm: ssim: 1.0406668123136824 | ringing: 81.51139046057362 | blur: 0.30593391895330946
X scale 1:16, 1-norm: ssim: 1.0752214433626779 | ringing: 1.1039042369464611 | blur: 0.0
X scale 1:16, 4-norm: ssim: 1.021911638819618 | ringing: 1.1141823296855722 | blur: 0.9730845751441705
X scale 1:32, 1-norm: ssim: 0.0 | ringing: 0.0 | blur: 0.0
X scale 1:32, 4-norm: ssim: 0.9833918426095505 | ringing: 0.7920385137059867 | blur: 0.9710740411514053
```
2022-10-18 10:51:01
```
Y scale 1:1,  1-norm: ssim: 0.0 | ringing: 0.0 | blur: 0.0
Y scale 1:1,  4-norm: ssim: 0.5387077903152638 | ringing: 0.0 | blur: 3.4036945601155804
Y scale 1:2,  1-norm: ssim: 0.0 | ringing: 0.0 | blur: 0.0
Y scale 1:2,  4-norm: ssim: 2.337569295661117 | ringing: 0.0 | blur: 5.707946510901609
Y scale 1:4,  1-norm: ssim: 37.83086423878157 | ringing: 0.0 | blur: 0.0
Y scale 1:4,  4-norm: ssim: 3.8258200594305185 | ringing: 0.0 | blur: 0.0
Y scale 1:8,  1-norm: ssim: 24.073659674271497 | ringing: 0.0 | blur: 0.0
Y scale 1:8,  4-norm: ssim: 13.181871265286068 | ringing: 0.0 | blur: 0.0
Y scale 1:16, 1-norm: ssim: 0.0 | ringing: 0.0 | blur: 0.0
Y scale 1:16, 4-norm: ssim: 10.00750121262895 | ringing: 0.0 | blur: 0.0
Y scale 1:32, 1-norm: ssim: 0.0 | ringing: 0.0 | blur: 0.0
Y scale 1:32, 4-norm: ssim: 52.51428385603891 | ringing: 0.0 | blur: 0.0
```
2022-10-18 10:51:06
```
B scale 1:1,  1-norm: ssim: 0.0 | ringing: 0.0 | blur: 0.0
B scale 1:1,  4-norm: ssim: 0.0 | ringing: 0.0 | blur: 0.0
B scale 1:2,  1-norm: ssim: 0.0 | ringing: 0.0 | blur: 0.0
B scale 1:2,  4-norm: ssim: 0.0 | ringing: 0.0 | blur: 0.0
B scale 1:4,  1-norm: ssim: 0.0 | ringing: 0.9946464267894417 | blur: 0.0
B scale 1:4,  4-norm: ssim: 0.0 | ringing: 0.0006040447715934816 | blur: 0.0
B scale 1:8,  1-norm: ssim: 0.0 | ringing: 0.9945171491374072 | blur: 0.0
B scale 1:8,  4-norm: ssim: 2.8260043809454376 | ringing: 1.0052642766534516 | blur: 8.201441997546244e-05
B scale 1:16, 1-norm: ssim: 12.154041855876695 | ringing: 32.292928706201266 | blur: 0.992837130387521
B scale 1:16, 4-norm: ssim: 0.0 | ringing: 30.71925517844603 | blur: 0.00012309907022278743
B scale 1:32, 1-norm: ssim: 0.0 | ringing: 0.9826260237051734 | blur: 0.0
B scale 1:32, 4-norm: ssim: 0.0 | ringing: 0.9980928367837651 | blur: 0.012142430067163312
```
2022-10-18 10:52:29
so it decided that scales 1:1 and 1:2 don't matter at all for B, only when zooming out B becomes important. That does make sense, but I'm still a bit surprised that it doesn't even want to have some low weights there
Jyrki Alakuijala
2022-10-18 10:52:46
downscaling in linear is not necessarily a good idea
2022-10-18 10:53:26
when I started with guetzli and pik, I was full of enthusiasm to do everything in linear -- trying out linear DCTs
2022-10-18 10:53:50
pretty much nothing worked and the experience wasn't better
_wb_
2022-10-18 10:53:57
yeah no I don't think it works for compression purposes
Jyrki Alakuijala
2022-10-18 10:54:08
in some computer graphics uses it works
_wb_
2022-10-18 10:54:14
for a metric though, it does make sense
Jyrki Alakuijala
2022-10-18 10:54:20
but not necessarily in downscaling
_wb_
2022-10-18 10:56:16
it does make the difference in that example of black-on-white vs white-on-black, when doing nonlinear downscaling it says the black-on-white one is better while when doing linear downscaling it says the opposite (and agrees with butteraugli and dssim and my own eyes)
2022-10-18 11:02:42
also somewhat surprising is that it doesn't use the ringing term for Y, only for X and B; though of course ringing and blur are to some extent already included in ssim, which has large weights for Y
2022-10-18 11:06:50
anyway, I think it's useful if a metric doesn't use the exact same colorspace as the codecs it is testing; DSSIM working in Lab space makes it unlikely that it gives an advantage to any specific codec, since none of them work in Lab space
2022-10-18 11:08:16
this is probably one of the biggest problems with VMAF, PSNR-Y, PSNR-HVS, etc: they all work in YCbCr like most of the codecs do, which inherently gives an advantage to the codecs that work in YCbCr and a disadvantage to those that don't.
2022-10-18 11:10:22
so that's another good reason to downscale in linear RGB instead of in XYB, so at least on the zoomed out scales (which probably contribute the most to the overall score), it doesn't directly correspond to what jxl is using
2022-10-18 02:09:11
this is ssimulacra2 as it is now on git:
2022-10-18 02:09:58
and this is what happens when I do the changes to fix SSIM and downscaling in linear space:
2022-10-18 02:10:31
blue line is current git libjxl, orange are the various libaom speed settings, white is mozjpeg, green is webp m6
2022-10-18 02:11:32
this is just for a few images, to get an idea (it takes time to recompute stuff for all images)
2022-10-18 02:12:41
overall not a huge difference in what it says, in both cases it says jxl > avif > mozjpeg ~= webp
2022-10-18 02:13:57
just somewhat smaller gap between avif and jxl, as I expected
2022-10-18 02:15:35
looking at the disagreements between those two variants, the disagreements are mostly minor (things like one saying A=62, B=64 while the other says A=64, B=62), but I do tend to agree with what the linear scaling variant says
BlueSwordM I'd advise not setting 4:4:4 manually, as avifenc will not chroma subsample unless needed: `avifenc -s X -j X --min 0 --max 63 -a end-usage=q -a cq-level=XX -a color:sharpness=2 -a tune=ssim -a color:enable-chroma-deltaq=1 -a color:deltaq-mode=3/4`
2022-10-18 02:20:05
I tried sharpness=2 deltaq-mode=3, but comparing at speed 6, none of the metrics seem to really like it (only butteraugli says it's a few percent better at the high quality end, the others say it's slightly worse)
2022-10-18 02:20:20
of course the metrics could very well be wrong about this
WoofinaS
2022-10-18 02:20:47
Might be a fault of cpu 6 pruning then as it's generally a fair bit better at slower presets.
_wb_
2022-10-18 02:27:43
I will try slower speeds too, but speed 6 is already on the slow end for practical deployment...
WoofinaS
2022-10-18 02:28:52
I can very much see it being applicable for things like Netflix thumbnail previews.
_wb_
2022-10-18 03:38:51
Sure, for the top 0.001% images that get viewed millions or billions of times, it makes sense to use speed 0. But for 99% of images it doesn't really make sense to spend that amount of cpu just to save a little bandwidth.
BlueSwordM
_wb_ I tried sharpness=2 deltaq-mode=3, but comparing at speed 6, none of the metrics seem to really like it (only butteraugli says it's a few percent better at the high quality end, the others say it's slightly worse)
2022-10-19 03:38:34
Ohhh right, I forgot you're on mainline aomenc. Silly me, I forgot that aomenc devs made `--sharpness=X` useless above >1 back in June 2021 (and made the RD multiplier tuning useless through it). Anyway, this should work better since RD SB now exists: `avifenc -s X -j X --min 0 --max 63 -a end-usage=q -a cq-level=XX -a tune=ssim -a color:enable-chroma-deltaq=1 -a color:deltaq-mode=3/4`
_wb_
2022-10-19 03:30:49
I just had a random idea: instead of working with SSIM = (intensity error) * (contrast / structure error), why not separate those two terms, compute norms of both of them, and tune weights for that. That means more weights to learn, but it might work better...
2022-10-19 03:31:40
(or worse, we'll see)
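A sketch of that split, using the standard SSIM factors (per-window means, variances and covariance; C1/C2 as usual):
```python
def ssim_terms(mu_x, mu_y, var_x, var_y, cov_xy, c1=0.0, c2=0.0):
    # Standard SSIM factors kept separate instead of multiplied together:
    # the intensity (luminance) term and the contrast/structure term.
    intensity = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    contrast_structure = (2 * cov_xy + c2) / (var_x + var_y + c2)
    return intensity, contrast_structure
```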
BlueSwordM Ohhh right, I forgot you're on mainline aomenc. Silly me, I forgot that aomenc devs made `--sharpness=X` useless back in June 2021 above >1(and made the RD multiplier tuning useless through it). Anyway, this should work better since RD SB now exists: `avifenc -s X -j X --min 0 --max 63 -a end-usage=q -a cq-level=XX -a tune=ssim -a color:enable-chroma-deltaq=1 -a color:deltaq-mode=3/4`
2022-10-19 08:54:35
I haven't looked at any actual image, but according to the metrics, `-a color:enable-chroma-deltaq=1 -a color:deltaq-mode=3` is not an overall improvement at speed 6. SSIM and MS-SSIM say it's good at the higher end (above 1.3 bpp), but all other metrics (Butteraugli, DSSIM, VMAF-NEG, PSNR-HVS) say it's worse on average. For the same cq-level setting, the quality with deltaq-mode=3 is significantly better according to all metrics, but the bpp also grows, so the curve looks worse than without the deltaq-mode=3. The curves look better than when adding `--sharpness=2` too though, you were right about that.
2022-10-19 08:55:54
does deltaq-mode=4 do something very different, or is it similar to =3 but 'more'?
BlueSwordM
_wb_ does deltaq-mode=4 do something very different, or is it similar to =3 but 'more'?
2022-10-19 08:56:05
It does something a bit different, yes.
2022-10-19 08:56:17
I'm actually surprised, especially since DQ3/DQ4 and a couple of other all-intra patches have actually been verified with butteraugli to perform better. Note that I only said a couple, and only in all-intra.
_wb_
2022-10-19 08:56:40
it could be image-dependent
2022-10-19 08:57:30
i'm testing on the 250 images we used in our subjective study plus the 49 images of the daala testset
2022-10-19 09:00:33
this is avg bpp vs avg BA 3-norm on the 299 images, for cq=1..63 in steps of 2. Orange line at the bottom is with just tune=ssim, middle line is dq3 added, top line is dq3+sharpness2 added.
2022-10-19 09:05:18
btw you may have noticed that I make my plots in darkmode now, this is because I was hurting my eyes because I have been doing a lot of switching between comparing images at max display brightness and looking at plots 🙂
2022-10-24 07:37:37
I am scratching my head a bit here. I made a variant of ssimulacra2 that computes 180 different subscores:
- 6 scales (1:1 to 1:32)
- 3 components (X,Y,B)
- 1-norm and 4-norm of ringing-map and blur-map
- 1,2,4 norm of the squared error map
- 1,2,4 norm of the ssim map
and then let it tune to the subjective data I have
2022-10-24 07:38:15
of all those 180 subscores, it ended up ignoring most of them (their weights converged to zero), and it only ended up using these:
```
X scale 1:1,  4-norm: error: 161148.74838407728
Y scale 1:1,  4-norm: blur: 0.2633687687930647
Y scale 1:1,  4-norm: ssim: 2.6431663619107493
Y scale 1:2,  4-norm: ssim: 0.9498500183656345
Y scale 1:4,  1-norm: ssim: 61.37707585349767
Y scale 1:4,  2-norm: ssim: 9.257658472175498
Y scale 1:8,  1-norm: ssim: 81.10181445428385
Y scale 1:8,  2-norm: ssim: 29.706924623161246
B scale 1:1,  4-norm: error: 407.8120233025815
B scale 1:2,  4-norm: error: 1087.4658682942747
B scale 1:16, 1-norm: ringing: 6.096135168120978
B scale 1:16, 4-norm: ringing: 14.100261857992544
B scale 1:16, 1-norm: ssim: 14.130922986455808
```
2022-10-24 07:44:12
now I wonder if that's just how the human visual system works, or if my tuning process is getting stuck in some local optimum
2022-10-25 10:50:23
https://jon-cld.s3.amazonaws.com/test/chall_of_fshame_SSIMULACRA_2_modelD.html
2022-10-25 10:51:47
I wrote that script to find bugs/issues in ssimulacra2 but it kind of became a demo page showing how bad psnr/ssim/vmaf are
2022-10-27 01:17:26
is there a precompiled Windows binary version of butteraugli somewhere?
Jyrki Alakuijala
2022-10-31 11:18:16
I don't know
BlueSwordM
_wb_ is there are precompiled binary windows version of butteraugli somewhere?
2022-10-31 11:33:18
There was a website at one point, but it seems like they stopped hosting libjxl libs.
2022-10-31 11:33:30
Anyway, I'll ask the website owner and then reply back with the DDL link.
Pigophone
2022-11-01 01:15:32
<@794205442175402004> butteraugli is available on vcpkg, I compiled right now with `vcpkg install butteraugli --triplet=x64-windows`. installing vcpkg is fairly easy if you'd rather compile it yourself
afed
2022-11-02 08:00:49
what about this metric? https://github.com/richzhang/PerceptualSimilarity
_wb_
2022-11-02 08:26:13
I tried it but correlation with subjective results was a bit disappointing for the quality range we are interested in.
Traneptora
2022-11-02 10:50:47
I'm looking at your hall of shame and it looks to me like metrics overvalue sharpness
2022-11-02 10:51:03
[image]
2022-11-02 10:51:10
[image]
2022-11-02 10:51:27
first one is sharper, but it distorts the colors in a way that looks pretty gross
2022-11-02 10:51:37
legacy JPEG tends to do this, it is sharp but distorted
improver
2022-11-02 10:58:11
i can see more details in the sharper one though
2022-11-02 10:58:51
and less distorted shapes
_wb_
2022-11-03 07:56:48
In the end most of the distorted images there are just way too low quality and it is a bit of a 'pick your poison' situation
2022-11-03 08:01:30
But I think we made a good decision in the jxl encoder to ensure that DC (and low freq AC) gets enough bits and in the jxl spec to have good anti-banding tools. Most metrics don't seem to be bothered much by banding, yet this is an artifact that ruins an image even from far away or after downscaling.
diskorduser
2022-11-03 03:42:14
Is there any benchmark on decoding power consumption? Jxl and avif. IMO it's important for battery powered devices like laptops and smartphones
_wb_
2022-11-03 03:59:31
Not afaik, but should be roughly comparable I think.
2022-11-03 04:00:52
For still images on the web, probably decode power consumption is not that important compared to extra time spent on transfer and with the screen on while waiting for an image to load
veluca
2022-11-03 06:48:19
FWIW, probably AVIF decoding power consumption will be lower, those FP units are *expensive*
_wb_
2022-11-03 07:18:31
Yeah we should probably at some point see if decode can be done completely with the arithmetic and buffers in (mostly) int16...
2022-11-03 07:21:26
(some intermediate results in int32 for idct but most other stuff in int16)
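As a generic illustration of that kind of arithmetic (a fixed-point sketch with a hypothetical Q4.12 format, not libjxl code): values stay in int16, products are widened to int32 and shifted back:
```python
import numpy as np

Q = 12  # hypothetical Q4.12 format: 1.0 is represented as 4096

def to_fixed(x):
    return np.int16(round(x * (1 << Q)))

def fixed_mul(a, b):
    # widen to int32 for the product, then shift back into int16 range
    return np.int16((np.int32(a) * np.int32(b)) >> Q)

a, b = to_fixed(0.5), to_fixed(0.25)
print(fixed_mul(a, b) / (1 << Q))  # 0.125
```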
Traneptora
veluca FWIW, probably AVIF decoding power consumption will be lower, those FP units are *expensive*
2022-11-03 07:42:43
you can in theory do it in fixed point, can't you?
veluca
2022-11-03 07:43:30
"in theory" is the keyword here ๐Ÿ˜›
Traneptora
2022-11-03 07:43:46
dunno whatever happened to Lynne's decoder ideas
_wb_
veluca "in theory" is the keyword here ๐Ÿ˜›
2022-11-03 07:49:43
Is there anything besides idct that would be tricky to do in fixedpoint? You already did xyb2rgb, right?
2022-11-03 07:50:36
Gaborish, epf, blending of frames etc, that should all not be too hard to do in int16 instead of float, right?
2022-11-03 07:52:48
I mean, it's not like we really depend on the exponent part of floats or something like that, we're mostly just using them as int24_t
Traneptora
2022-11-03 07:55:08
given that X, Y, B are typically between 0 and 1 that makes sense to me
2022-11-03 07:55:26
floats give a bit more precision in the lower order bits but if it's within tolerance I suppose that doesn't matter
_wb_
2022-11-03 07:58:06
X is typically very close to zero, its range is like -0.03 to 0.03 or something like that
2022-11-03 07:58:48
But I don't think we need subnormals or anything like that
Traneptora
2022-11-03 07:59:13
why is X so close to zero, is that because L and M are typically much closer to each other than either are to S?
_wb_
2022-11-03 08:00:53
Yeah L and M are pretty similar so you can't really get abs(L-M) to be large
veluca
2022-11-03 08:01:11
EPF has divisions 😱
Traneptora
2022-11-03 08:01:24
I wonder if you can use Rationals
veluca
2022-11-03 08:01:41
Otherwise yeah, it should be fine, but I do wonder about the range of things
_wb_
2022-11-03 08:02:16
Is integer division that bad? Can it be approximated?
veluca
2022-11-03 08:02:28
Is it that bad? YES
2022-11-03 08:02:39
Can it be approximated? Probably
_wb_
2022-11-03 08:03:43
Yeah the ranges will need to be figured out
veluca
2022-11-03 08:03:47
to give an idea, compared to SIMD multiplies, I think int16 divisions could easily be 100x slower
_wb_
2022-11-03 08:04:24
What's the range of the divisors?
veluca
2022-11-03 08:04:40
pretty much anything unfortunately
2022-11-03 08:05:25
the most reasonable thing IMO is to do an initial approximation using only the highest set bit, and then do 2 newton steps
2022-11-03 08:05:53
still probably ~20 cycles for a vector, but much better than ~500
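A sketch of that reciprocal scheme (scalar and in floating point for clarity): the initial guess uses only the highest set bit of the divisor, then Newton-Raphson refines it:
```python
def approx_recip(d, steps=2):
    # initial guess from the highest set bit: 1/d ~= 2**-bit_length(d),
    # which guarantees 0.5 < d * x0 <= 1 so Newton converges
    x = 2.0 ** -int(d).bit_length()
    # Newton-Raphson for f(x) = 1/x - d:  x <- x * (2 - d * x)
    for _ in range(steps):
        x = x * (2.0 - d * x)
    return x

print(approx_recip(7), 1 / 7)  # ~0.14282 vs 0.14286
```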
_wb_
2022-11-03 08:09:15
Anyway jxl without epf is still kind of nice, so if epf cannot be made fast, it's mostly a problem for the lower qualities, not for the higher qualities where epf is probably not even used by default
Traneptora
veluca Otherwise yeah, it should be fine, but I do wonder about the range of things
2022-11-03 08:23:49
why not have something like ```c struct Rational { int32_t numer, denom; } ```
2022-11-03 08:24:15
addition would be slower but division would be just as fast as multiplication
veluca
2022-11-03 08:24:16
you need a division to do a weighted average
2022-11-03 08:24:30
carrying around the rational number doesn't help you that much
Traneptora
2022-11-03 08:24:33
Yea, you'd just keep track of rational numbers as an ordered pair of integers
2022-11-03 08:24:43
and you wouldn't actually do the division until the final stage
veluca
2022-11-03 08:24:57
eh, probably not worth it
2022-11-03 08:25:14
actually, definitely not worth it
Traneptora
2022-11-03 08:25:18
how so?
veluca
2022-11-03 08:25:25
EPF is 3 divisions (+eps) per pixel at worst
Traneptora
2022-11-03 08:25:47
I'd have to check how EPF works
veluca
2022-11-03 08:25:54
(good luck xD)
2022-11-03 08:26:19
but long story short, for each pixel it computes a vector of weights of neighbours and then replaces each neighbour with the weighted avg
2022-11-03 08:26:44
so you need one reciprocal per EPF iteration (of which there are at most 3)
2022-11-03 08:27:15
it's not per-channel, so this is not worse than dividing in the end to compute channel values
Traneptora
2022-11-03 08:28:46
how is it not per channel?
2022-11-03 08:29:00
how would you even do that "globally"
2022-11-03 08:30:06
> replaces each neighbour with the weighted avg so it divides by 8 for everything except the edge cases, right?
2022-11-03 08:30:59
which is just a right shift
veluca
2022-11-03 08:40:09
no no, every pixel gets its own weight (a float in [0, 1])
2022-11-03 08:40:20
but the weight is per-pixel, not per-pixel-per-channel
Traneptora
2022-11-03 08:41:29
o, I'll have to read the spec
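For a rough idea of the structure just described, a simplified bilateral-style sketch with one scalar weight per neighbour shared across the three channels (not the actual EPF from the spec; `weight_fn` is a hypothetical stand-in for the distance-based weight):
```python
import numpy as np

def epf_like(img, weight_fn):
    # img: (H, W, 3) floats. weight_fn returns one scalar weight per
    # neighbour per pixel, shared by all three channels, so a single
    # reciprocal per pixel is enough.
    out = np.zeros_like(img, dtype=np.float64)
    h, w, _ = img.shape
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            acc = np.zeros(3)
            wsum = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    wgt = weight_fn(img, y, x, dy, dx)
                    acc += wgt * img[y + dy, x + dx]
                    wsum += wgt
            out[y, x] = acc / wsum  # the one division per pixel
    return out
```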
jox
2022-11-15 10:07:11
I just want to share a small test I did with cjxl. I personally can't see any difference between these two photos and the size reduction is impressive! Can anyone spot any visual differences? <:PepeGlasses:878298516965982308>
Traneptora
2022-11-15 10:12:06
d1 is really a sweet spot for jxl
_wb_
2022-11-15 10:12:58
I can see minor differences at 2x zoom, nothing problematic though at first sight
Traneptora
2022-11-15 10:13:52
butteraugli is tuned for standard viewing distance
2022-11-15 10:14:12
d1 will be mostly unnoticed at 1x zoom
jox
_wb_ I can see minor differences at 2x zoom, nothing problematic though at first sight
2022-11-15 10:21:31
Oh really, I can't even see any differences at 3x zoom. Maybe my eye is not trained to find the artifacts. What should I look for? I am genuinely curious to know what changes with different quality settings.
lonjil
2022-11-15 10:52:40
"zoom" doesn't actually tell you viewing distance. Someone sitting closer to a bigger screen will see more detail even at 1x zoom.
2022-11-15 10:55:21
at 100% zoom, I believe I see the original having more noise and the jxl one being slightly smoother.
2022-11-15 10:57:46
yes, the original definitely has some high frequency noise that becomes nearly imperceptible as you go further from the screen
monad
2022-11-16 12:12:29
"viewing distance measured in pixels"
w
2022-11-16 03:11:56
ssimulacra2 87.45555778 butteraugli 0.589679
Traneptora
2022-11-16 03:21:30
probably a bug
_wb_
2022-11-16 03:26:36
well those two images are slightly different:
```
$ compare -metric psnr orig.png dist.png null:
62.422
$ compare -verbose -metric pae orig.png dist.png null:
orig.png PNG 1920x1080 1920x1080+0+0 8-bit sRGB 80108B 0.020u 0:00.032
dist.png PNG 1920x1080 1920x1080+0+0 8-bit sRGB 341622B 0.030u 0:00.027
Image: orig.png
  Channel distortion: PAE
    red: 771 (0.0117647)
    green: 514 (0.00784314)
    blue: 771 (0.0117647)
    all: 771 (0.0117647)
```
w
2022-11-16 03:27:35
i just wonder about the scores...
_wb_
2022-11-16 03:27:37
not in any visible way, but then again those ssimulacra2/butteraugli scores are in the "visually lossless" range too
2022-11-16 03:28:29
basically ssimulacra2 above 85 you can consider visually lossless, butteraugli maxnorm below 1 too
w
2022-11-16 03:29:04
it's +- 3 rgb, i would give it like a 100
DuxVitae
2022-11-17 11:06:06
I noticed that for the following image the file size for modular lossy and modular lossless is far from what I expected (lossy 6x larger than lossless):
```
C:\Program Files\libjxl>cjxl C:\maze.png C:\maze_lossy.jxl -e 9 -m 1 -d 1
JPEG XL encoder v0.8.0 54bf3b2 [AVX2,SSE4,SSSE3,Unknown]
Read 994x994 image, 12554 bytes, 394.7 MP/s
Encoding [Modular, d1.000, effort: 9], Compressed to 24000 bytes (0.194 bpp).
994 x 994, 0.51 MP/s [0.51, 0.51], 1 reps, 6 threads.

C:\Program Files\libjxl>cjxl C:\maze.png C:\maze.jxl -e 9 -m 1 -d 0
JPEG XL encoder v0.8.0 54bf3b2 [AVX2,SSE4,SSSE3,Unknown]
Read 994x994 image, 12554 bytes, 413.1 MP/s
Encoding [Modular, lossless, effort: 9], Compressed to 3336 bytes (0.027 bpp).
994 x 994, 0.36 MP/s [0.36, 0.36], 1 reps, 6 threads.
```
_wb_
2022-11-17 11:48:56
This is a typical image that is hard for dct and easy for lossless. We should add a heuristic to detect such cases and encode losslessly if that's smaller than lossy.
spider-mario
2022-11-17 01:32:01
that's lossy modular, though, not vardct, would we expect that there as well?
_wb_
2022-11-17 02:12:55
Lossy modular also uses xyb (no issue here) and a frequency transform that will turn this low entropy image into a higher entropy thing
Demiurge
2022-11-17 04:09:00
What kind of transform?
_wb_
2022-11-17 04:09:59
Squeeze, which is a reversible modified Haar transform
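A minimal sketch of one reversible Haar-style lifting step; the real Squeeze transform adds a 'tendency' prediction on top of this, so this is just the skeleton:
```python
def squeeze_step(a, b):
    # forward: rounded-down average and exact difference
    return (a + b) >> 1, a - b

def unsqueeze_step(avg, diff):
    # inverse: floor-shift semantics make this exact for all integers,
    # since avg = b + (diff >> 1) when a = b + diff
    b = avg - (diff >> 1)
    return b + diff, b

assert unsqueeze_step(*squeeze_step(13, 7)) == (13, 7)
assert unsqueeze_step(*squeeze_step(-4, 9)) == (-4, 9)
```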
Demiurge
2022-11-17 04:10:00
This maze image looks very chaotic in an extremely orderly way
DZgas Ж
DuxVitae I noticed that for the following image the file size for modular lossy and modular lossless is far from what I expected (lossy 6x larger than lossless): `C:\Program Files\libjxl>cjxl C:\maze.png C:\maze_lossy.jxl -e 9 -m 1 -d 1 JPEG XL encoder v0.8.0 54bf3b2 [AVX2,SSE4,SSSE3,Unknown] Read 994x994 image, 12554 bytes, 394.7 MP/s Encoding [Modular, d1.000, effort: 9], Compressed to 24000 bytes (0.194 bpp). 994 x 994, 0.51 MP/s [0.51, 0.51], 1 reps, 6 threads.` `C:\Program Files\libjxl>cjxl C:\maze.png C:\maze.jxl -e 9 -m 1 -d 0 JPEG XL encoder v0.8.0 54bf3b2 [AVX2,SSE4,SSSE3,Unknown] Read 994x994 image, 12554 bytes, 413.1 MP/s Encoding [Modular, lossless, effort: 9], Compressed to 3336 bytes (0.027 bpp). 994 x 994, 0.36 MP/s [0.36, 0.36], 1 reps, 6 threads.`
2022-11-17 09:37:31
Oh, very familiar. ||<:JXL:805850130203934781> ||
afed
2022-11-21 05:50:21
strange benchmarks <:PepeGlasses:878298516965982308> https://www.lossless-benchmarks.com/
Jim
2022-11-21 05:53:13
Now do avif <:ReeCat:806087208678588437>
_wb_
2022-11-21 06:22:40
Strange that jxl is a single point. There is e1 to e9 in libjxl, and then there's fjxl with various effort settings too
DZgas Ж
afed strange benchmarks <:PepeGlasses:878298516965982308> https://www.lossless-benchmarks.com/
2022-11-21 08:59:16
>not encode params >QOI best > <:ReeCat:806087208678588437> <:ReeCat:806087208678588437> <:ReeCat:806087208678588437>
afed strange benchmarks <:PepeGlasses:878298516965982308> https://www.lossless-benchmarks.com/
2022-11-21 09:03:25
I can assume that these measure the "fact time" per "unit" of computing time; this is essentially *the same as the tests between x86 and ARM with "energy" costs for the same tasks (excluding total intake and raw speed)*
afed
2022-11-22 12:27:55
```
ZPNG "lossless" (the original, upstream ZPNG) is very competitive with and sometimes strictly better than QOIR. It occupies a similar spot in the design space to QOIR: simple implementation, reasonable compression ratio, very fast encode / decode speeds.

"Sometimes strictly better" means that, on some photographic-heavy subsets of the image test suite (see the full benchmarks), ZPNG outperforms QOIR on all three columns (compression ratio, encoding speed and decoding speed) simultaneously.

ZPNG is in some sense simpler than QOIR (ZPNG is around 700 lines of C++ code plus a zstd dependency) but in another sense more complicated (because of the zstd dependency)
```
https://github.com/nigeltao/qoir
2022-11-22 12:28:13
```
QOIR_Lossless   1.000 RelCmpRatio 1.000 RelEncSpeed 1.000 RelDecSpeed (1)
JXL_Lossless/f  0.860 RelCmpRatio 0.630 RelEncSpeed 0.120 RelDecSpeed (2)
JXL_Lossless/l3 0.725 RelCmpRatio 0.032 RelEncSpeed 0.022 RelDecSpeed
JXL_Lossless/l7 0.613 RelCmpRatio 0.003 RelEncSpeed 0.017 RelDecSpeed
PNG/fpng        1.234 RelCmpRatio 1.138 RelEncSpeed 0.536 RelDecSpeed (1)
PNG/fpnge       1.108 RelCmpRatio 1.851 RelEncSpeed n/a   RelDecSpeed (1)
PNG/libpng      0.960 RelCmpRatio 0.033 RelEncSpeed 0.203 RelDecSpeed
PNG/stb         1.354 RelCmpRatio 0.045 RelEncSpeed 0.186 RelDecSpeed (1)
PNG/wuffs       0.946 RelCmpRatio n/a   RelEncSpeed 0.509 RelDecSpeed (1), (3)
QOI             1.118 RelCmpRatio 0.870 RelEncSpeed 0.700 RelDecSpeed (1)
WebP_Lossless   0.654 RelCmpRatio 0.015 RelEncSpeed 0.325 RelDecSpeed
ZPNG_Lossless   0.864 RelCmpRatio 0.747 RelEncSpeed 0.927 RelDecSpeed (4)
ZPNG_NofilLossl 1.330 RelCmpRatio 0.843 RelEncSpeed 1.168 RelDecSpeed (4)
```
2022-11-22 12:29:05
ZPNG as I understand it is basically PNG with some modified and added filters and compressed with Zstd
2022-11-22 01:28:27
so the biggest improvement comes from zstd, because it is highly optimized, making zpng faster even than ~~fpnge~~ fast jxl, while still having good compression? but there are some issues with this comparison, because gcc is used, often with no extra optimization flags; with clang all the *jxl things are much faster, also when compiling natively for the cpu. and for ~~fpnge~~ fast jxl with gcc and libgomp, multithreading is not working correctly (at least for me), sometimes 2x slower (although this is a single-threaded comparison, but still)
veluca
2022-11-22 01:32:40
fpnge is ~2x faster than zpng 😉
afed
2022-11-22 01:42:08
in single-threaded mode? and it looks like the latest zstd library is used in the comparison, not the one in the original repo
veluca
2022-11-22 01:43:30
I mean, in that table it's ~2x faster
2022-11-22 01:44:01
so single-threaded I'd say
afed
2022-11-22 01:46:23
ah, yeah, I meant fast_lossless jxl
veluca
2022-11-22 01:47:48
should still be faster tbh, probably depends on compilation options
2022-11-22 01:48:02
how did you compile?
afed
2022-11-22 01:51:39
```CXX="${CXX-g++}"``` or clang with the default sh and with "-march=native -mtune=native"
2022-11-22 01:59:28
although there is also "-march=native", it's still the gcc compiler <https://github.com/nigeltao/qoir/blob/main/script/run_full_benchmarks.sh>
veluca
2022-11-22 02:01:22
there are special flags to enable intrinsics both in fjxl and in fpnge
2022-11-22 02:01:54
FASTLL_ENABLE_AVX2_INTRINSICS=1
2022-11-22 02:01:58
ok, yeah, that's set
afed
2022-11-22 02:03:25
but not for fpnge?
```
echo 'Compiling out/jxl_adapter.o'
$CXX -c -march=native \
    -DFASTLL_ENABLE_AVX2_INTRINSICS=1 \
    -I../libjxl/build/lib/include \
    -I../libjxl/lib/include \
    $CXXFLAGS -Wno-unknown-pragmas adapter/jxl_adapter.cpp \
    $LDFLAGS -o out/jxl_adapter.o

echo 'Compiling out/png_fpnge_adapter.o'
$CXX -c -march=native \
    $CXXFLAGS adapter/png_fpnge_adapter.cpp \
    $LDFLAGS -o out/png_fpnge_adapter.o
```
veluca
2022-11-22 02:04:56
apparently for fpnge I made it unconditional-if-supported
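For context, the usual way to make intrinsics "unconditional if supported" is to gate them on the compiler's own target macros, so -march=native alone turns them on with no extra -D flag. A minimal sketch of that pattern (illustrative only, not fpnge's actual code; delta_row is a made-up kernel):
```
// Sketch: the SIMD path is selected by the compiler's target macro, so
// building with -march=native on an AVX2 machine enables it automatically.
#include <cstddef>
#include <cstdint>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

// Out-of-place left-delta filter over a row of bytes (dst != src assumed).
void delta_row(const uint8_t* src, uint8_t* dst, size_t n) {
  if (n == 0) return;
  dst[0] = src[0];
  size_t i = 1;
#if defined(__AVX2__)
  // Fast path: 32 bytes per iteration, compiled in whenever the target
  // advertises AVX2.
  for (; i + 32 <= n; i += 32) {
    __m256i cur  = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(src + i));
    __m256i left = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(src + i - 1));
    _mm256_storeu_si256(reinterpret_cast<__m256i*>(dst + i),
                        _mm256_sub_epi8(cur, left));
  }
#endif
  // Scalar tail, and the whole loop on non-AVX2 targets.
  for (; i < n; ++i) dst[i] = static_cast<uint8_t>(src[i] - src[i - 1]);
}
```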
afed
2022-11-22 02:09:02
looks like there was a speed bump; seems like the cause could be gcc holding back full real performance? <https://github.com/nigeltao/qoir/commit/c2fec8401776bb0bf58106f5315552b36aab4b08>
veluca
2022-11-22 02:11:26
could be
2022-11-22 02:11:31
also depends on the effort level
afed
2022-11-22 07:28:28
fpnge -r500 (first clang, second gcc, other flags unchanged)
```
210.555 MP/s 16.822 bits/pixel
172.744 MP/s 16.822 bits/pixel
-
645.121 MP/s 0.429 bits/pixel
612.370 MP/s 0.429 bits/pixel
-
199.539 MP/s 18.010 bits/pixel
165.603 MP/s 18.010 bits/pixel
```
2022-11-22 07:36:34
-r1000
```
200.624 MP/s 17.370 bits/pixel
168.969 MP/s 17.370 bits/pixel
-
283.050 MP/s 9.981 bits/pixel
253.117 MP/s 9.981 bits/pixel
```
veluca
2022-11-22 08:25:41
yup, I wrote it testing with clang so that's not entirely surprising
2022-11-22 08:26:49
although it's surprising that changing the # of repetitions would change the bpp...
_wb_
2022-11-22 08:27:40
Probably different images
afed
2022-11-22 08:28:30
yeah, these are different images, I just want to make sure it's consistently repeatable
_wb_
2022-11-22 08:30:20
I assume gcc is doing a bit less autovec or something? Or what's causing the difference?
afed
2022-11-22 08:32:47
and for fast_lossless_jxl there is something strange about threading with gcc (or libgomp vs libomp), it seems to run fewer threads than with clang
veluca
2022-11-22 08:39:35
could also be less optimal register allocator or the like, hard to say
afed
2022-11-23 12:49:33
`Add LZ4PNG to full benchmarks` <https://github.com/nigeltao/qoir>
```
QOIR_Lossless    1.000 RelCmpRatio  1.000 RelEncSpeed  1.000 RelDecSpeed  (1)
JXL_Lossless/f   0.860 RelCmpRatio  0.630 RelEncSpeed  0.120 RelDecSpeed  (2)
JXL_Lossless/l3  0.725 RelCmpRatio  0.032 RelEncSpeed  0.022 RelDecSpeed
JXL_Lossless/l7  0.613 RelCmpRatio  0.003 RelEncSpeed  0.017 RelDecSpeed
LZ4PNG_Lossless  1.403 RelCmpRatio  1.038 RelEncSpeed  1.300 RelDecSpeed  (3)
LZ4PNG_NofilLsl  1.642 RelCmpRatio  1.312 RelEncSpeed  2.286 RelDecSpeed  (3)
PNG/fpng         1.234 RelCmpRatio  1.138 RelEncSpeed  0.536 RelDecSpeed  (1)
PNG/fpnge        1.108 RelCmpRatio  1.851 RelEncSpeed  n/a   RelDecSpeed  (1)
PNG/libpng       0.960 RelCmpRatio  0.033 RelEncSpeed  0.203 RelDecSpeed
PNG/stb          1.354 RelCmpRatio  0.045 RelEncSpeed  0.186 RelDecSpeed  (1)
PNG/wuffs        0.946 RelCmpRatio  n/a   RelEncSpeed  0.509 RelDecSpeed  (1), (4)
QOI              1.118 RelCmpRatio  0.870 RelEncSpeed  0.700 RelDecSpeed  (1)
WebP_Lossless    0.654 RelCmpRatio  0.015 RelEncSpeed  0.325 RelDecSpeed
ZPNG_Lossless    0.864 RelCmpRatio  0.747 RelEncSpeed  0.927 RelDecSpeed  (3)
ZPNG_NofilLsl    1.330 RelCmpRatio  0.843 RelEncSpeed  1.168 RelDecSpeed  (3)
```
yeah, zstd seems to do the main job, lz4 is faster, but compression suffers a lot
_wb_
2022-11-23 01:24:42
you could also try png.br, which is something you could actually use on the web right now (i.e. a png that only uses the uncompressed fallback mode of DEFLATE, and then gets sent with brotli transfer-encoding)
afed
2022-11-23 01:46:56
yeah, but it has limited support (transfer mostly) and is difficult to use, and zpng has faster and more effective filtering <https://github.com/catid/Zpng>
```
This library is similar to PNG in that the image is first filtered, and then submitted to a data compressor. The filtering step is a bit simpler and faster but somehow more effective than the one used in PNG. The data compressor used is Zstd, which makes it significantly faster than PNG to compress and decompress.

Filtering:
(1) Reversible color channel transformation.
(2) Split each color channel into a separate color plane.
(3) Subtract each color value from the one to its left.
```
though, if fpnge had a filter-only mode, that would be usable for some mixes, even with brotli, or fast jxl which would work like zpng but use brotli instead of zstd, or even gpu brotli <:FeelsAmazingMan:808826295768449054>
_wb_
2022-11-23 01:56:53
That Zpng filtering can be represented in jxl's modular: it's just a particular fixed RCT (jxl has many, and the Zpng one is probably not the most effective one) and using W as a fixed predictor. In modular jxl, things are always planar.
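To make the quoted Zpng pipeline concrete, here is a rough sketch of its three filtering steps. The subtract-green style RCT is an assumption made for illustration (Zpng's actual fixed transform may differ), and row boundaries are ignored for brevity:
```
#include <cstddef>
#include <cstdint>
#include <vector>

// ZPNG-style filtering: (1) reversible color transform, (2) split into
// planes, (3) subtract each value from the one to its left (the "W"
// predictor in modular-jxl terms). The result is what you would hand to
// a generic compressor such as zstd.
std::vector<uint8_t> zpng_style_filter(const uint8_t* rgb, size_t npix) {
  std::vector<uint8_t> out(3 * npix);
  uint8_t* plane[3] = {out.data(), out.data() + npix, out.data() + 2 * npix};
  for (size_t i = 0; i < npix; ++i) {
    const uint8_t r = rgb[3 * i], g = rgb[3 * i + 1], b = rgb[3 * i + 2];
    plane[0][i] = static_cast<uint8_t>(r - g);  // (1) decorrelate vs G;
    plane[1][i] = g;                            //     wraps mod 256, reversible
    plane[2][i] = static_cast<uint8_t>(b - g);
  }
  for (uint8_t* p : plane)                      // (2) planes are separate, so
    for (size_t i = npix; i-- > 1;)             // (3) left-delta each plane,
      p[i] = static_cast<uint8_t>(p[i] - p[i - 1]);  // in place, right to left
  return out;
}
```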
afed
2022-11-23 02:19:06
i mean, if properly picked from something simple, it can also be very fast and efficient enough, like here:
```
JXL_Lossless/f   0.860 RelCmpRatio  0.630 RelEncSpeed  0.120 RelDecSpeed  (2)
ZPNG_Lossless    0.864 RelCmpRatio  0.747 RelEncSpeed  0.927 RelDecSpeed  (3)
```
though it may only be on this test set, and using clang may change the results
afed yeah, but it has limited support (transfer mostly) and is difficult to use, and zpng has faster and more effective filtering <https://github.com/catid/Zpng> (…)
2022-11-23 02:36:45
however, just filtering without knowing what compression method will be used afterwards is probably not optimal
veluca
2022-11-23 03:24:47
I don't find fast lossless en/decoders for a new format to be that interesting, it's not *that* hard to make something that is very fast and very dense if you don't have constraints on the bitstream
2022-11-23 03:26:05
(and still fits in a couple thousand lines without external libs, say)
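As a toy illustration of that claim (invented for this note, not any real bitstream): left-delta the bytes, then run-length-code the zero runs. Even this little gives a fast coder that is reasonably dense on flat image data:
```
// Toy lossless coder: delta filter + RLE of zero runs. Nonzero deltas are
// stored as literal bytes; a zero delta is always folded into a run, so
// the byte 0x00 never appears as a literal and can serve as the run
// marker. Assumes well-formed input on decode.
#include <cstddef>
#include <cstdint>
#include <vector>

std::vector<uint8_t> toy_encode(const std::vector<uint8_t>& in) {
  std::vector<uint8_t> out;
  uint8_t prev = 0;
  size_t zeros = 0;
  auto flush_zeros = [&] {
    while (zeros > 0) {
      size_t n = zeros < 255 ? zeros : 255;
      out.push_back(0x00);                     // marker: run of zero deltas
      out.push_back(static_cast<uint8_t>(n));  // run length, 1..255
      zeros -= n;
    }
  };
  for (uint8_t b : in) {
    const uint8_t d = static_cast<uint8_t>(b - prev);
    prev = b;
    if (d == 0) ++zeros;
    else { flush_zeros(); out.push_back(d); }
  }
  flush_zeros();
  return out;
}

std::vector<uint8_t> toy_decode(const std::vector<uint8_t>& in) {
  std::vector<uint8_t> out;
  uint8_t prev = 0;
  for (size_t i = 0; i < in.size(); ++i) {
    if (in[i] == 0x00) {
      out.insert(out.end(), in[++i], prev);    // repeat prev, run-length times
    } else {
      prev = static_cast<uint8_t>(prev + in[i]);
      out.push_back(prev);
    }
  }
  return out;
}
```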
afed
2022-11-23 03:45:38
yeah, highly specialized formats, especially those that don't scale well to anything else, are maybe only interesting for research or experiments. i meant that among the existing formats, other approaches can also be made without breaking the specs (which has already been done, but I mean even more different ways)
2022-11-23 04:17:09
<https://github.com/nigeltao/qoir> sad that run_benchmarks.sh does not compile in my environment (I am not on linux), maybe someone can test this but with clang instead of gcc? (or gcc vs clang, so we can compare compiler impact) <https://github.com/nigeltao/qoir/blob/main/run_benchmarks.sh#L3>
2022-11-23 08:15:08
found something (and even clang 14 is worse for some reason) <https://www.phoronix.com/review/aocc4-gcc-clang/3>
2022-11-23 08:15:46
2022-11-23 08:16:20
BlueSwordM
afed found something (and even clang 14 is worse for some reason) <https://www.phoronix.com/review/aocc4-gcc-clang/3>
2022-11-23 08:20:18
We've talked about this in the past, but I still haven't managed to do any profiling since I can't seem to install Clang 14 on my system.
afed
2022-11-23 08:22:24
though maybe because this is a very new cpu
BlueSwordM
afed though maybe because this is a very new cpu
2022-11-23 08:30:16
I don't think that's why. I also myself noticed a massive speed increase when I rebuilt libjxl when my distro updated from Clang 14 to Clang 15.
veluca
2022-11-23 10:13:34
those numbers are weird...
pshufb
2022-11-24 06:04:25
I wouldn't be surprised if the difference went away with PGO
afed
2022-11-27 01:34:26
<:DogWhat:806133035786829875>
2022-11-27 01:36:29
veluca
2022-11-27 01:41:11
what's so surprising?
afed
2022-11-27 01:42:36
clang vs gcc performance (it's all single-threaded, same conditions, compiler is the only difference)
veluca
2022-11-27 01:46:05
I'm not too surprised 😛
afed
2022-11-27 01:46:47
i expected some difference, but not that much
yurume
2022-11-27 02:12:14
within my expectation, too
2022-11-27 02:12:33
autovectorization and low-level constructs are very sensitive to the compiler
TheBigBadBoy - 𝙸𝚛
2022-11-28 04:30:55
I did not know that clang produces faster binaries 😮 What were the compilation flags, please? Also, could it be the default compilation flags not being the same between gcc and clang?
diskorduser
afed <:DogWhat:806133035786829875>
2022-11-28 05:53:41
Does profile-guided compilation improve encoding speed?
afed
2022-11-28 11:58:22
perhaps, but I don't think pgo will be commonly used in benchmarks because it requires some preparation to compile (it's not just adding an extra flag)
Jim
TheBigBadBoy - 𝙸𝚛 I did not know that clang produces faster binaries 😮 What were the compilation flags, please? Also, could it be the default compilation flags not being the same between gcc and clang?
2022-11-29 11:40:50
Clang probably focuses more on optimizations that take more time to apply (clang is usually slower at compiling), but produce faster-running code. It's not likely due to flags, just how much effort they put into detecting common code patterns and optimizing the output.
yurume
2022-11-29 11:46:56
I've personally seen cases where GCC produces better code than clang as well. I agree clang tries to be more aggressive in general, but individual cases can vary a lot.
sklwmp
Jim Clang probably focuses more on optimizations that take more time to apply (clang is usually slower at compiling), but produce faster-running code. It's not likely due to flags, just how much effort they put into detecting common code patterns and optimizing the output.
2022-11-29 02:32:49
I always heard that Clang was faster at compiling than GCC, but produced slower binaries. I guess things have changed in the meantime?
Fraetor
2022-12-01 09:08:03
For our heavy compute stuff at work we either use GCC or the Cray compiler. TBF, that might be because we have a lot of Fortran, which I don't know how well LLVM deals with.
afed
2022-12-01 09:15:54
8-bit encoding will be slower after these changes? `8 bit paths are not SIMDfied yet` <https://github.com/libjxl/libjxl/pull/1938>
_wb_
2022-12-01 09:16:58
no
2022-12-01 09:18:13
yeah that's > 8 bit
2022-12-01 09:18:31
which in markdown renders like a quote 🙂
2022-12-01 09:18:36
`> 8 bit`
2022-12-01 09:19:49
8-12 bit will be the same as before, 13-14 will be as fast as 9-12, and 15-16 will be a bit slower
afed
2022-12-01 09:29:33
ah, I see, good. I wonder whether the simplest animation (like just various images in one file, without inter-frame optimization) would be easier to add in fpnge or fjxl (or is it not so easy to implement)? sometimes I need something very fast like this
veluca
2022-12-01 11:51:48
shouldn't be too hard
2022-12-01 11:53:46
will I do that? who knows 😛
afed
2022-12-05 05:26:18
`Add WebP_Lossy2 to full benchmarks` <https://github.com/nigeltao/qoir/commit/5671f584dcf84ddb71e28da6fa60225abe915e43> a very strange lossy modification <:WTF:805391680538148936>
```
WebP_Lossy   0.084 RelCmpRatio  0.065 RelEncSpeed  0.453 RelDecSpeed
WebP_Lossy2  0.443 RelCmpRatio  0.015 RelEncSpeed  0.435 RelDecSpeed
```
`(5), the Lossy2 suffix, means that the images are encoded losslessly (even though e.g. WebP does have its own lossy format) but after applying QOIR's lossiness=2 quantization, reducing each pixel from 8 to 6 bits per channel.`
Traneptora
2022-12-05 05:27:30
keep in mind that webp has a lossless-prefilter option
2022-12-05 05:27:38
where images are prefiltered and then compressed losslessly
Jyrki Alakuijala
2022-12-06 12:52:34
near-lossless webp is nice
daniilmaks
Traneptora where images are prefiltered and then compressed losslessly
2022-12-07 12:06:07
so like lossyWAV?
Traneptora
2022-12-07 03:40:40
idk what that is
Quikee
2022-12-07 04:55:50
it lossily filters a wav audio file so it can be compressed better with a lossless compressor like FLAC
daniilmaks
2022-12-07 05:23:22
it's for various lossless audio algorithms, including flac, yes.
Demiurge
2022-12-09 05:43:39
Yeah, that's very cool. And the exact same idea can be applied to images in the same way.
2022-12-09 05:45:20
Removing the least significant bits, dithering the result using a noise shaping algorithm, doing it based on the psychoperceptual idea of activity masking, all that applies to images just as much as it applies to sound
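A minimal sketch of that idea on a single image plane, assuming the QOIR-style 8-to-6-bit quantization mentioned earlier plus a one-tap error-diffusion dither (real noise shaping with activity masking would be considerably smarter):
```
// Drop the two least significant bits per channel, but carry the rounding
// error into the next pixel so average intensity is preserved. The output
// lands on only 64 distinct byte values, which a lossless coder (PNG, JXL,
// or FLAC in the audio analogue) compresses much better.
#include <algorithm>
#include <cstdint>
#include <vector>

void quantize_6bit_dithered(std::vector<uint8_t>& plane) {
  int err = 0;  // rounding error carried into the next pixel
  for (uint8_t& px : plane) {
    const int want = std::clamp(static_cast<int>(px) + err, 0, 255);
    const int q = (want * 63 + 127) / 255;  // 8-bit value -> 6-bit level
    const int rec = (q << 2) | (q >> 4);    // 6 -> 8 bits by bit replication
    err = want - rec;                       // diffuse the error forward
    px = static_cast<uint8_t>(rec);
  }
}
```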
afed
2022-12-15 03:55:22
inf MP/s <:monkaMega:809252622900789269> `2289 x 1288, geomean: inf MP/s [305.81, 413.46], 200 reps, 0 threads`
_wb_
2022-12-15 03:57:41
Lol we may need to do that computation with doubles instead of floats, I suppose 🙂
veluca
2022-12-15 04:01:47
or taking some logs first
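Both fixes are small; the log-space variant veluca suggests looks roughly like this (a sketch, not the actual benchmark code):
```
// Geometric mean via the mean of logs: summing logarithms cannot overflow
// the way multiplying hundreds of raw MP/s values in float can.
#include <cmath>
#include <vector>

double geomean(const std::vector<double>& speeds) {
  if (speeds.empty()) return 0.0;
  double log_sum = 0.0;
  for (double s : speeds) log_sum += std::log(s);
  return std::exp(log_sum / static_cast<double>(speeds.size()));
}
```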
afed
2022-12-15 04:02:27
```
Single-threaded:
fjxl 0 gcc 12.2.0     69.384 MP/s  9.987 bits/pixel
fjxl 0 clang 15.0.5  141.867 MP/s  9.987 bits/pixel
fjxl 1 gcc 12.2.0     70.452 MP/s  9.381 bits/pixel
fjxl 1 clang 15.0.5  146.944 MP/s  9.381 bits/pixel
fjxl 2 gcc 12.2.0     69.940 MP/s  9.381 bits/pixel
fjxl 2 clang 15.0.5  147.396 MP/s  9.381 bits/pixel
cjxl 1 clang 15.0.5  3840 x 2160, geomean: 161.78 MP/s [96.17, 169.82], 200 reps, 0 threads.
```
veluca
2022-12-15 04:03:20
seems about right
afed
2022-12-15 04:08:28
does 1 = 1 thread for fjxl? and min/max MP/s like in cjxl would be useful
veluca
2022-12-15 04:12:06
feel free to send a PR 🙂
2022-12-15 04:12:12
should be easy
afed
2022-12-15 04:14:01
though using fast_lossless in cjxl is easier anyway. 8 threads:
```
fjxl 0 gcc 12.2.0    258.911 MP/s  9.987 bits/pixel
fjxl 0 clang 15.0.5  520.788 MP/s  9.987 bits/pixel
fjxl 1 gcc 12.2.0    258.484 MP/s  9.381 bits/pixel
fjxl 1 clang 15.0.5  516.224 MP/s  9.381 bits/pixel
fjxl 2 gcc 12.2.0    250.453 MP/s  9.381 bits/pixel
fjxl 2 clang 15.0.5  507.487 MP/s  9.381 bits/pixel
cjxl 1 clang 15.0.5  3840 x 2160, geomean: inf MP/s [380.74, 687.32], 200 reps, 8 threads.
```
2022-12-15 04:22:20
animation also works
veluca
2022-12-15 04:25:39
really?
2022-12-15 04:25:51
does it also decode to multiple frames? xD
afed
2022-12-15 04:31:55
It seems so
veluca
2022-12-15 04:35:09
consider me very surprised
afed
2022-12-15 04:43:00
but animated png and webp are not working (only gif). it seems to be a cjxl issue though; png with alpha works at higher efforts
2022-12-15 04:45:35
for pngs (also without alpha) there are no errors and no output
_wb_
2022-12-15 04:49:15
took me a while to understand that the above is an animated png
2022-12-15 04:51:50
"Graceful degradation" is kind of annoying
afed
2022-12-15 04:55:40
yeah, discord doesn't support webp and png animation. but I did not find a gif that did not work; maybe something related to the number of colors?
_wb_
2022-12-15 04:59:01
hm, I get a segfault
2022-12-15 04:59:38
```
Thread 1 "cjxl" received signal SIGSEGV, Segmentation fault.
JxlEncoderStruct::RefillOutputByteQueue (this=this@entry=0x5555555e4ca0)
    at /home/jon/dev/libjxl/lib/jxl/encode.cc:492
492         duration = input_frame->option_values.header.duration;
```
veluca
2022-12-15 09:18:14
I am absolutely not surprised
sklwmp
2022-12-17 05:04:20
The AVIF team's new data is weird...
BlueSwordM
sklwmp The AVIF team's new data is weird...
2022-12-17 05:22:19
Not weird. Straight up wrong <:YEP:808828808127971399>
2022-12-17 05:32:14
<@557099078337560596> <@179701849576833024> They're playing dirty: `MediaTek MT8173C ` They know what they're doing, especially when using an old version of libjxl inside of Chrome <:YEP:808828808127971399>
_wb_
2022-12-17 08:09:50
It's also kind of unfair to only look at the fastest possible avif files (8-bit 4:2:0 with an rgb colorspace that has a specialized path) while ignoring the more general case (say 10-bit 4:4:4 with a non-specialized colorspace).
veluca
2022-12-17 08:32:46
I assume that's a CPU with not too good float decoding?
_wb_
2022-12-17 08:47:02
The data below was run on an ASUS Chromebook C202XA with a Mediatek MT8173c, running Chrome Version 110.0.5447.0 (Official Build) dev (32-bit).
2022-12-17 08:48:23
Why does it say 32-bit if that cpu is supposed to be arm64?
2022-12-17 08:49:10
I suppose that means they're using an armv7 build?
veluca
2022-12-17 08:50:18
Absolutely possible
2022-12-17 08:50:30
Most android phones actually stick to armv7
2022-12-17 08:50:52
In a couple of years it may change
sklwmp
2022-12-17 09:34:47
Yeah, especially with Google going 64-bit only for the Pixel 7.
veluca
2022-12-17 10:19:30
and I think we haven't benchmarked armv7 decoding performance in a *long* time (if ever)
2022-12-17 10:19:53
I can 100% believe it being a lot slower than aarch64 on the same CPU
_wb_
2022-12-17 10:34:08
Why would you use a 32-bit build of Chrome on a 64-bit Chromebook? That just seems like a bad idea. Am I missing something?
2022-12-17 10:55:56
I really wonder what is going on with those decode speeds at https://sneyers.info/browserspeedtest/index2.html
2022-12-17 10:56:24
can someone report the numbers they're getting? on my phone, 4:4:4 avif makes it crash chrome
2022-12-17 10:56:36
on my laptop, I get these numbers:
```
011.jxl: Decode speed: 41.72 MP/s | Fetch: 223.90ms | 100 decodes: 785.40ms
011-fd1.jxl: Decode speed: 45.05 MP/s | Fetch: 180.90ms | 100 decodes: 727.40ms
011-fd2.jxl: Decode speed: 47.10 MP/s | Fetch: 190.90ms | 100 decodes: 695.70ms
011-fd3.jxl: Decode speed: 48.77 MP/s | Fetch: 184.90ms | 100 decodes: 671.90ms
011-8bit420.avif: Decode speed: 86.30 MP/s | Fetch: 200.00ms | 100 decodes: 379.70ms
011-8bit444.avif: Decode speed: 2.91 MP/s | Fetch: 209.00ms | 100 decodes: 11261.10ms
011-10bit.avif: Decode speed: 2.86 MP/s | Fetch: 208.10ms | 100 decodes: 11455.70ms
011-12bit.avif: Decode speed: 2.85 MP/s | Fetch: 215.00ms | 100 decodes: 11507.10ms
011.webp: Decode speed: 122.09 MP/s | Fetch: 186.30ms | 100 decodes: 268.40ms
011.jpg: Decode speed: 112.07 MP/s | Fetch: 196.50ms | 100 decodes: 292.40ms
```
2022-12-17 11:01:09
The images are just sRGB; in the previous test I did, the images happened to have a Rec709 ICC color profile, so that may have caused much of the decode time to have been spent in skcms or something, I dunno. I figured using just sRGB would be fairer, but now 8-bit 420 avif looks much better and any other avif looks MUCH worse.
2022-12-17 11:02:11
Can someone else reproduce that huge gap between 4:2:0 avif and 4:4:4 avif?
Eugene Vert
_wb_ I really wonder what is going on with those decode speeds at https://sneyers.info/browserspeedtest/index2.html
2022-12-17 11:02:57
I'm getting `Uncaught (in promise) DOMException: An attempt was made to use an object that is not, or is no longer, usable` on jxl buttons in firefox-dev
_wb_
2022-12-17 11:03:08
this looks like a performance bug in the chrome integration of avif, tbh
Eugene Vert I'm getting `Uncaught (in promise) DOMException: An attempt was made to use an object that is not, or is no longer, usable` on jxl buttons in firefox-dev
2022-12-17 11:03:46
are you using a browser that can decode jxl? the current chrome canary no longer can, unfortunately
Eugene Vert
2022-12-17 11:04:38
Ah, that's the non-wasm version
paperboyo
2022-12-17 11:04:38
Pixel 6 Pro, Chrome Beta 109.0.5414.44:
```
011.jpg: Decode speed: 61.65 MP/s | Fetch: 1048.90ms | 100 decodes: 531.50ms
011.jpg.jxl: Decode speed: 27.12 MP/s | Fetch: 1238.30ms | 100 decodes: 1208.40ms
011.jxl: Decode speed: 15.12 MP/s | Fetch: 19.10ms | 100 decodes: 2166.70ms
011-fd1.jxl: Decode speed: 15.43 MP/s | Fetch: 1224.50ms | 100 decodes: 2124.00ms
011-fd2.jxl: Decode speed: 17.45 MP/s | Fetch: 1898.20ms | 100 decodes: 1878.20ms
011-fd3.jxl: Decode speed: 19.10 MP/s | Fetch: 2004.30ms | 100 decodes: 1715.70ms
011-8bit420.avif: Decode speed: 38.04 MP/s | Fetch: 15.20ms | 100 decodes: 861.40ms
011-8bit444.avif: Decode speed: 1.83 MP/s | Fetch: 2404.60ms | 100 decodes: 17917.80ms
011-10bit.avif: Decode speed: 1.80 MP/s | Fetch: 2536.10ms | 100 decodes: 18231.50ms
011-12bit.avif: Decode speed: 1.79 MP/s | Fetch: 2573.20ms | 100 decodes: 18338.50ms
011.webp: Decode speed: 56.22 MP/s | Fetch: 2970.80ms | 100 decodes: 582.90ms
```
_wb_
2022-12-17 11:06:29
ouch, with those numbers avif is basically 4:2:0 only in practice, that's a huge performance difference
2022-12-17 11:07:19
i'm getting webp déjà vu
veluca
_wb_ Why would you use a 32-bit build of Chrome on a 64-bit Chromebook? That just seems like a bad idea. Am I missing something?
2022-12-17 11:07:41
memory usage I guess? IDK
paperboyo Pixel 6 Pro, Chrome Beta 109.0.5414.44: (…)
2022-12-17 11:07:55
that seems like a bug
_wb_
2022-12-17 11:07:55
this must be a bug though, no way there is an inherent decode speed difference that large
veluca
2022-12-17 11:08:18
that, or somebody didn't bother SIMDfying the 444 path yet, only other explanation I can think of
_wb_
2022-12-17 11:08:49
well it's what happens in current chrome stable and in chrome canary on both x64 and arm64, it seems
veluca
2022-12-17 11:08:53
but on the Samsung A51 I didn't see such a big difference
_wb_
2022-12-17 11:09:09
I also didn't see that big of a difference on a different image
2022-12-17 11:09:21
maybe it's because this one doesn't have an icc profile?
veluca
2022-12-17 11:09:54
mhhh it also seems to happen on my laptop
2022-12-17 11:09:58
this is *odd*
2022-12-17 11:10:53
I feel like I should be taking a profile to figure out what the heck is happening
_wb_
2022-12-17 11:11:50
just using avifdec I don't see much of a difference in decode speed (444 just slightly slower, as expected), so this must be something specific to the chrome integration
2022-12-17 11:14:26
having different code paths for different bit depths / chroma subsampling modes in the chrome integration is error-prone, and this is another illustration of "we get it for free since we have av1 anyway" not being fully true...
Sauerstoffdioxid
2022-12-17 11:15:51
On a different note, I tried getting some benchmark numbers by throwing 500 `img` elements onto a page (resized to 1x1 so they actually all display and decode at the same time). Current Chromium stable:
```
s.jpg: Fetching: 3.70 | Generating: 274.70 | Loading: 479.30
s.webp: Fetching: 4.90 | Generating: 285.40 | Loading: 885.20
s.jxl: Fetching: 4.90 | Generating: 270.40 | Loading: 3340.90
s8.avif: Fetching: 4.50 | Generating: 265.20 | Loading: 2261.60
s10.avif: Fetching: 5.10 | Generating: 316.90 | Loading: 3500.40
s12.avif: Fetching: 3.70 | Generating: 277.70 | Loading: 3349.70
```
so basically, as long as you don't force single threading or the ImageDecoder API, JXL performs just as well as AVIF.
_wb_
2022-12-17 11:19:15
does it decode the image 500 times if it's always the same image? that looks like a missed opportunity to optimize, since pages where the same image is being used multiple times are probably not that rare...
Sauerstoffdioxid
2022-12-17 11:23:37
Yeah, normally it caches, but I worked around that for this test
HLBG007
2022-12-17 11:32:43
Hello, one graphic in your article is wrong. Here's my version
jjido
2022-12-17 02:06:06
Trolling?
_wb_
2022-12-17 02:48:38
I wish avifdec had an option like djxl's --num_reps. Then we could do some more accurate timing of the actual decode time, and not just how optimized the current chrome integration is
Demiurge
2022-12-18 12:18:46
Hmm, for some reason putting a bunch of JPEGs in a .7z archive crushes them further than JXL recompression does?
2022-12-18 12:19:09
As long as there's not just 1
2022-12-18 12:19:38
I didn't realize LZMA can crush JPEGs so effectively
2022-12-18 12:20:06
That's kinda funny that such a generic method is more effective than a specialized method
ayumi
2022-12-18 12:22:01
If you are using "solid" compression it will compress multiple files as one "block", which could explain why you need more than one file to get this effect.
Demiurge
2022-12-18 12:22:06
I'm curious if anyone else has had a similar experience