JPEG XL

benchmarks

fab
2022-02-22 09:38:26
_wb_
2022-03-02 06:11:46
https://www.linkedin.com/posts/chafey_jpeg-xl-vs-htj2k-progressive-lossless-decoding-activity-6904809034528935936-h9NM
2022-03-02 06:12:14
<@597478497367359500> <@179701849576833024> how does compression density compare?
2022-03-02 06:12:48
If I understand correctly, htj2k does not have room for encoder improvements in density, or does it?
veluca
2022-03-02 06:12:50
I'd say mostly equivalent
chafey
2022-03-02 08:17:31
Can you explain what you mean by density?
2022-03-02 08:18:59
I probably won't be able to answer as I am more of a user than a codec developer myself
BlueSwordM
chafey I probably won't be able to answer as I am more of a user than a codec developer myself
2022-03-02 08:45:17
Compression efficiency.
Traneptora
2022-03-03 05:43:43
usually measured in bpp, or bits per pixel
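(For reference, bits per pixel is just the compressed size in bits divided by the pixel count; a minimal Python sketch, with the file names being hypothetical:)

```python
import os
from PIL import Image  # assumes Pillow is available

def bits_per_pixel(compressed_path: str, original_path: str) -> float:
    """Compressed size in bits divided by the number of pixels in the source image."""
    size_bits = os.path.getsize(compressed_path) * 8
    with Image.open(original_path) as im:
        num_pixels = im.width * im.height
    return size_bits / num_pixels

# e.g. bits_per_pixel("photo.jxl", "photo.png")
```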
Orum
2022-04-18 11:50:52
An analysis of 341 lossless UHD video game screenshots, compressed losslessly in both jxl and webp
2022-04-18 11:52:34
tl;dr version:
- jxl is much better at using more cores than webp
- cjxl/djxl still use a lot more RAM than their webp counterparts
- djxl (when decompressing jxls compressed with `-e 3` and greater) is significantly slower than dwebp
- cjxl is not worth it over cwebp for compression ratio vs time spent encoding until you get to fairly slow settings (`-e 5` or higher)
_wb_
2022-04-18 12:47:51
There's room for improvement, there's a big region between fast lossless jxl (the experimental fjxl encoder) and libjxl lossless encoding. Libjxl is quite generic, doing very high bitdepth, arbitrary channel count, while cwebp is hardcoded for 8-bit rgba since the format cannot do anything else anyway.
Orum
2022-04-18 12:52:25
true, but it'd be nice if it didn't lag behind so much when doing 8-bit RGB(A)
_wb_
2022-04-18 01:44:11
It's a matter of implementing specialized encode/decode paths, not a high priority atm but eventually we'll get there
WoofinaS
2022-04-25 09:06:05
Something something, I only tested 1 image. Source image and file size over max distance attached. Over 90 probes to try and get the graph to not be funky, but I failed. Side note, I encoded from .1 to 1 using steps of .1. I'll have to do this with a new intensity target and output to 16-bit PNG
Orum
2022-04-25 10:03:44
what are the axes even...
2022-04-25 10:04:18
distance and file size?
WoofinaS
Orum distance and file size?
2022-04-25 11:49:20
Did I not say that?
2022-04-25 11:50:04
Anyways i fucked up and forgot about banding from djxl and target intensity in butter.
monad
2022-04-28 09:07:20
3408B WebP
Romao
2022-04-28 03:00:53
hello, everyone. What preset should I use to make jxl compete with jbig2 for lossless/near-lossless bitonal/bi-level images? I was testing out the deprecated flif, and it seems it does implement something similar to jbig2, because the final size is always pretty close to, say, djvulibre's cjb2, but I can't really get the same small sizes for these halftone images using jxl
2022-04-28 03:02:26
is <#840831132009365514> a better channel to post this? <:kekw:808717074305122316>
_wb_
2022-04-28 03:41:03
Bilevel is not something we've really looked at well yet from the encode heuristics perspective. We can do more than flif (e.g. we can do patches), but flif has the advantage of adaptive chance updates which means it can get away with worse context modeling and still get good compression (at the cost of slower decode).
2022-04-28 03:42:29
The current patch heuristics are probably not working well on bilevel images though
2022-04-28 03:44:10
Out of curiosity: what's the use case you have for bilevel images? (as opposed to just grayscale images and letting printers deal with halftoning)
Romao
_wb_ Out of curiosity: what's the use case you have for bilevel images? (as opposed to just grayscale images and letting printers deal with halftoning)
2022-04-28 05:18:14
most mangas are in principle halftones, so when you scan them at really high dpi and clean them properly, you can get a really high amount of detail at small sizes encoding them bilevel
lithium
2022-04-28 05:25:56
manga content 🙂 I also plan to use libjxl to compress my manga content images.
_wb_
2022-04-28 07:07:36
I see. Is there no better alternative than scanning printed images? Because I assume the art itself is created in grayscale and then gets halftoned when printing, no?
improver
2022-04-28 07:58:30
"it depends"
2022-04-28 07:59:14
in general a whole lot of stuff gets actually created in bilevel because that allows putting nicer textures than whatever printer would put out
2022-04-28 08:05:13
a whole lot of stuff seems like the same dot pattern tho so theres a lot of grayscale too
_wb_
2022-04-28 08:06:16
Iirc jbig2 has a kind of halftoning detection where it encodes the image basically as grayscale + dot patterns
improver
2022-04-28 08:06:31
but, like, sometimes physical manga is max resolution you'll get, because whatever gets put out online is usually optimized for online viewing
_wb_
2022-04-28 08:06:45
Right
2022-04-28 08:08:28
Well I guess we could try to do something better for bilevel images, I think the current patch heuristics are pretty much garbage for halftone patterns
improver
2022-04-28 08:10:13
i dont think patches would even be the right approach, unless for wider areas
_wb_
2022-04-28 08:10:24
Possibly encoding it at higher bitdepth and with squeeze could be effective to do something close to jbig2 halftone detection
improver
2022-04-28 08:10:52
yeah i was thinking of doing prediction magix too
2022-04-28 08:11:27
because in these areas its really predictable but then these get cut with solid black lines
_wb_
2022-04-28 08:11:48
Basically the squeezed image would become grayscale and the residuals would maybe be repetitive enough if the halftone patterns are regular enough
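(A rough illustration of that idea - a plain Haar-style average/difference split, not the actual JPEG XL Squeeze transform, which uses a smarter predictor for the residuals: the averaged half of a regular halftone pattern tends toward flat gray, and the residual half is the repetitive dot detail.)

```python
import numpy as np

def haar_squeeze_1d(row: np.ndarray):
    """One simplified squeeze step: averages (downsampled signal) plus residuals.

    Assumes an even-length input; a stand-in for the real transform only.
    """
    a = row[0::2].astype(np.int32)
    b = row[1::2].astype(np.int32)
    avg = (a + b) // 2   # low-pass half: looks grayscale for halftone patterns
    res = a - b          # high-pass half: the dot-pattern detail
    return avg, res

# A regular 1-D "halftone" (alternating black/white) averages to flat gray
# and leaves a perfectly repetitive residual:
pattern = np.tile(np.array([0, 255], dtype=np.uint8), 8)
print(haar_squeeze_1d(pattern))  # (array of 127s, array of -255s)
```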
improver
2022-04-28 08:12:34
would need really precise resolution matching for the downsampled image to actually become grayscale
2022-04-28 08:15:11
also idk why but squeeze is kinda inefficient compared to full lossless
2022-04-28 08:15:44
& there are a lot of fine details in mangas, finer than whatever dot patterns are
2022-04-28 08:16:49
make me appreciate having some physical mangas, allows inspecting stuff up real close
Romao
_wb_ Well I guess we could try to do something better for bilevel images, I think the current patch heuristics are pretty much garbage for halftone patterns
2022-04-28 08:20:59
that would be awesome
_wb_
2022-04-28 09:00:13
could you share a test image?
Romao
2022-04-28 11:06:38
Sure, I'll share when I get home
2022-04-29 09:55:16
2022-04-29 09:57:09
2022-04-29 09:58:04
<@794205442175402004> here's the png and the two encode tests I did with jbig2 and flif
2022-04-29 09:59:15
I don't have enough memory to use cjxl at the best lossless settings on this image, but I remember getting worse results for smaller ones
lithium
2022-04-29 11:02:14
I can confirm lossy `palette prediction` really can help for this type of manga content. Could something similar to `palette prediction`, or a better method, be implemented in `delta palette` and combined in `Jon flexible algorithm` (`VarDCT` + `modular` + `modular patch`)? But I don't know whether this idea conforms to the spec and bitstream.
_wb_
2022-04-29 01:24:37
There is a lot of encoder freedom in the bitstream, but making an encoder that makes optimal use of that freedom is not easy.
Romao
lithium I can confirm lossy `palette prediction` really can help for this type manga content, probably can implement something similar `palette prediction` or better method in `delta palette`?, and combine in `Jon flexible algorithm`(`VarDCT` + `modular` + `modular patch`), but I don't know this idea is conform spec and bitstream?
2022-04-29 03:04:06
> I can confirm lossy palette prediction really can help for this type manga content
you mean to detect bitonal images if it's not in the metadata already? 🤔 or is this another whole thing I know nothing about? lol
lithium
Romao > I can confirm lossy palette prediction really can help for this type manga content you mean to detect bitonal images if it's not in the metadata already? 🤔 or is this another whole thing I know nothing about? lol
2022-04-29 04:20:33
`palette prediction` is an encoder feature from the `av1 codec`. It is a good lossy feature if your input content has few colors and doesn't need to be mathematically lossless, but I believe `libjxl` will have better features implemented for non-photo content in the near future. 🙂
Orum
2022-05-04 06:29:54
better graphs of the earlier data I posted:
_wb_
lithium I can confirm lossy `palette prediction` really can help for this type manga content, probably can implement something similar `palette prediction` or better method in `delta palette`?, and combine in `Jon flexible algorithm`(`VarDCT` + `modular` + `modular patch`), but I don't know this idea is conform spec and bitstream?
2022-05-05 07:27:05
I'm experimenting again with new patch heuristics to use more modular patches (currently they are only used when there is near-exact duplication, which is common in screenshots but not so much for manga, except maybe for non-handwritten text)
2022-05-05 07:29:22
today for the first time I am getting actual improvements in bpp*pnorm, sometimes quite significant improvements
2022-05-05 07:29:26
e.g. on this image
2022-05-05 07:29:35
before:
```
07plc7gfj0l51.png
Encoding     kPixels   Bytes        BPP  E MP/s  D MP/s    Max norm       pnorm       BPP*pnorm  Bugs
------------------------------------------------------------------------------------------------------------
jxl:d0.5        2214  432125  1.5612016   2.117  36.207  1.11358273  0.31890432  0.497873942247     0
jxl             2214  345092  1.2467647   2.114  30.862  1.88846326  0.48006211  0.598524496051     0
jxl:d2          2214  263183  0.9508400   2.080  27.166  2.73356056  0.79684273  0.757669928818     0
jxl:d3          2214  218374  0.7889519   2.076  24.135  4.01491070  1.18094983  0.931712623696     0
Aggregate:      2214  304264  1.0992599   2.097  29.257  2.19184028  0.61608477  0.677237252667     0
```
2022-05-05 07:30:03
after:
```
07plc7gfj0l51.png
Encoding     kPixels   Bytes        BPP  E MP/s  D MP/s    Max norm       pnorm       BPP*pnorm  Bugs
------------------------------------------------------------------------------------------------------------
jxl:d0.5        2214  346979  1.2535821   0.122  10.713  1.24547791  0.31311020  0.392509355893     0
jxl             2214  289491  1.0458868   0.122  10.489  1.88895488  0.43218070  0.452012075776     0
jxl:d2          2214  232918  0.8414972   0.121   9.158  2.70391750  0.71183978  0.599011154041     0
jxl:d3          2214  201819  0.7291412   0.122  10.350  3.95498776  1.02561728  0.747819843234     0
Aggregate:      2214  262136  0.9470555   0.122  10.159  2.23961851  0.56063801  0.530955331657     0
```
2022-05-05 07:31:21
that's a significant density improvement, 345 KB -> 289 KB at d1, while BA pnorm actually slightly improves
2022-05-05 07:32:46
for most images the difference is not that big since I'm keeping the heuristic quite conservative and careful
2022-05-05 07:35:51
the above example is quite extreme, where it's doing half the image with modular
2022-05-05 07:35:54
1132x1132 patch frame for a 1248x1784 image (57.555046 %)
2022-05-05 07:36:43
(but some of that patch frame size comes from padding for bin packing, the actual amount of patches is a bit under 50%, still a lot of course)
2022-05-05 07:37:18
a more typical example of what happens on manga images is like this:
2022-05-05 07:37:20
before
```
08lgid8dzmm51.png
Encoding     kPixels   Bytes        BPP  E MP/s  D MP/s    Max norm       pnorm       BPP*pnorm  Bugs
------------------------------------------------------------------------------------------------------------
jxl:d0.5        4312  643824  1.1943313   2.184  24.714  1.02787471  0.30521759  0.364530923964     0
jxl             4312  439132  0.8146156   2.237  22.915  1.48141503  0.55074892  0.448648685151     0
jxl:d2          4312  302464  0.5610885   2.244  20.186  2.42477965  0.71395466  0.400591730492     0
jxl:d3          4312  241008  0.4470840   2.250  21.788  3.67580104  1.08050354  0.483075829340     0
Aggregate:      4312  378893  0.7028692   2.229  22.340  1.91937706  0.60008809  0.421783431626     0
```
2022-05-05 07:37:46
after 456x456 patch frame for a 1320x3280 image (4.802661 %)
```
08lgid8dzmm51.png
Encoding     kPixels   Bytes        BPP  E MP/s  D MP/s    Max norm       pnorm       BPP*pnorm  Bugs
------------------------------------------------------------------------------------------------------------
jxl:d0.5        4312  611385  1.1341551   1.807  25.850  1.02787471  0.30409180  0.344887259742     0
jxl             4312  418490  0.7763235   1.835  25.557  1.48141503  0.54973979  0.426775924101     0
jxl:d2          4312  292236  0.5421149   1.830  23.615  2.42477965  0.70987892  0.384835966095     0
jxl:d3          4312  234459  0.4349352   1.839  25.559  3.71127295  1.08847549  0.473416310914     0
Aggregate:      4312  363873  0.6750062   1.828  25.129  1.92399095  0.59950279  0.404668112215     0
```
2022-05-05 07:39:13
still a nice 4-5% or so compression improvement for that image
2022-05-05 07:43:11
it's not always that effective yet, and sometimes it makes things worse, so there's still some tweaking to be done
2022-05-05 07:43:47
but at least it has reached a point where it's doing something useful sometimes, which is a good start 🙂
2022-05-05 08:03:21
<@179701849576833024> sometimes when there is a huge area of patches, the bin packing is struggling a bit, needing multiple iterations before finally managing to place all patches:
```
trying a 714x966 patch frame for a 784x1096 image (80.269157 %)
trying a 750x1015 patch frame for a 784x1096 image (88.593262 %)
trying a 788x1066 patch frame for a 784x1096 image (97.759010 %)
trying a 828x1120 patch frame for a 784x1096 image (107.924919 %)
trying a 870x1177 patch frame for a 784x1096 image (119.170593 %)
trying a 914x1236 patch frame for a 784x1096 image (131.473434 %)
```
2022-05-05 08:03:53
even leading to a patch frame larger than the image itself, even though of course all patches fit in the image because they come from the image 🙂
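(A minimal sketch of that grow-until-it-fits behaviour - a generic shelf packer, not the libjxl heuristic: start from a frame sized near the total patch area and enlarge it until a naive first-fit placement succeeds, which with awkward rectangles can overshoot the source image size.)

```python
import math

def shelf_fits(patches, width, height):
    """Naive shelf packing: place rectangles left-to-right on rows ("shelves")."""
    x = y = shelf_h = 0
    for w, h in sorted(patches, key=lambda p: -p[1]):  # tallest first
        if x + w > width:          # start a new shelf
            y += shelf_h
            x = shelf_h = 0
        if w > width or y + h > height:
            return False
        x += w
        shelf_h = max(shelf_h, h)
    return True

def pick_patch_frame(patches, grow=1.05):
    """Grow the candidate patch frame ~5% per iteration until everything fits."""
    area = sum(w * h for w, h in patches)
    width = height = max(1, int(math.sqrt(area)))
    while not shelf_fits(patches, width, height):
        width = int(width * grow) + 1
        height = int(height * grow) + 1
    return width, height

# e.g. pick_patch_frame([(200, 30), (17, 500), (64, 64)] * 40)
```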
veluca
2022-05-05 08:04:11
are they overlapping?
_wb_
2022-05-05 08:04:14
no
veluca
2022-05-05 08:04:30
heh that's heuristics for you I guess
_wb_
2022-05-05 08:04:44
but they can get large and have arbitrary aspect ratios
veluca
2022-05-05 08:04:48
it *is* a pretty silly heuristic
2022-05-05 08:04:59
(also tuned for relatively tiny patch surface areas)
_wb_
2022-05-05 08:05:05
anyway, even with that oversized patch frame, it's still good for compression:
2022-05-05 08:05:07
before
```
7p9rd4564mj41.png
Encoding     kPixels   Bytes        BPP  E MP/s  D MP/s    Max norm       pnorm       BPP*pnorm  Bugs
------------------------------------------------------------------------------------------------------------
jxl:d0.5         854  101177  0.9468183   2.082  37.894  0.91804177  0.28067457  0.265747811142     0
jxl              854   80316  0.7516002   2.098  28.186  1.86959481  0.43766729  0.328950832638     0
jxl:d2           854   62157  0.5816676   1.991  23.110  2.50284338  0.71627437  0.416633593168     0
jxl:d3           854   51737  0.4841568   1.855  21.474  3.74193382  1.03577341  0.501476783252     0
Aggregate:       854   71498  0.6690809   2.004  26.982  2.00232665  0.54944324  0.367621963585     0
```
2022-05-05 08:05:21
after
```
7p9rd4564mj41.png
Encoding     kPixels   Bytes        BPP  E MP/s  D MP/s    Max norm       pnorm       BPP*pnorm  Bugs
------------------------------------------------------------------------------------------------------------
jxl:d0.5         854   90269  0.8447408   0.738   7.906  0.95852184  0.25555437  0.215877200591     0
jxl              854   74333  0.6956111   0.808   7.471  1.86959481  0.41174833  0.286416699146     0
jxl:d2           854   59437  0.5562137   0.840   7.786  2.32554865  0.64502547  0.358772027415     0
jxl:d3           854   51362  0.4806476   0.847   7.989  3.13378978  0.93008857  0.447044817466     0
Aggregate:       854   67275  0.6295636   0.807   7.786  1.90101787  0.50124965  0.315568530950     0
```
2022-05-05 08:05:59
speed wasn't measured accurately but I can imagine that decode speed does indeed get a serious blow when 80% of the image is patches that come from an oversized patch frame
2022-05-05 08:06:40
probably in cases like that it should just encode the whole image with modular and forget about patches
veluca
2022-05-05 08:07:39
anyway that bin packing heuristic was written in like 2 hours, I'm sure one could do better
_wb_
2022-05-05 08:08:27
well at least you can always avoid going larger than the image if you know the patches come from the image and are not overlapping
2022-05-05 08:27:53
also I could limit dimensions of patches, I currently just let them grow as large as possible, but that can lead to very wide or tall patches and those will probably be hard to play tetris with
Orum
2022-05-06 03:19:18
some lossy JXL tests, measured with butteraugli (requested distances were all 0.25, 0.5, 1, or 2, and then jittered on the X axis for the one graph with raw data)
2022-05-06 03:28:26
big efficiency jump from e7 -> e8, others (e4 -> e5, e5 -> e6, e6 -> e7) not so much
2022-05-06 03:30:58
also a big encoding time jump though <:SadOrange:806131742636507177>
_wb_
2022-05-06 05:04:31
The thing is, only at e8+ does the encoder really optimize for BA. When measuring quality with the metric you also optimize for, things will look better than they actually are subjectively.
Orum
2022-05-06 05:53:04
Oh definitely. That said, I can see the difference in several images, though typically I'm either pixel peeping or looking at the higher distances
2022-05-06 06:04:38
still the biggest help IMHO is a high intensity target, much more so than raising effort
lithium
_wb_ I'm experimenting again with new patch heuristics to use more modular patches (currently they are only used when there is near-exact duplication, which is common in screenshots but not so much for manga, except maybe for non-handwritten text)
2022-05-06 06:24:39
Thank you for your work, I'm really looking forward to using this heuristic. 🙂
perk
2022-05-06 10:33:13
I have a lot of electron microscopy images that are old data and already published (so no licensing issues), would you be interested in me uploading them as a dataset? This kind of thing: https://cdn.discordapp.com/attachments/409215168753434624/972082632122839090/Sample_5_a2_200kx-2.tif
_wb_
2022-05-06 10:53:56
could be interesting, it's of course a bit of a niche use case but considering that we want jxl to be as widely applicable as possible, it doesn't hurt to see what can be done on images like that
2022-05-06 10:54:47
would you say lossless is most important in these use cases, or is lossy also acceptable as long as artifacts are small?
2022-05-06 10:56:09
that example image looks quite noisy, which I assume is kind of inherent to the capture technique
perk
2022-05-06 11:53:23
It depends on the use case. With these images, lossy is fine, as the only data we're looking at is the size of the particles, which isn't really affected by noise. The applications that require lossless typically use so much data that they record movies instead. Some people may require lossless anyway for archival and transparency.
_wb_
2022-05-06 02:16:59
2022-05-06 02:17:34
that's on a corpus of 345 manga images
2022-05-06 02:41:56
so at d1, the image looks about 4% better on average according to BA pnorm, while the file also gets 1.45% smaller on average
2022-05-06 02:42:47
but it comes at a cost in slower encode and decode
2022-05-06 02:49:41
for 77 images, nothing changed except encode time (the heuristic decided to not add any new patches); for 268 images, something did change. For just those, at d1 average pnorm is 5% better while average file size is 1.9% smaller, for a bpp*pnorm improvement of about 6%
fab
2022-05-06 02:59:23
I would like two times more accuracy at d 0.347 s 8 as your commit
2022-05-06 02:59:31
YouTube screenshots
_wb_
2022-05-06 03:04:16
This is what I get on a random YouTube screenshot:
```
Screen_Shot_2021-08-18_at_11.35.13.png
Encoding                      kPixels    Bytes        BPP  E MP/s  D MP/s    Max norm       pnorm       BPP*pnorm  Bugs
------------------------------------------------------------------------------------------------------------------------------
jxl:d0.5:no_more_patches         5606   803100  1.1458951   1.828  21.837  0.82998317  0.27371135  0.313644495515     0
jxl:d0.5:more_patches            5606   785944  1.1214162   1.311  18.107  0.82998317  0.27352729  0.306737943323     0
jxl:d1:no_more_patches           5606   558617  0.7970570   1.814  20.476  1.48253393  0.46986383  0.374508258239     0
jxl:d1:more_patches              5606   547571  0.7812961   1.326  16.961  1.48253393  0.46941556  0.366752557220     0
jxl:d2:no_more_patches           5606   380037  0.5422519   1.724  24.533  2.26952386  0.78028558  0.423111369728     0
jxl:d2:more_patches              5606   374373  0.5341703   1.253  19.115  2.26952386  0.77939242  0.416328297617     0
jxl:d3:no_more_patches           5606   296384  0.4228925   1.826  23.784  3.34302258  1.04765996  0.443047544042     0
jxl:d3:more_patches              5606   292265  0.4170154   1.278  19.817  3.44766951  1.04667009  0.436477495814     0
jxl:d0.347:8:no_more_patches     5606  1042183  1.4870282   0.458  27.881  0.49486309  0.19363451  0.287939982367     0
jxl:d0.347:8:more_patches        5606  1020392  1.4559360   0.430  27.743  0.53180611  0.19382646  0.282198919179     0
```
2022-05-06 03:05:41
that's about 1.5% improvement at d0.347 s8
lithium
2022-05-06 04:22:18
Which lossy algorithm does the heuristic apply on patches? XYB quantization or delta palette?
_wb_
2022-05-06 04:28:39
xyb quantization, there currently is no xyb implementation for delta palette yet, and cannot mix xyb and non-xyb
2022-05-06 04:29:46
but basically the quantization error of the modular patch does still remain in the residual image that gets vardct encoded, so if vardct doesn't quantize it away, there's a chance that it gets corrected
lithium
_wb_ but basically the quantization error of the modular patch does still remain in the residual image that gets vardct encoded, so if vardct doesn't quantize it away, there's a chance that it gets corrected
2022-05-06 04:39:22
I understand, thank you for your reply 🙂 A little curious: can this heuristic also help sharp edge and line features in VarDCT mode, like the `nonphoto_separation` heuristic branch?
_wb_
2022-05-06 04:53:28
Yes, the idea is the same as nonphoto_separation, only this does it with patches instead of frame blending
2022-05-25 09:34:41
Here are some preliminary results of a large-scale subjective experiment we did at Cloudinary
2022-05-25 09:35:37
involving 40000 test subjects or so
2022-05-25 09:40:45
250 original images, encoded with 7 different encoders: mozjpeg, cwebp, aom, heic (x265), aurora (avif), kakadu (j2k), libjxl. Between 8 and 11 encode settings per encoder (11 for jpeg, avif and jxl), ranging from q30 to q95
2022-05-25 09:56:26
here is just jxl and aom
ziemek.z
_wb_ here is just jxl and aom
2022-05-25 11:29:52
From what I can see, at a given BPP <:JXL:805850130203934781> consistently achieves a given visual quality. AVIF not only requires more BPP to maintain the same average quality, but also isn't as consistent in doing that (the quality varies a bit more).
_wb_
2022-05-25 11:37:40
Yeah, also compared to other codecs like heic and webp, it looks like jxl is most consistent
2022-05-25 11:41:19
Roughly speaking, I'd rank the encoders like this, from most consistent to least consistent: jxl, mozjpeg, heic, j2k, webp, avif (aom), avif (aurora)
fab
2022-05-25 12:40:42
Version with sse4 and or newer heuristics
2022-05-25 12:40:55
Or stock
Cool Doggo
_wb_ 250 original images, encoded with 7 different encoders: mozjpeg, cwebp, aom, heic (x265), aurora (avif), kakadu (j2k), libjxl between 8 and 11 encode settings per encoder (11 for jpeg, avif and jxl), ranging from q30 to q95
2022-05-25 12:55:37
what type of images were used? photographic?
fab
2022-05-25 01:02:36
Webp2 is like avif of 12 month ago
_wb_
Cool Doggo what type of images were used? photographic?
2022-05-25 01:10:26
A mix, mostly photo in various categories, also a few nonphoto categories
_wb_ these are all the originals (well, jpeg-compressed here for discord)
2022-05-25 01:19:40
This is the set of images we used for testing
Orum
_wb_ 250 original images, encoded with 7 different encoders: mozjpeg, cwebp, aom, heic (x265), aurora (avif), kakadu (j2k), libjxl between 8 and 11 encode settings per encoder (11 for jpeg, avif and jxl), ranging from q30 to q95
2022-05-25 02:57:12
You have the full settings used when encoding?
2022-05-25 02:58:32
libaom is quite bad by default (especially if you use avifenc)
2022-05-25 03:04:14
interesting that you have access to aurora though <:PepeGlasses:878298516965982308>
_wb_
2022-05-25 03:21:24
Yeah we didn't use default aom, but probably also not best settings
_wb_ Here are some preliminary results of a large-scale subjective experiment we did at Cloudinary
2022-05-26 08:11:18
These results look quite good for jxl imo. Even at the lower bitrates, where I would expect jxl to perform worse, it still does quite well. At the lower quality settings we tested, say d>3, I'd say heic > jxl > avif > webp > j2k > jpeg. In the d<3 range, I'd say jxl > avif > heic > webp > j2k > jpeg.
Fraetor
2022-05-26 11:46:32
I find those violin plots really hard to interpret, especially with the qualities all jumbled up on the side. What should I be looking for?
yurume
2022-05-26 12:12:16
the consistency wb is referring to relates to the relative width of each violin shape, where jxl's ones are much narrower (= much more consistent quality)
2022-05-26 12:13:55
boxes and dots and crosses are related to the "worst case" evaluation, which is important if you don't mind better quality for given settings but you do mind worse quality a lot
2022-05-26 12:16:30
but both jxl and avif densely fill that space, meaning that a higher quality setting will actually convert to a real quality improvement and you have enough freedom in choosing that setting
2022-05-26 12:17:09
so the question changes to the relative variation of quality for a given quality setting, which is, in other words, consistency
2022-05-26 12:17:25
at least that's how I interpret those plots
_wb_
2022-05-26 12:38:59
Yes, that's right.
2022-05-26 12:43:00
The X marks the p10 percentile, i.e. if you would take all 250 images we encoded with that encode setting, and order them from worst DMOS score to highest DMOS score, 25 images are worse than that, and 225 images are better
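(Equivalently, as a quick numpy expression, with `dmos_scores` standing in for the 250 per-image scores of one encode setting:)

```python
import numpy as np

dmos_scores = np.random.default_rng(0).normal(70, 10, 250)  # placeholder scores
p10 = np.percentile(dmos_scores, 10)  # the X: 10% of images score worse than this
```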
2022-05-26 12:44:28
I think this is the first study that demonstrates that libjxl is achieving better consistency than others.
2022-05-26 12:45:13
Also looking at the results in general, it looks like jxl is more consistent than mozjpeg, and mozjpeg is more consistent than any of the other codecs (webp, avif, heic, j2k)
2022-05-26 12:47:05
Bad consistency basically means you need to manually select the encoder quality setting, you cannot just "set it and forget it" because that will sometimes give a crappy image and sometimes give an image that looks perfect but you could have compressed it more and it would still be fine
2022-05-26 12:48:16
Manually selecting settings is a problem in automated workflows like e.g. anything Cloudinary does (but also facebook, twitter, etc)
2022-05-30 03:42:51
2022-05-30 03:54:09
2022-05-30 03:56:10
quite interesting how jxl kills the competition for nature and abstract images, while for other categories it's not that clear
2022-05-30 04:16:19
you can also quite clearly see the "meh" character of jpeg 2000 and webp, both _kind of_ but not really beating jpeg
2022-05-30 04:24:49
those highest three jxl points are cjxl -q 85, -q 90 and -q 95, i.e. d1.45, d1 and d0.55. It's quite cool to see how jxl q85 consistently reaches very high quality, not just "typically" but in every category and even if you look at the p10, not the median/average case
monad
2022-05-30 08:01:20
What might explain the relationship between q85 and q90?
_wb_
2022-05-30 08:10:02
probably more an artifact of how the DMOS scores were estimated. For jxl q85 we have direct MOS scores, close to 100 opinions per distorted image. For jxl q90 we don't, but we do have relative information (how it compared to other distorted images in pairwise comparisons). Its DMOS score was interpolated from nearest known MOS scores, which e.g. could be the jxl q85 and the mozjpeg q90. But probably I still need to tune a bit how exactly this is done.
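(One hedged way to picture that interpolation - not the actual analysis code, and the latent-scale positions below are invented: put every condition on a common quality scale estimated from the pairwise comparisons, then interpolate the unknown condition's DMOS between the nearest conditions that do have direct MOS scores.)

```python
import numpy as np

# Invented anchors: latent-scale positions from pairwise comparisons, and the
# direct MOS scores of the conditions that were rated directly.
scale_pos_known = np.array([0.42, 0.61, 0.80])  # e.g. mozjpeg q90, jxl q85, original
mos_known = np.array([7.2, 7.9, 8.8])

def interpolated_dmos(scale_pos: float) -> float:
    """Estimate a DMOS score for a condition that only has a latent-scale position."""
    return float(np.interp(scale_pos, scale_pos_known, mos_known))

print(interpolated_dmos(0.70))  # e.g. a hypothetical jxl q90 with no direct MOS
```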
2022-05-30 08:15:03
Things do get tricky at the high end. It does happen that mozjpeg q90 gets a higher MOS score than the original image, for example.
2022-05-30 08:16:08
Even after screening for bad participants, the original image tends to get a MOS score between 8.5 and 9.2 on a scale from 0 to 10 where 10 is perfect.
yurume
2022-05-30 08:17:23
I found a criticism from Jake Archibald (from twitter) interesting, which is by the way quite a nihilistic take on codec quality...
_wb_
2022-05-30 08:17:29
That's a problem with MOS scores: when people don't see any difference, they'll still give an 8 or 9 "just to be sure" ("I may have missed something")
2022-05-30 08:18:44
<@710762823986446367> (Jake) is also here btw
yurume
2022-05-30 08:22:54
I found it nihilistic because it essentially means that as devices move to 2x or more pixel density, the need for accurately encoding detail decreases, and therefore we would be better off "faking" details in the future by upsampling or whatnot
_wb_
2022-05-30 08:24:42
Why do phones have 20 megapixel cameras when they have only a 2 or maybe 8 megapixel screen?
yurume
2022-05-30 08:25:14
because the 20MP figure is faked? 😛
BlueSwordM
_wb_ Why do phones have 20 megapixel cameras when they have only a 2 or maybe 8 megapixel screen?
2022-05-30 08:25:22
Supersampling and zooming in after the fact, and post-processing is less harmful. Electronic Image Stabilization is also a factor.
_wb_
2022-05-30 08:26:26
It was a rhetorical question but ok :)
yurume
2022-05-30 08:26:31
haha
BlueSwordM
2022-05-30 08:26:38
lmaooo
2022-05-30 08:26:40
<:kekw:808717074305122316>
2022-05-30 08:26:55
Man, I wish replies did not ping automatically by default.
2022-05-30 08:27:26
However, I have the personal belief that with smartphone lenses, the highest MP count should be 40MP, and more optimally, 32MP.
yurume
2022-05-30 08:27:58
but yeah, today we are still likely to view images taken from phones (which might _not_ have 20M nominal pixels, but anyway) in desktops, I'm not sure that would hold in say, 2040
BlueSwordM
yurume but yeah, today we are still likely to view images taken from phones (which might _not_ have 20M nominal pixels, but anyway) in desktops, I'm not sure that would hold in say, 2040
2022-05-30 08:28:29
Oh, don't worry, they do have that pixel amount.
2022-05-30 08:28:42
It's not even fake: it's just that optical bottlenecks are quite the limitation.
yurume
2022-05-30 08:29:02
isn't that 20M _sensors_ that make 20M pixels via postprocessing?
BlueSwordM
yurume isn't that 20M _sensors_ that make 20M pixels via postprocessing?
2022-05-30 08:30:16
No. 20MP phone sensors are true 20MP.
spider-mario
2022-05-30 08:30:28
20MP sensors have the potential to produce better 8MP images than native 8MP sensors
BlueSwordM
2022-05-30 08:30:49
Same thing for 48MP or even recent 108MP sensors: it's just that depending on what you want, if you're not limited by low-light performance, then the higher megapixel count is more favorable in some edge scenarios.
spider-mario
2022-05-30 08:32:23
also, the disadvantages of higher-resolution sensors in low light are often exaggerated by people who compare the amount of noise per pixel
2022-05-30 08:32:58
while each individual pixel in a 20MP sensor will collect less light than the pixels of an 8MP sensor, one must not forget that, of course, there are more of them
2022-05-30 08:33:17
such that in an 8,000,000th of the sensor, basically as much light is captured (just shared among more pixels)
BlueSwordM
2022-05-30 08:33:21
Higher megapixels with pixel binning do cause multiple problems:
1. Optical limitations (lenses). Current phone lenses can't resolve anything above 24MP with good accuracy.
2. Lower light performance.
3. Processing time.
4. Yields.
5. Pixel binning has its disadvantages by itself: you get more "pollution" and higher sensor read time.
spider-mario
2022-05-30 08:33:36
again, 1 is only a problem if you compare per pixel
yurume
2022-05-30 08:33:45
yeah, as long as we know the noise characteristic it is of less concern (at least, that was my understanding)
spider-mario
2022-05-30 08:34:14
it's not a disadvantage, just "not as much of an advantage as it could be"
2022-05-30 08:35:35
see also the appendix from https://www.lensrentals.com/blog/2019/10/more-ultra-high-resolution-mtf-experiments/#:~:text=Appendix%3A%20Why%20Perceptual%20Megapixels%C2%A0are%20Stupid
BlueSwordM
2022-05-30 08:36:03
Anyway, there is a direct benefit JXL would bring to picture taking: format dynamic range would not be an issue anymore 😛
2022-05-30 08:36:23
You could just plop down the raw image directly to the appropriate depth JXL without any special tricks required to fit stuff inside of JPEG.
Orum
2022-05-30 09:13:29
Well there is an issue that there's wasted space between pixels on a sensor, so 4 pixels in a 40 mp camera get less overall light than 1 in a 10 mp camera (assuming the same overall sensor dimensions, lens, f-stop, etc.). That has been somewhat mitigated by microlenses but they're still not perfect.
2022-05-30 09:14:03
Also at a certain point more pixels gets you no additional detail because you become diffraction limited. Modern high-res sensors require relatively fast lenses to make use of that resolution, and that limits your DoF options (though there are times where large DoF is more important than avoiding diffraction).
spider-mario
Orum Well there is an issue that there's wasted space between pixels on a sensor, so 4 pixels in a 40 mp camera get less overall light than 1 in a 10 mp camera (assuming the same overall sensor dimensions, lens, f-stop, etc.). That has been somewhat mitigated by microlenses but they're still not perfect.
2022-05-30 09:33:23
indeed gapless microlenses but also backside illumination (or even just smaller electronics with front side illumination)
Orum Also at a certain point more pixels gets you no additional detail because you become diffraction limited. Modern high-res sensors require relatively fast lenses to make use of that resolution, and that limits your DoF options (though there are times where large DoF is more important than avoiding diffraction).
2022-05-30 09:35:24
Orum
2022-05-30 09:38:46
yes I know how to calculate diffraction limits <:YEP:808828808127971399>
2022-05-30 09:39:21
they're more of a gradual degradation anyway than a hard limit
w
2022-05-30 09:46:01
I remember my art friend from elementary school always said 2mp is enough
spider-mario
2022-05-30 09:46:34
1920Γ—1080 is 2MP
2022-05-30 09:46:38
(in fact slightly more)
2022-05-30 09:47:26
having ~recently tried an 8K monitor, it was a strange experience not to fill it with 20MP shown at 1:1 size
2022-05-30 09:47:51
but it was nice to see such sharpness
w
2022-05-30 09:48:40
I always came off it now thinking more resolution doesn't meaningfully add anything to a photo
2022-05-30 09:50:07
I think it made even more sense considering people view photos on tumblr and Instagram and they show the photos at some max resolution around 1mp
spider-mario
2022-05-30 09:50:29
1080Γ—1350 for Instagram
2022-05-30 09:50:36
(yes, taller than wide)
w
2022-05-30 09:50:54
interesting it always looked square to me
spider-mario
2022-05-30 09:51:30
it is possible that most people don't make use of the possibility
2022-05-30 09:51:53
on Facebook, the maximum size is 2048 for the long edge
2022-05-30 09:52:07
so a square image can be ~4.2 MP
190n
BlueSwordM Man, I wish replies did not ping automatically by default.
2022-05-30 09:52:23
if you hold shift while clicking the reply button it'll turn off ping
2022-05-30 09:52:44
although it also rearranges the buttons lmao
spider-mario
2022-05-30 09:52:47
(but phone cameras are most often 4:3 so an uncropped phone photo would be resized by facebook to ~3 MP)
2022-05-30 09:55:13
on imgur and discord, the limit is in terms of file size (imgur: 5MB for authenticated users, discord: 8MB) rather than resolution
2022-05-30 09:55:55
so, for those, I have a script to compress images to the highest jpeg quality that yields a file under a given limit 😁
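(Not spider-mario's actual script, but a minimal sketch of the idea using Pillow, with Discord's 8 MB cap as an example: binary-search the JPEG quality so the output stays just under the size limit.)

```python
import io
from PIL import Image

def compress_under_limit(path: str, max_bytes: int = 8_000_000) -> bytes:
    """Return JPEG bytes at the highest quality that still fits under max_bytes."""
    im = Image.open(path).convert("RGB")
    best = None
    lo, hi = 1, 95
    while lo <= hi:
        q = (lo + hi) // 2
        buf = io.BytesIO()
        im.save(buf, format="JPEG", quality=q)
        if buf.tell() <= max_bytes:
            best = buf.getvalue()  # fits: try a higher quality
            lo = q + 1
        else:
            hi = q - 1             # too big: try a lower quality
    if best is None:
        raise ValueError("even quality 1 exceeds the size limit")
    return best

# open("out.jpg", "wb").write(compress_under_limit("photo.png"))
```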
yurume
2022-05-30 09:56:25
make sure to _uncompress_ files smaller than the size limit as well 😄
ziemek.z
BlueSwordM However, I have the personal belief that with smartphone lenses, the highest MP count should be 40MP, and more optimally, 32MP.
2022-05-31 05:47:29
41 MPix Lumia 1020 ❤️
_wb_
2022-05-31 03:17:24
paperboyo
_wb_ Here are some preliminary results of a large-scale subjective experiment we did at Cloudinary
2022-05-31 03:43:49
Thank you for that! Is there a publication planned to describe and interpret the results?
_wb_
2022-05-31 03:48:15
probably, but it will likely take a while before we get to that, so I'm dumping plots here as I'm making them 🙂
BlueSwordM
2022-05-31 07:02:38
I have the impression there's something wrong in this benchmark: https://www.lossless-benchmarks.com/
_wb_
2022-05-31 08:08:00
Looks like it has quite a few things wrong
2022-05-31 08:08:19
Why is there only one point for libjxl? It has 9 speed settings...
2022-05-31 08:08:26
Why not test fjxl?
2022-05-31 08:09:38
Measuring average file size in bytes is a bit weird, but I guess OK if the images are similar enough.
2022-05-31 08:11:05
Ffv1 being densest (except for one corpus where png is densest) is suspicious
2022-05-31 08:12:04
Perhaps dropped alpha or accidentally encoded in yuv?
veluca
2022-05-31 08:25:38
jxl is fjxl there
_wb_
2022-05-31 08:26:23
Ah, so fjxl encode, libjxl decode?
veluca
2022-05-31 08:26:41
yup AFAIU
_wb_
2022-05-31 08:27:11
Misleading description they put there then
2022-05-31 08:27:26
How does fjxl not beat qoi in speed?
2022-05-31 08:28:06
Is it an fjxl compiled without avx2 maybe?
2022-05-31 08:29:58
Oh, it's Pierre and Osamu who made that page?
2022-05-31 08:32:48
<@553324745240608773> please compile fjxl with ` -mavx2 -DFASTLL_ENABLE_AVX2_INTRINSICS` so it uses the fast code path, not the scalar fallback
2022-05-31 08:34:02
It would also be interesting to see how the (of course way slower) libjxl encode does, say at e2 to e5 or so (slower than that probably gets out of scope)
Pierre-Anthony Lemieux
_wb_ <@553324745240608773> please compile fjxl with ` -mavx2 -DFASTLL_ENABLE_AVX2_INTRINSICS` so it uses the fast code path, not the scalar fallback
2022-05-31 09:01:54
<@794205442175402004> What would you modify at https://github.com/sandflow/libench/blob/main/CMakeLists.txt ?
_wb_
2022-05-31 09:10:33
I don't know what the cmake syntax is for passing compiler flags but without those two it will not use any simd besides what autovec can do
Pierre-Anthony Lemieux
2022-05-31 09:24:33
libjxl uses CMake, so who should I ask?
2022-05-31 09:34:00
I would think this would be good information for anyone who plans to integrate libjxl in their project 🙂
_wb_
2022-05-31 09:35:55
well fjxl is not really part of libjxl atm, it's just experimental demo code
veluca
2022-05-31 10:36:10
IIRC it's target_compile_options
2022-05-31 10:37:23
add `target_compile_options(fast_lossless PRIVATE -mavx2 -DFASTLL_ENABLE_AVX2_INTRINSICS)` at line 54
2022-05-31 10:37:26
maybe try that?
Fraetor
_wb_
2022-05-31 10:57:14
What sport is this?
Cool Doggo
2022-05-31 11:37:21
cricket <:YEP:808828808127971399>
Pierre-Anthony Lemieux
veluca add `target_compile_options(fast_lossless -mavx2 -DFASTLL_ENABLE_AVX2_INTRINSICS)` at line 54
2022-06-01 12:16:43
Will try
2022-06-01 12:25:25
<@179701849576833024> Looks like encode times are 2x faster
BlueSwordM
Pierre-Anthony Lemieux <@179701849576833024> Looks like encode times are 2x faster
2022-06-01 05:11:58
That is a rather significant speed difference.
2022-06-01 05:12:32
If this is consistent, then it makes it as fast/faster and *better* than QOI in the test sets.
_wb_
Cool Doggo cricket <:YEP:808828808127971399>
2022-06-01 05:20:53
Looool! Yeah I guess this is what happens when you let AI tagging do the categorization...
monad
2022-06-01 05:58:45
Also, rocks and buildings are abstract, water is buildings, buildings are streets, outdoors is indoors, and photos of hands holding cards are non-photo.
_wb_
2022-06-01 06:01:51
Some of this is me grouping categories in larger categories for the plots
2022-06-01 06:02:43
"abstract" includes the category "abstract" but also "texture" and "geometric" iirc
monad
2022-06-01 06:28:27
It might be clarifying to include more of the constituent categories in the plot labels.
_wb_
2022-06-01 07:07:14
i'll take a look at it and manually clean up the categories a bit
2022-06-01 02:35:47
2022-06-01 02:40:09
manually partitioned the test set into 13 categories of 10 to 26 images
Pierre-Anthony Lemieux
Pierre-Anthony Lemieux <@179701849576833024> Looks like encode times are 2x faster
2022-06-01 02:47:19
but is not lossless on `qoi_benchmark_suite/images/textures_pk/blaztree2.png`
_wb_
2022-06-01 03:04:29
really? interesting that there's a difference between the scalar and simd code paths...
2022-06-01 03:07:46
I cannot reproduce, it seems to be lossless when I try it
2022-06-01 03:08:31
could you share the jxl file you get?
Pierre-Anthony Lemieux
2022-06-01 04:01:33
_wb_
2022-06-01 04:12:07
that looks completely wrong
2022-06-01 04:12:59
this is the jxl I get for that image
2022-06-01 04:14:16
i wonder what's happening there.... if you compile it without those compiler options, does it work then?
Pierre-Anthony Lemieux
2022-06-01 04:17:44
> if you compile it without those compiler options, does it work then? Yes.
2022-06-01 04:23:36
The only difference is whether lines 54-55 at https://github.com/sandflow/libench/pull/12/files are commented-out.
2022-06-01 04:27:14
File without optimization
_wb_
2022-06-01 05:01:44
Still 99kb?
2022-06-01 05:05:18
I wonder what is happening that causes you to get a file twice as large as the one I get
monad
_wb_
2022-06-01 06:16:16
This parses a lot better. 👍
_wb_
2022-06-01 06:38:17
I am quite happy with both the results themselves and the presentation as it is now
Pierre-Anthony Lemieux
_wb_ Still 99kb?
2022-06-01 07:44:27
Are you using the fast lossless code?
_wb_
2022-06-01 07:45:15
Yes
veluca
2022-06-01 07:46:55
huh...
2022-06-01 07:47:03
what's your CPU model?
_wb_
2022-06-01 07:48:13
he's somehow getting a 97kb file even in a scalar build, while I get a 40kb file
2022-06-01 07:48:26
that's kind of weird
yurume
2022-06-01 07:49:36
are both valid?
veluca
2022-06-01 07:51:04
this smells like UB
2022-06-01 07:51:18
have you tried building libench?
_wb_
2022-06-01 07:51:19
it's an image that can use palette
2022-06-01 07:51:34
maybe my palette detection code is doing something fishy
veluca
2022-06-01 07:52:21
try msan-ing it 😛
_wb_
2022-06-01 07:54:06
<@553324745240608773> do you also get the 97kb file when using a standalone build of fast_lossless? (i.e. what you get when running `build.sh`)
Pierre-Anthony Lemieux
_wb_ <@553324745240608773> do you also get the 97kb file when using a standalone build of fast_lossless? (i.e. what you get when running `build.sh`)
2022-06-01 07:54:41
I have not tried
veluca this smells like UB
2022-06-01 07:55:25
UB?
_wb_
2022-06-01 07:55:34
maybe there is a bug in the nb_chans != 4 case or something
2022-06-01 07:55:52
the fast_lossless_main only calls it with RGBA
yurume
Pierre-Anthony Lemieux UB?
2022-06-01 07:56:14
undefined behavior; C/C++ is particularly pedantic about that
2022-06-01 07:56:56
not technically a miscompilation but in terms of the programmer's intention it would look like a miscompilation
_wb_
2022-06-01 08:08:43
<@553324745240608773> are you passing it an RGB buffer (3 channels)? I think that might explain the difference
2022-06-01 08:09:25
palette detection is currently just disabled unless you pass an RGBA buffer, I guess I never bothered to implement it for other pixel formats
2022-06-01 08:10:03
and possibly there might be bugs when the simd code gets an RGB instead of RGBA buffer
2022-06-01 08:10:41
so I guess for best results you should always pass it RGBA at the moment, even if the image is actually only RGB
Pierre-Anthony Lemieux
2022-06-01 11:06:03
Updating the compiler to gcc 9.4 helped JXL significantly: http://www.lossless-benchmarks.com/
2022-06-02 01:58:36
Oops. It looks like somehow updating the compiler enabled multi-threading. Will need to regenerate results.
_wb_
2022-06-02 08:03:32
monad
2022-06-02 08:34:31
This is a nice graph.
_wb_
2022-06-02 08:12:08
https://twitter.com/jonsneyers/status/1532442496038584328?t=CD7A6WS5sQyz7wPwAT3v_A&s=19
2022-06-02 08:13:45
https://twitter.com/jonsneyers/status/1532444352097800192?t=FoMi8XSpDl0CELeDUuCAkw&s=19
2022-06-02 08:14:21
Turns out psnr really sucks to evaluate modern codecs, who would have thought...
Fox Wizard
2022-06-02 08:18:52
Guess that was to be expected <:KekDog:884736660376535040>
_wb_
2022-06-02 08:19:14
Also my gut feeling that ssimulacra is not to be trusted on newer codecs turns out to be right... Too bad but it is what it is
2022-06-07 10:47:09
<@710762823986446367> are you still around here? I obtained the device_pixel_ratio values that the experiment participants had, this is the top:
#participants  DPR
8263           1.000
1621           1.250
473            1.500
213            2.000
73             0.900
55             1.100
52             2.750
49             3.000
2022-06-07 10:49:00
going by the screen width/height and UA hints, probably most of the dpr 2 ones are macbook pros but some are phones, and all of the dpr > 2 ones are phones - participants were instructed to not use a phone, but apparently some still did
2022-06-07 10:50:47
many of the dpr>2 participants were already filtered in the results though, because their answers weren't serious (they claimed the hidden reference was very low quality, or left the answer value at the default 'medium' value for most images, and stuff like that), which kind of makes sense since they're already ignoring instructions in the first place
2022-06-07 10:51:19
I assume dpr 1.25 and 0.9 are chrome zoom settings? maybe 1.5 too?
2022-06-07 10:52:15
yes, 1.10, 1.25, 1.5 are the first three zoom-in settings, apparently quite a few participants did do that
2022-06-07 10:52:26
73 participants zoomed out one step to 0.9, that's also interesting
2022-06-07 10:53:01
I wonder if they knew that they were doing that or if it was by accident; I assume most of the 1.25/1.5 zoomers did it intentionally though
Jake Archibald
2022-06-07 11:41:17
In April 2021, gov.uk saw 32% of non-mobile users with a DPR of 1.5 or above. Although, yes, that could be folks zooming the viewport, so the pixels might still be big. But in terms of overall users (when you include phone and tablet) it's 80% of users on high DPI devices.
2022-06-07 11:42:19
One rough edge in the data is how macOS (and maybe other operating systems) handle DPR of between 1 & 2. It renders at 2, and the browser will report 2, then it'll scale down to the actual device resolution.
_wb_
2022-06-07 11:55:15
Looks like we had a smaller percentage than that with a DPR >= 1.5, which I guess could be due to different demographics (I suspect we had many participants in eastern europe / russia, e.g. I see lots of UA strings containing YaBrowser; also possibly gov.uk gets more traffic from people who are using zooming for accessibility reasons while I assume/hope most of the testers have good vision).
2022-06-07 12:01:31
I think we need to do a follow-up experiment on phone/tablet (or maybe just phone) to see what happens there. I agree it's too important to ignore them, many use cases are essentially even mobile-only nowadays, which means dpr >= 2
2022-06-07 01:54:12
comparing MOS scores computed using just the dpr 1 (and dpr<1) participants to the MOS scores using just the dpr > 1 participants, the difference is not super large: on average, the dpr>1 people give scores that are 0.173 lower than the dpr 1 people (where the scores are on a scale from 0 to 10)
2022-06-07 01:55:37
it's not surprising that they give lower scores, since these are mostly people with dpr 1 screens who are zooming in, so they'll more easily see artifacts
2022-06-07 02:43:36
I guess grouping dpr>1 together is mixing the signals a bit too much, since it contains both people who zoom 150% on a dpr1 screen and people who are using a real high dpr device. Comparing just dpr1 to dpr 1.25 (which I think are both dpr 1 but the dpr1.25 people are zooming 125%), the MOS is 0.44 lower on average at 125% zoom than at 100% zoom
paperboyo
2022-06-07 02:48:00
> participants were instructed to not use a phone
Interesting. Since, at least for us, traffic is mainly mobile, wouldn't that skew results towards (again?) the more high-quality corner?
_wb_
2022-06-07 03:55:00
well it was a limitation of the crowd-sourcing platform we used (subjectify.us); we're now consulting with them to see if they can also do experiments on phone, might take a while until we can do something though
2022-06-07 03:57:13
I think in general, on phones the viewing distance in pixels is larger than on laptop/desktop (phone is closer to eyes but it's also typically a very dense screen), so you can get away with lower quality
2022-06-07 03:58:59
however as phones are getting sillier and sillier densities, we observe that people stop trying to send images at native resolutions: sending an image at dpr 4 is basically a waste of bandwidth because (unless zooming is done) it will be impossible to distinguish from dpr ~2.5
2022-06-07 03:59:26
so in practice, many images on mobile are now actually browser upscaled, I recently learned...
2022-06-07 04:18:21
2022-06-07 04:18:36
it's quite remarkable how all the metrics are a lot better at predicting JPEG quality than they are at predicting j2k, avif or webp quality
2022-06-07 04:50:50
PSNR having a slightly negative correlation with low quality WebP DMOS is funny
Cool Doggo
2022-06-07 06:45:37
interesting to see dssim score so well since I don't think I've ever seen anyone mention it before 🤔
_wb_
2022-06-07 07:05:32
It's <@826537092669767691> 's metric
2022-06-07 07:06:01
And yes, it is doing very well
2022-06-07 07:08:09
I think I have probably been overfitting ssimulacra to the data I had back then, which was pairwise comparisons where I had lots of A vs B comparisons, but all comparisons between distorted versions of the same original. That meant that to predict those comparisons well, it didn't have to know absolute qualities (like MOS scores are)
2022-06-07 07:09:54
While vmaf was trained on mos scores, and butteraugli too was created by finding the jnd threshold, which is also an absolute quality notion
2022-06-07 07:10:54
Dssim I assume was also tuned for existing MOS datasets like TID2013? <@826537092669767691> will know how he tuned it
2022-06-07 07:12:08
<@532010383041363969> if I would try to tweak butteraugli based on the data I have, what's a good constant to play with?
paperboyo
_wb_ however as phones are getting sillier and sillier densities, we observe that people stop trying to send images at native resolutions: sending an image at dpr 4 is basically a waste of bandwidth because (unless zooming is done) it will be impossible to distinguish from dpr ~2.5
2022-06-07 07:44:57
We don't even bother to satisfy >2, based on me looking at a phone. And on https://observablehq.com/@eeeps/visual-acuity-and-device-pixel-ratio.
Kornel
_wb_ Dssim I assume was also tuned for existing MOS datasets like TID2013? <@826537092669767691> will know how he tuned it
2022-06-07 09:03:34
Yes, DSSIM is validated using tid dataset
2022-06-07 09:04:37
But most of its power is from using multi-scale based on IWSSIM paper
2022-06-07 09:05:32
Btw it'd be nice to see how Nvidia FLIP fares
2022-06-07 09:05:52
There's a C++ implementation so it should be relatively easy to check
_wb_
2022-06-07 09:25:40
Yes, will check it
2022-06-08 07:18:08
computing FLIP and BA 2norm scores now to get the Kendall correlation for those two too, takes a while to compute all scores (just doing it on my laptop)
2022-06-08 09:33:42
all, PSNR, 0.3447, 19869
all, BA-3norm, 0.6540, 19869
all, BA-2norm, 0.6552, 19869
all, SSIMULACRA, 0.5124, 19869
all, DSSIM, 0.6245, 19869
all, VMAF, 0.5978, 19869
all, FLIP, 0.4369, 19869
mozjpeg, PSNR, 0.4846, 2735
mozjpeg, BA-3norm, 0.7340, 2735
mozjpeg, BA-2norm, 0.7399, 2735
mozjpeg, SSIMULACRA, 0.6045, 2735
mozjpeg, DSSIM, 0.7359, 2735
mozjpeg, VMAF, 0.7450, 2735
mozjpeg, FLIP, 0.6190, 2735
cld_webp, PSNR, 0.2934, 2245
cld_webp, BA-3norm, 0.6683, 2245
cld_webp, BA-2norm, 0.6700, 2245
cld_webp, SSIMULACRA, 0.5270, 2245
cld_webp, DSSIM, 0.6342, 2245
cld_webp, VMAF, 0.6455, 2245
cld_webp, FLIP, 0.4110, 2245
cld_jp2, PSNR, 0.2788, 2243
cld_jp2, BA-3norm, 0.6164, 2243
cld_jp2, BA-2norm, 0.6277, 2243
cld_jp2, SSIMULACRA, 0.4526, 2243
cld_jp2, DSSIM, 0.6211, 2243
cld_jp2, VMAF, 0.5864, 2243
cld_jp2, FLIP, 0.4331, 2243
libjxl, PSNR, 0.4030, 2719
libjxl, BA-3norm, 0.7471, 2719
libjxl, BA-2norm, 0.7271, 2719
libjxl, SSIMULACRA, 0.5549, 2719
libjxl, DSSIM, 0.6780, 2719
libjxl, VMAF, 0.6860, 2719
libjxl, FLIP, 0.5165, 2719
AVIF, PSNR, 0.2598, 7933
AVIF, BA-3norm, 0.6364, 7933
AVIF, BA-2norm, 0.6326, 7933
AVIF, SSIMULACRA, 0.4452, 7933
AVIF, DSSIM, 0.5803, 7933
AVIF, VMAF, 0.6164, 7933
AVIF, FLIP, 0.3836, 7933
cld_heic, PSNR, 0.3535, 1994
cld_heic, BA-3norm, 0.6803, 1994
cld_heic, BA-2norm, 0.6851, 1994
cld_heic, SSIMULACRA, 0.5528, 1994
cld_heic, DSSIM, 0.6431, 1994
cld_heic, VMAF, 0.6517, 1994
cld_heic, FLIP, 0.4492, 1994
2022-06-08 09:34:18
that's image corpus, metric, Kendall correlation, number of images
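(For reference, a minimal sketch of how a table like that can be produced with scipy; the dataframe columns are assumptions, not the actual analysis code.)

```python
import pandas as pd
from scipy.stats import kendalltau

# Assumed layout: one row per distorted image, with columns "corpus", "dmos",
# and one column per metric score.
def kendall_table(df: pd.DataFrame, metrics: list[str]) -> pd.DataFrame:
    rows = []
    for corpus, group in [("all", df)] + list(df.groupby("corpus")):
        for metric in metrics:
            tau, _ = kendalltau(group[metric], group["dmos"])
            # abs() because "lower is better" metrics correlate negatively with DMOS
            rows.append({"corpus": corpus, "metric": metric,
                         "kendall": round(abs(tau), 4), "n": len(group)})
    return pd.DataFrame(rows)
```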
2022-06-08 09:35:12
so FLIP is not so great, Butteraugli 2-norm is slightly better than 3-norm (but not for estimating jxl and avif)
2022-06-08 11:07:04
recomputed DMOS scores to discard mobile user opinions (so it's more uniform desktop/laptop-only data) and to compute MOS in a statistically somewhat more robust way (taking mean after removing outliers if there are any of those)
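(One possible reading of "mean after removing outliers" - the exact screening used isn't specified here; this sketch drops opinions more than k scaled MADs from the median before averaging.)

```python
import numpy as np

def robust_mos(opinions, k: float = 3.0) -> float:
    """Mean opinion score after discarding outliers (> k scaled MADs from the median)."""
    x = np.asarray(opinions, dtype=float)
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    if mad == 0:
        return float(x.mean())  # all (or almost all) opinions identical
    keep = np.abs(x - med) <= k * mad * 1.4826  # 1.4826 ~ MAD-to-sigma factor
    return float(x[keep].mean())

# robust_mos([8, 7, 8, 9, 2])  -> 8.0, the stray 2 is ignored
```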
2022-06-08 11:08:47
This is the ranking I get so far:
BA-2norm, 0.6535
BA-3norm, 0.6511
DSSIM, 0.6248
VMAF, 0.5968
SSIMULACRA, 0.5132
FLIP, 0.4357
SSIM, 0.4186
PSNR, 0.3425
2022-06-11 08:41:14
2022-06-11 08:42:07
quite interesting to see that all metrics are better at predicting DMOS for JPEG images than they are at predicting WebP or AVIF.
Orum
2022-06-11 10:49:18
most people haven't trained themselves yet to recognize AVIF's artifacts <:YEP:808828808127971399>
_wb_
2022-06-13 01:33:20
2022-06-13 01:33:33
I computed Kendall correlation for the simplified ssimulacra (`ssimulacra_main -s` in the libjxl repo) which is basically just SSIM in L\*a\*b\* with some extra weight for the worst error (so it doesn't have the blockiness and ringing penalties), and looks like the simpler ssimulacra is also the better one...
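(As a very rough, hedged sketch of that shape of metric - not `ssimulacra_main -s` itself: per-channel SSIM computed in CIELAB, with the worst local SSIM value mixed in as an extra-weighted term.)

```python
import numpy as np
from skimage.color import rgb2lab
from skimage.metrics import structural_similarity

def lab_ssim_with_worst_case(orig_rgb, dist_rgb, worst_weight=0.5):
    """Mean SSIM over the L*, a*, b* channels, blended with the worst local SSIM."""
    orig, dist = rgb2lab(orig_rgb), rgb2lab(dist_rgb)
    means, worsts = [], []
    for c in range(3):
        o, d = orig[..., c], dist[..., c]
        rng = max(o.max() - o.min(), 1e-6)
        mean_ssim, ssim_map = structural_similarity(o, d, data_range=rng, full=True)
        means.append(mean_ssim)
        worsts.append(ssim_map.min())
    score = (1 - worst_weight) * np.mean(means) + worst_weight * np.mean(worsts)
    return 1.0 - score  # 0 = identical, larger = worse (distance-style score)
```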
Cool Doggo
_wb_
2022-06-14 03:22:23
what does the MH and LM mean on the bottom two?
_wb_
2022-06-14 05:02:49
Medium-High quality only (DMOS > 70 iirc) and Low-Medium quality only (DMOS < 75 iirc)
2022-06-15 03:39:08
someone on twitter wanted to get more detail on non-photo
2022-06-15 03:41:12
it's the one category where jxl is not doing so well atm - I think the bitstream has plenty of tools to do something better for these images, but it's a matter of detecting when to not use dct...
2022-06-15 03:42:54
avif and webp have the advantage of directional prediction which really helps for these images... plus in the case of avif, palette blocks - which are less expressive than what jxl can do, but that also makes it easier to make an encoder that just tries them
Traneptora
_wb_ avif and webp have the advantage of directional prediction which really helps for these images... plus in the case of avif, palette blocks - which are less expressive than what jxl can do, but that also makes it easier to make an encoder that just tries them
2022-06-15 04:52:46
would it be possible to make the encoder "just try" a less expressive equivalent subset
_wb_
2022-06-15 04:56:12
maybe, but it's a bit tricky - palette blocks are just another block type in av1 and they are kind of independent from one another regarding entropy coding.... in jxl you'd use patches, but they're not just a block type and they are not independent from one another regarding entropy coding
monad
2022-06-16 05:42:30
Seems we still have a photo in non-photo.
_wb_
2022-06-16 06:04:05
It's kind of mixed I'd say, if you mean the photo of the hand holding a paper with text
Jyrki Alakuijala
_wb_ <@532010383041363969> if I would try to tweak butteraugli based on the data I have, what's a good constant to play with?
2022-06-17 03:10:41
all parameters :-), lines 310-314, 442-446, 447-448, 946-947, 1125-1127, **1159-1161**, 1190-1192, 1199-1201, parameters starting with w in 1691 - 1727
_wb_
2022-06-17 04:21:45
tweaking butteraugli will be for later, but this is where I'm getting so far with a tweaked ssimulacra:
2022-06-17 05:38:50
It's interesting how some codecs seem to be 'easier' for perceptual metrics than others. All metrics are doing better at estimating the quality of a jpeg image than the quality of a j2k or avif image.
2022-06-17 05:41:04
Could be that they were mostly tuned based on jpeg data, but then again, I don't think it's fully that because it still happens when I tune ssimulacra using data from all codecs, and j2k has been around long enough for the usual metric training datasets like TID2013 to contain examples of it...
2022-06-17 05:45:29
In any case I am happy to be able to demonstrate quite clearly how crappy psnr and ssim are compared to the newer metrics. It's a really big gap in Pearson correlation.
Jyrki Alakuijala
2022-06-17 06:13:04
simplex search (Nelder-Mead inspired) is a powerful method ❤️
Cool Doggo
_wb_ Could be that they were mostly tuned based on jpeg data, but then again, I don't think it's fully that because it still happens when I tune ssimulacra using data from all codecs, and j2k has been around long enough for the usual metric training datasets like TID2013 to contain examples of it...
2022-06-17 09:18:59
could also possibly be that people are just "better" at rating jpeg quality than the quality of other codecs
_wb_
2022-06-17 10:19:39
yes, or at least have more consistent opinions about them, so the ground truth data is less noisy or something...
2022-06-18 11:22:37
after one more day of simplex
yurume
2022-06-18 11:32:09
isn't there a possibility of overfitting?
_wb_
2022-06-18 11:34:55
I optimize using data from 200 images and check if it also performs well on the 50 images it never saw
2022-06-18 11:40:03
Also it's 'only' 146 parameters it is optimizing for, while there are close to 20k DMOS scores to predict
2022-06-18 11:41:28
But yes, there is likely a bit of overfitting, and when I am done it would be nice to check what happens on completely different data
yurume
2022-06-18 11:43:59
ah, that sounds okay. I thought it was being tested against all the images.
_wb_
2022-06-18 11:47:43
Well the numbers above are for the whole set (training+validation sets), but the numbers I get for validation set only are similar (even slightly better)
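A hedged sketch of the tuning loop described above: optimize the metric's weights with Nelder-Mead (simplex search) on a training split and check Kendall correlation on a held-out validation split. The linear `scored_metric` stub, the weight layout and the random toy data are assumptions for illustration, not the actual libjxl tuning code:
```python
# Hedged sketch: Nelder-Mead over metric weights on a training split,
# Kendall correlation checked on a held-out validation split.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import kendalltau

def scored_metric(weights, features):
    # Placeholder: in the real metric the features would be per-channel,
    # per-scale SSIM-style norms and the weights their multipliers/exponents.
    return features @ weights

def neg_kendall(weights, features, dmos):
    tau, _ = kendalltau(scored_metric(weights, features), dmos)
    return -abs(tau)  # minimize the negative correlation

rng = np.random.default_rng(0)
n_train, n_val, n_params = 160, 40, 146
feat_train = rng.normal(size=(n_train, n_params))
feat_val = rng.normal(size=(n_val, n_params))
dmos_train = rng.uniform(30, 95, n_train)
dmos_val = rng.uniform(30, 95, n_val)

result = minimize(neg_kendall, x0=np.ones(n_params),
                  args=(feat_train, dmos_train),
                  method="Nelder-Mead", options={"maxiter": 2000})
print("train tau:", -neg_kendall(result.x, feat_train, dmos_train))
print("val tau:  ", -neg_kendall(result.x, feat_val, dmos_val))
```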
Traneptora
2022-06-18 02:34:28
what are those charts documenting?
_wb_
2022-06-18 03:12:54
How good each perceptual metric is, basically
2022-06-18 03:13:27
How well they correlate with actual subjective quality assessment
Traneptora
2022-06-18 03:37:32
I see
2022-06-18 03:38:03
so what I'm gathering, in a nutshell, is that the metrics that consider colorspaces tend to more accurately reflect what humans see
2022-06-18 03:38:11
that is, stuff like SSIMULACRA and Butteraugli
2022-06-18 03:38:27
and also outperform VMAF
_wb_
2022-06-18 05:06:48
Well the default ssimulacra is quite poor (not shown here), but the tweaked ones are good
2022-06-25 09:19:30
2022-06-25 09:20:32
2022-06-25 09:22:27
xyb-mssim is the variant of ssimulacra I'm tweaking using the data, so it's cheating a bit (but I think it generalizes, results are the same for the 20% I'm not using as for the 80% I'm using)
2022-06-25 09:24:33
the ideal metric is a thin diagonal line here, i.e. one that perfectly predicts subjective opinions. Probably that is impossible though, considering how noisy human opinions are...
BlueSwordM
_wb_ xyb-mssim is the variant of ssimulacra I'm tweaking using the data, so it's cheating a bit (but I think it generalizes, results are the same for the 20% I'm not using as for the 80% I'm using)
2022-06-29 03:17:59
Ooh, this seems nice.
2022-06-29 03:18:07
Question about it: how good is it at penalizing edge artifacts?
_wb_
2022-06-29 03:31:37
you mean artifacts near edges, or artifacts that introduce spurious edges (like banding or blocking)?
2022-06-29 03:32:37
at the moment xyb-msssim is not even asymmetric, i.e. score(orig,distorted) == score(distorted,orig)
2022-06-29 03:36:46
it's just computing the ssim map for 3 components (X,Y,B) at 6 scales (1:1, 1:2, 1:4, 1:8, 1:16, 1:32) and taking 4 different norms: L1 (avg), L2 (root mean square), L4 and L8, and then each of these has a multiplier and an exponent, and that's the metric
2022-06-29 03:38:52
3 components x 6 scales x 4 norms x 2 weights (multiplier and exponent) = 144 parameters to tune (+ 2 for a global multiplier and exponent after summing them all up)
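A sketch of how the 3 x 6 x 4 x 2 (+2) parameter structure described above could be wired together; the XYB conversion and the per-scale SSIM error maps are assumed to be computed elsewhere, so this only illustrates how the multipliers and exponents combine, it is not the real xyb-msssim:
```python
# Sketch of the weight structure only: 3 channels x 6 scales x 4 norms,
# each with a multiplier and an exponent, plus one global pair.
import numpy as np

SCALES = 6            # 1:1, 1:2, 1:4, 1:8, 1:16, 1:32
NORMS = (1, 2, 4, 8)  # L1 (avg), L2 (RMS), L4, L8

def p_norm(err_map, p):
    return np.mean(np.abs(err_map) ** p) ** (1.0 / p)

def combine(ssim_error_maps, mults, exps, global_mult, global_exp):
    """ssim_error_maps: dict (channel, scale) -> 2D array of 1 - SSIM.
    mults, exps: arrays of shape (3, SCALES, len(NORMS))."""
    total = 0.0
    for c in range(3):                      # X, Y, B
        for s in range(SCALES):
            for n, p in enumerate(NORMS):
                v = p_norm(ssim_error_maps[(c, s)], p)
                total += mults[c, s, n] * (v ** exps[c, s, n])
    return global_mult * (total ** global_exp)
```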
BlueSwordM
_wb_ you mean artifacts near edges, or artifacts that introduce spurious edges (like banding or blocking)?
2022-06-29 05:07:38
Artifacts near edges like ringing and basis noise. Basically, I want a metric that penalizes edge artifacts while taking into account high frequency detail around that edge, so as to make an edge cleaning/restoration algorithm more conservative than what PSNR (or worse, plain MSE) would tell it to do, which is to just blur stuff.
_wb_
2022-06-29 06:58:26
MSE and PSNR are the same thing, no?
2022-06-29 06:59:15
Both are the L2 norm of the error, basically
2022-06-29 07:00:36
PSNR just takes a log, RMSE takes a square root (this is the actual L2), and MSE doesn't do anything, but they're basically the same thing modulo such 'gamma adjustment'
BlueSwordM
_wb_ PSNR just takes a log, RMSE takes a square root (this is the actual L2), and MSE doesn't do anything, but they're basically the same thing modulo such 'gamma adjustment'
2022-06-30 12:25:28
They aren't exactly the same thing. They are close enough, but using PSNR is better overall, as MSE calculates the mean squared error, while PSNR is what you described (the max value squared over the MSE value, IIRC).
_wb_
2022-06-30 05:31:39
I mean, optimizing for lowest MSE is the same thing as optimizing for highest PSNR
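A small sketch making the MSE / RMSE / PSNR relationship explicit: all three are derived from the same L2 error, so ranking by lowest MSE and ranking by highest PSNR give the same ordering:
```python
# MSE, RMSE and PSNR are all derived from the same L2 error, so sorting
# images by lowest MSE or by highest PSNR gives the same order.
import numpy as np

def mse(a, b):
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return np.mean((a - b) ** 2)

def rmse(a, b):
    return np.sqrt(mse(a, b))        # the actual L2 norm of the error

def psnr(a, b, max_value=255.0):
    return 10.0 * np.log10(max_value ** 2 / mse(a, b))
```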
BlueSwordM
2022-06-30 02:21:01
In the end, yeah :p That is why I want a better metric for such a thing.
_wb_
2022-06-30 05:06:56
another way to look at encoder consistency
2022-06-30 05:08:21
the thick middle line is the median, between the thinner solid lines are 80% of the images (they are p10 and p90), between the dashed lines are 95% of the images (p2.5 and p97.5)
2022-06-30 05:16:15
DPOS is like DMOS but instead of taking the mean of the opinions, you take the "pessimist opinion", i.e. instead of taking the average of all ratings, you look at p20 to p50 of the opinions - so if everyone says it's a 7/10, the score will be 7/10, but if half of the people say it's a 5/10 and half say it's a 9/10, the average is also 7/10 but the pessimist opinion would be 5/10.
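A minimal sketch of the mean-opinion vs pessimist-opinion idea; exactly how the p20-p50 range is aggregated is an assumption here (this version just averages the opinions that fall between the 20th and 50th percentile):
```python
# Mean opinion vs "pessimist opinion" on the same set of ratings.
import numpy as np

def mos(opinions):
    return np.mean(opinions)

def pessimist_opinion(opinions, lo=20, hi=50):
    opinions = np.sort(np.asarray(opinions, dtype=float))
    p_lo, p_hi = np.percentile(opinions, [lo, hi])
    window = opinions[(opinions >= p_lo) & (opinions <= p_hi)]
    return np.mean(window)

print(mos([7, 7, 7, 7]), pessimist_opinion([7, 7, 7, 7]))  # 7.0 7.0
print(mos([5, 5, 9, 9]), pessimist_opinion([5, 5, 9, 9]))  # 7.0 5.0
```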
2022-06-30 05:17:48
this is the same plot but with DMOS instead of DPOS - obviously in general the scores are higher when you look at mean opinion instead of pessimist opinion
BlueSwordM
_wb_ this is the same plot but with DMOS instead of DPOS β€” obviously in general the scores are higher when you look at mean opinion instead of pessimist opinion
2022-07-06 10:36:59
Now the question is: where can I find your version of XYB-MSSIM?
Traneptora
BlueSwordM Now the question is: where I can find your version of XYB-MSSIM?
2022-07-07 12:59:27
https://github.com/libjxl/libjxl/pull/1535
BlueSwordM
Traneptora https://github.com/libjxl/libjxl/pull/1535
2022-07-07 01:23:35
Oh very nice. I wonder how fast it is. Because if it's anywhere near as fast as normal MS-SSIM, it could be insanely useful for what I want to use it for.
_wb_
2022-07-07 06:18:03
I'm still refining it, the pull request as it is now is not going to be the final version, but it should already be somewhat useful
2022-07-07 06:19:17
And it should be pretty fast, it's only doing conversion to XYB and then basically normal MS-SSIM (only with more norms, but that shouldn't be much of a slowdown)
Razor54672
2022-07-14 02:11:46
Is there benchmarking for 2D illustrations? There are many on Pixiv that are 10-20 MB PNGs and compression seems to work quite well. AI upscalers and the like usually have separate modes depending on content type. Is it possible for JXL to achieve better compression / perceptual quality if its encoding model is content-dependent, or is the generalized implementation approach better?
Traneptora
2022-07-14 03:30:12
AI upscalers with separate modes are doing so based on different sets of training data
2022-07-14 03:30:22
whereas JXL is not encoded like a neural net
_wb_
2022-07-14 04:02:30
Lossy illustrations is not something libjxl is using its full potential for yet, imo. In principle splines and patches could help a lot for those, but it will take nontrivial encoder improvements to actually use those effectively. At the moment we mostly only use DCT, which thanks to variable blocks is not bad (it's a lot better than legacy jpeg) but it can't compete yet with avif.
2022-07-14 04:03:40
Then again for many use cases of illustrations, lossless is preferred, and there jxl is quite good already.
2022-07-14 04:06:36
I specifically mean illustrations with clean, hard lines separating smooth or solid surfaces. This is something where avif's artifacts (which are mostly smearing and smoothing) are visually quite acceptable, while dct artifacts (like ringing) are visually annoying.
2022-07-14 04:10:38
For illustrations using more 'natural' styles, i.e. not drawn in software with hard clean lines but painted e.g. using brushes, pencils, chalk or charcoal, dct is more suitable again, since there is more 'texture' that will visually align better with dct artifacts.
JendaLinda
2022-07-14 04:36:42
I guess DCT is also good for scanned artworks. Are those advanced techniques planned for the 1.0 release or are they going to be added later?
_wb_
2022-07-14 04:47:52
I already made a first attempt at using patches more effectively for non-photographic images: https://github.com/libjxl/libjxl/pull/1395 But we have to be careful with such changes, make sure we don't introduce unintended regressions or edge cases where it causes trouble, etc.
2022-07-14 04:48:54
The jxl bitstream is very expressive and there is huge potential for future encoder improvements, way more than e.g. mozjpeg vs libjpeg.
2022-07-14 04:51:26
But working on better encoders is always more attractive when the target format is widely supported than when it's still early days.
JendaLinda
2022-07-14 05:15:45
So the encoder would have to determine when it's worth it to use patches.
_wb_
2022-07-14 05:23:52
Yes, which is quite hard since there is no way to know in advance how many bits a compressed patch will need - patches are stored together in a patch frame (a kind of sprite sheet) and the cost of a single patch also depends on the contents of the other patches.
2022-07-14 05:24:47
So it's mostly a matter of coming up with good heuristics, doing things exhaustively is not feasible.
JendaLinda
2022-07-14 05:27:24
In any case, using patches in lossy must be done very carefully so we won't get another JBIG2.
_wb_
2022-07-14 05:54:14
Yes, though the risk of that is somewhat mitigated in the current encoder by using kAdd patches, not kReplace patches - i.e. residual errors still get encoded too.
2022-07-14 05:55:38
If we would do lossy kReplace patches, it would definitely have to be based on a max error norm, not an avg error norm which leads to JBIG2 style problems.
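A hedged illustration of the difference between an average-error and a max-error acceptance test for a lossy patch replacement; purely illustrative, not libjxl code:
```python
# With an average-error criterion, a patch that is wrong in only a few pixels
# (e.g. a swapped digit) can still be accepted, which is the JBIG2-style
# failure mode; a max-error criterion rejects it.
import numpy as np

def accept_patch_avg(region, patch, avg_threshold):
    return np.mean(np.abs(region.astype(float) - patch.astype(float))) <= avg_threshold

def accept_patch_max(region, patch, max_threshold):
    return np.max(np.abs(region.astype(float) - patch.astype(float))) <= max_threshold
```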
JendaLinda
2022-07-14 06:06:19
I see, kReplace would be useful in lossless, I suppose.
Hello71
2022-07-15 08:15:58
is there a reasonably easy way to manually specify patch areas?
_wb_
2022-07-15 08:17:04
not atm. Maybe we could add syntax for that to jxl_from_tree...
2022-07-21 04:24:35
We finalized the data cleanup and analysis of the subjective experiments we performed between November 2021 and June 2022, collecting >1.5m human opinions in total (before filtering). I just posted a thread with some key results on twitter: https://twitter.com/jonsneyers/status/1550145976123392000
2022-07-21 04:32:08
I think our results are pretty strong evidence in favor of JPEG XL. Also I think they help explain the lukewarm reception JPEG 2000 got (and lossy WebP too, to some extent): you can see quite clearly that the real gains are rather 'meh' compared to the claims that were made, in some cases they don't even outperform (moz)jpeg, and those small compression gains have to be traded for significantly worse consistency (and obviously worse interoperability, but that's always an issue with new codecs), making deployment far from a no-brainer.
2022-07-21 06:24:55
Added more stuff to the thread, maybe I got carried away a bit: https://twitter.com/jonsneyers/status/1550182620079816707
2022-07-21 06:25:48
can someone check my somewhat mindblowing claim that jxl adoption is equivalent to a CO2 reduction of 32 million cars?
yurume
2022-07-21 06:29:28
that assumes jpeg and jxl decoders are about comparable in power consumption, right?
_wb_
2022-07-21 06:32:34
yes, but that doesn't matter hugely, since decode (and encode!) cost is only a fraction of transfer cost
2022-07-21 06:33:46
(and server/cdn storage cost, which also comes at a significant energy cost)
2022-07-22 08:29:32
2022-07-25 08:12:28
2022-07-25 08:12:51
SSIMULACRA 2 available here: https://github.com/libjxl/libjxl/pull/1646
2022-07-25 08:41:32
2022-07-25 08:42:29
The black lines basically indicate how useful a metric is to predict visual quality
2022-07-25 08:43:37
The dotted lines contain 90% of the cases
2022-07-29 02:03:08
most recent version of our results
2022-07-29 02:05:01
per category (split up non-photo in two categories and materials+clouds too)
2022-08-12 03:16:44
we decided to use MCOS and not DMCOS, so not scaled/normalized such that 100 is "as good as the reference image", but just using the opinions as-is, and just indicating the range of the score for the reference image (which is basically where things get visually lossless)
2022-08-12 03:18:22
2022-08-12 03:19:40
(dark shaded region is a 90% confidence interval around the mean of the Reference image score, lighter shaded region is the region between p10 and p90 of Reference image scores for that subcorpus)
2022-08-12 03:31:10
the most relevant MCOS range is 60 to 90 (medium-high to very high quality). MCOS scores of 50 or lower are "embarrassingly low quality" in my opinion and I don't think there are really use cases for such low quality - maybe it could be useful on a 2G connection or in similar extremely low bandwidth conditions, but for 'normal' web delivery I think MCOS ~80 is typically appropriate (for a 512x512 image, that is the average quality of a mozjpeg ~q83; for larger image dimensions the q setting is likely lower)
2022-08-12 03:48:07
AVIF vs JXL per category is quite interesting:
- AVIF wins clearly: diagram/chart, illustration/logo/tex
- about the same, slight advantage for AVIF: indoors/rooms, sports
- about the same: art/abstract/decoration
- about the same, slight advantage for JXL: food/drinks, people/fashion, urban/industrial/cars
- JXL wins: animals, building/monument, night/nightlife, portrait
- JXL wins clearly: landscape/nature, materials/clothes, sky/clouds
improver
2022-08-12 04:34:09
avif confirmed only being good with fake & plastic stuff, use jxl to preserve the nature
JendaLinda
2022-08-12 04:36:52
"AVIF wins clearly: diagram/chart, illustration/logo/tex" Are you sure? AVIF lossless compression seems to be worse than PNG.
diskorduser
2022-08-12 04:52:45
avif looks good on images with limited colors like logos and charts. I mean lossy avif.
_wb_
2022-08-12 05:05:34
Lossy avif wins on nonphoto, atm at least. For lossless, jxl wins in all categories.
2022-08-12 05:07:30
The old jpeg and png may have conditioned us to think that lossless and nonphoto belong together (since that's what you would use png for) and lossy and photo belong together (since that's what you would use jpeg for), but it does not need to be like that
2022-08-12 05:12:14
Lossless vs lossy is more about workflow / delivery chain than it is about image content: for authoring (both photo and nonphoto) lossless is desirable for two main reasons: 1) avoiding generation loss, 2) allowing edits that may reveal things that would not be visible before the edit, like making darks brighter - the whole point of lossy compression is to delete image info that is not visible.
2022-08-12 05:12:56
And for delivery, lossy is of course desirable to the extent that it saves bandwidth.
2022-08-12 05:15:42
Now it is true that some kinds of nonphoto images can actually compress better using lossless (or almost lossless) than using lossy. For such images, there is of course no reason to go lossy.
JendaLinda
2022-08-12 05:22:18
True, it's more a habit. I've seen lots of charts and graphics in crusty jpegs, so I'm not a fan of using lossy compression for nonphoto content.
_wb_
2022-08-12 05:26:37
yeah - the generation loss from repeated avif encoding likely looks quite different/nicer though, and probably the main problem with jpeg charts is automated transcoding services that use low q settings that are just too low for that image content even in a single generation
2022-08-12 05:31:11
e.g. mozjpeg q70 reaches a MCOS of ~75 on average for portrait images (human faces), but for diagrams/charts, mozjpeg q70 only reaches a MCOS of ~62. If the automated transcoder (say twitter or facebook) selected the setting q70 by looking at selfies, but they use it for all images, then effectively they're using lower quality for photos than for non-photos
JendaLinda
2022-08-12 05:47:58
In any case, nonphoto content needs higher quality settings so it won't look messy or blurry.
_wb_
2022-08-12 06:04:02
2022-08-12 06:14:12
so even looking only at photographic images: if you want to reliably reach at least MCOS 60 (medium-high quality), then with jxl you have to aim at MCOS 66 (if you're OK with 1 in 10 images being below MCOS 60) or at MCOS 70 (if you're only OK with 1 in 100 images being below MCOS 60). However, with mozjpeg, webp and avif, you would have to aim at MCOS 69 or MCOS 75 to reach 90% or 99% quality reliability...
2022-08-12 06:31:04
JendaLinda
2022-08-12 06:40:32
Photos are pretty forgiving for lossy compression as photos are never perfect.
_wb_
2022-08-12 06:45:10
Especially nature photography (including human faces), which does not really have extremely hard edges surrounded by clean areas that will make any ringing artifacts obvious
JendaLinda
2022-08-12 06:46:23
Low fidelity settings may cause unnatural smoothing though.
_wb_
2022-08-12 06:48:04
Yes, that's what ruins avif for things like clouds and other subtle textures.
2022-08-12 06:50:33
Aggressive smoothing, smudging and deringing is good for non-photo and photos of things like modern buildings or cars, but it also is a problem for the fidelity of natural photos...
JendaLinda
2022-08-12 06:55:22
One of the infamous cases is when the video codec used by a TV broadcaster turns the grass on a soccer field into a solid green color.
_wb_
2022-08-12 07:36:39
Do you have a link talking about that?
JendaLinda
2022-08-12 08:09:35
Not really, this was discussed on various forums years ago. To be fair, the situation has improved since then.
_wb_
2022-08-12 08:24:25
Anyway, even for video, I wonder how long low-fidelity lossy will remain a thing. In the past two decades, better codecs and faster networks have been mostly balanced by increasing pixel dimensions (from smaller than SD to full HD to 8K). Now we are hitting the limits of meaningful extra resolution (8K is already well above 'retina' resolution if you can see the whole frame), so I imagine any improvements in codec tech and network/storage capacity will possibly go to higher fidelity now.
2022-08-12 08:28:11
At some point, I wonder if it will still be acceptable to do inter-frame stuff, and people might start demanding intra-only encoding with strong per-frame fidelity guarantees, just like digital cinema has been doing all along (they use intra-only 10-bit jpeg 2000 in XYZ colorspace, iirc)
2022-08-12 08:42:17
I always used to assume that still codecs and video codecs would at some point converge and it would be a video codec (say av2 or av3) that just gets used for both still and video. Now I am not so sure anymore that that will be the way it will go. It could also be that inter frame coding tools become increasingly irrelevant as fidelity and latency expectations grow, and convergence happens the other way around: everything gets done using a still image codec at a single fidelity target: visually lossless.
2022-08-12 08:43:44
(or even fully lossless, the way it is going with audio)
Hello71
2022-08-13 12:12:34
I think that will probably take a really long time, if ever, due to physics limitations on the maximum performance of storage devices and wireless network links. Hard drives remain the best price/capacity storage devices for home users, and those have only improved by something like... 2x in the past decade? Satellite internet probably still has room to grow, but AFAIK it simply isn't possible to squeeze significantly more aggregate bandwidth out of e.g. Starlink, barring some massive physics breakthrough.
2022-08-13 12:19:39
I think the only place you might possibly see this shift is in some kind of "videophile movement" storing media on NAS devices, but even then, I think we're still really far away from your endpoint. Look at music: codecs have massively improved in the past three decades, and a single song is basically "too small to meter", and yet most music is still sold and stored in lossy formats, because most consumers would rather store 10000 songs at decent quality than 2000 songs at "perfect" quality. I would guess that most "videophiles" would probably rather store 2000 movies at decent quality than 400 movies at "perfect" quality.
_wb_
2022-08-13 12:47:02
Sure, I don't mean in the near future. More like in 2-3 decades. But I do think "cinema quality" movies will at some point be desired. That's not lossless btw (not like audiophiles using FLAC), but just very high quality lossy and with guaranteed no artifacts from inter, which does make a big difference in fast-moving action scenes and also when slow camera movements are used (like panoramic panning shots), where inter can cause annoyingly variable quality and visible delays in getting the full detail.
Hello71
2022-08-13 12:52:24
Why don't you think that inter-frame codecs will be able to improve over 2-3 decades to the point that inter-frame artifacts are negligible? Certainly, inter-frame video compression is much much better today than it was 2-3 decades ago.
_wb_
2022-08-13 08:06:15
Yes, I guess better encoders could avoid the worst artifacts, and for some use cases it will remain the best idea. But as the fidelity target goes up, the usefulness of inter goes down, just like with directional prediction in intra: it's only useful if most residuals can be quantized away, otherwise you only end up introducing more entropy.
2022-08-13 08:37:59
Anyway, right now people are even still using GIF so fidelity expectations are pretty low and inter frame is going to remain useful for quite a while. I am talking far future here, when bandwidth is commonly available for intra-only very high fidelity 8K streaming, and at that point people might want to get "cinema quality" options in the Netflixes of that era. Maybe some inter tools will still be effective for that, of course. Things always depend as much on how encoders make use of coding tools as on what coding tools are available.
2022-08-13 09:00:12
At some point the resolution, fps and dynamic range are just high enough so any further improvements are irrelevant for human perception. 8K at 60fps in 12-bit HDR is probably roughly the limit - anything better than that is just not going to be perceivable. And then the only axis where improvements can still be made is that of the lossy compression, where we are currently used to relatively low fidelity since broadcasters and streaming services need to keep bandwidth reasonable while the uncompressed data volume has been growing incredibly: two decades ago, 720x480 at 30fps (DVD) in tv-range 4:2:0 was pretty much the best you would get (which is 15.5 MB/s uncompressed), while now we can easily imagine 8K at 60fps in 12-bit 4:4:4 (which is 8.95 GB/s uncompressed).
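A quick check of the two uncompressed data rates quoted above (the helper name is just for illustration):
```python
# Uncompressed data rate = pixels/frame * samples/pixel * bits/sample * fps.
def raw_rate(width, height, fps, bits_per_sample, samples_per_pixel):
    # samples_per_pixel: 1.5 for 4:2:0 (Y + quarter-res Cb and Cr), 3.0 for 4:4:4
    return width * height * samples_per_pixel * bits_per_sample / 8.0 * fps

print(raw_rate(720, 480, 30, 8, 1.5) / 1e6, "MB/s")     # ~15.55 MB/s
print(raw_rate(7680, 4320, 60, 12, 3.0) / 1e9, "GB/s")  # ~8.96 GB/s
```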
Hello71
2022-08-14 01:58:07
Yeah, now that I think about it more, the FLAC/audio argument does favor the theory of intra-only possibly being used for the "video fanatic" segment.
2022-08-14 02:02:21
On the other hand, perhaps 8K won't be enough if VR really takes off. Maybe we'll see some kind of focus-sensitive perceptual codec where it encodes what you're looking at in high resolution? I guess that really does have the physics barrier that saccade speed across typical FOV exceeds light speed across earth distances...
w
2022-08-14 02:06:40
either the pixels get smaller or they stop using pixels
Hello71
2022-08-14 02:13:01
True, maybe we'll see wavelets come back :p
2022-08-14 02:21:44
Actually, I think a stronger argument for simpler codecs might be power savings, not picture quality. Right now it's worth it to use, say, 50% more power in exchange for 20% less bandwidth, but maybe that won't be the case in 2-3 decades. 1080p already exceeds eye resolution on smartphones at reasonable viewing distances; it's usually sent a bit short on frames and color depth, but those will probably improve.
_wb_
2022-08-23 08:47:23
some reflections on how to evaluate perceptual metrics: https://twitter.com/jonsneyers/status/1562175662127026176
2022-08-23 08:48:11
e.g. I now think this is not a good approach: http://compression.cc/tasks/#perceptual
2022-08-25 07:58:13
here are the correlation plots for ssimulacra2 as it is now in the pull request
2022-08-25 08:03:08
the above is for the full dataset of 22k distorted images originating from 250 originals. For tuning the weights, 201 images were used; 49 images were not used at all to tune the weights. Here are correlation plots for just the 4.3k distorted images originating from the validation set of those 49 images not 'seen' during weight tuning:
2022-08-29 02:07:43
https://github.com/cloudinary/ssimulacra2
2022-08-31 09:22:23
2022-08-31 09:23:28
The MCOS scores we collected are based on absolute MOS scores + relative Elo scores from pairwise comparisons.
2022-08-31 09:26:02
Metrics can be used in two ways:
- Given two variants of an image (both derived from the same original), predict which one looks best. Absolute quality scores don't matter for this, it just needs to get the order right within variants of the same original.
- Given a compressed image, predict how good it looks. This is an absolute quality score that can be compared across different original images.
2022-08-31 09:27:09
The interesting thing here is that at the first task, the metrics behave as follows:
SSIMULACRA > DSSIM > Butteraugli > PSNR > VMAF > SSIM
while for the second task (absolute quality across different images), they behave as follows:
Butteraugli > DSSIM > VMAF > SSIMULACRA > SSIM > PSNR
2022-08-31 09:36:10
E.g. VMAF is doing reasonably well to predict MCOS scores (it's not bad as an 'absolute' metric), but as a 'relative' metric it's doing quite poorly: amongst cases where human preference is relatively clear, it predicts incorrectly 3% of the time while PSNR is wrong only 1.4% of the time
2022-08-31 09:40:46
so somewhat surprisingly, if you want to predict pairwise comparisons, PSNR is actually better than SSIM and VMAF, and not much worse than Butteraugli. DSSIM and SSIMULACRA do perform better than PSNR: in cases of relatively clear human preference, DSSIM is wrong 0.30% of the time and SSIMULACRA 0.43% of the time (SSIMULACRA 2 reduces that to 0.26% and also reduces the rate of 'human preference is clear but metric difference is small' and 'human preference is unclear but metric difference is large')
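A hedged sketch of how such pairwise error rates could be computed: for pairs of variants of the same original where human preference is clear, count how often the metric ranks the pair the other way around; the threshold defining a 'clear' preference is an assumption:
```python
# Pairwise disagreement rate of a metric against human scores, restricted
# to pairs with a clear human preference (gap above `clear_gap`).
import numpy as np

def pairwise_error_rate(metric_scores, human_scores, clear_gap, higher_is_better=True):
    metric_scores = np.asarray(metric_scores, dtype=float)
    human_scores = np.asarray(human_scores, dtype=float)
    wrong, total = 0, 0
    n = len(metric_scores)
    for i in range(n):
        for j in range(i + 1, n):
            if abs(human_scores[i] - human_scores[j]) < clear_gap:
                continue  # no clear human preference, skip this pair
            human_prefers_i = human_scores[i] > human_scores[j]
            metric_prefers_i = metric_scores[i] > metric_scores[j]
            if not higher_is_better:
                metric_prefers_i = not metric_prefers_i
            total += 1
            wrong += human_prefers_i != metric_prefers_i
    return wrong / total if total else float("nan")
```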
2022-08-31 09:42:20
for predicting absolute quality, PSNR is very bad though, and the best metrics are Butteraugli, DSSIM, and now SSIMULACRA 2
2022-08-31 12:49:23
LPIPS is not that great - then again it's a generic metric that isn't particularly designed for compression artifacts
2022-08-31 01:01:14
So on our dataset (medium to very high quality classical image compression, i.e. the mozjpeg q30-q95 range), it looks like the different perceptual metrics can be ranked like this:
For relative quality (predicting pairwise comparisons within the same original image):
SSIMULACRA2 >> SSIMULACRA ~= DSSIM >> Butteraugli (2-norm) > PSNR > LPIPS >> VMAF >> SSIM
For absolute quality (predicting MOS scores, i.e. metric values are consistent across different original images):
SSIMULACRA2 >> Butteraugli (2-norm) > DSSIM > VMAF >> LPIPS ~= SSIMULACRA >> SSIM >> PSNR
(with the caveat that SSIMULACRA2 was tuned based on part of this data, and while the results do generalize quite well to the validation set, they may not generalize to different kinds of inputs - e.g. other codecs, different image dimensions, different gamut or dynamic range, etc)
2022-08-31 01:53:23
any other metrics I should try?
2022-08-31 01:54:10
currently computing FSIM but oh boy, either that is a very slow metric or I am using a very slow implementation of it. It will take a while...
veluca
2022-08-31 02:07:29
nlpd? grayscale only though
_wb_
2022-08-31 02:27:29
i'll give it a go
2022-08-31 02:29:01
why are pytorch and tensorflow so huge?
veluca
2022-08-31 02:37:07
that's a good question