|
fab
|
2024-01-11 06:57:05
|
|
|
2024-01-11 06:57:26
|
Second set is SD
|
|
2024-01-11 06:59:18
|
Second set MULTI resolution
|
|
|
MSLP
|
2024-01-11 07:02:26
|
plz stop it fab
|
|
|
fab
|
|
MSLP
|
|
fab
This is HD now
|
|
2024-01-11 07:10:20
|
I like the tea-towel tho
|
|
|
fab
|
2024-01-11 07:12:04
|
I evaluated 30 TikTok videos and 20 images
|
|
2024-01-11 07:23:23
|
Latest image
|
|
2024-01-11 07:23:46
|
|
|
2024-01-11 07:23:52
|
Original
|
|
2024-01-11 07:29:48
|
Send to everyone: Benchmark studies, three resolutions
|
|
2024-01-11 07:30:31
|
Not in WhatsApp, only to the compression engineer
|
|
|
eddie.zato
|
2024-01-18 10:45:14
|
Generation loss is still fun <:CatSmile:805382488293244929>
|
|
|
_wb_
|
2024-01-18 11:17:48
|
what's the intermediate format you're using between generations? 8-bit png, 16-bit png, 32-bit pfm?
|
|
|
eddie.zato
|
2024-01-18 11:25:40
|
16-bit png
|
|
|
_wb_
|
2024-01-18 11:26:18
|
<@532010383041363969> I think generation loss is a good argument for constraining enc gaborish to be as close as possible to the inverse of dec gaborish, rather than allowing it to apply "anything that works" for a single generation according to Butteraugli. If there are reasons to deviate from that (e.g. compensating for subsequent quantization errors) then that's fine with me, as long as the goal is still to make the roundtrip as close as possible to the identity and not to introduce artifacts that improve metric scores in one generation but accumulate into something bad over several generations, i.e. the tuning could be done by optimizing for the result of not just one roundtrip but say the end result of 5 or 10 generations (yes, that makes tuning N times slower but I think it's worth it).
|
|
2024-01-18 11:29:32
|
Adding an option to benchmark_xl to do multiple generations of encoding would be useful. A change might fool the metrics in one generation and look good while it actually does introduce artifacts that when amplified through multi-generation encoding become problematic enough to no longer fool the metrics.
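That kind of multi-generation check can also be scripted around the command-line tools; a minimal sketch, assuming cjxl, djxl and the ssimulacra2 tool from libjxl are on PATH (file names are made up for the example):
```python
import subprocess

ORIGINAL = "original.png"
current = ORIGINAL
for gen in range(1, 6):
    jxl = f"gen{gen}.jxl"
    png = f"gen{gen}.png"
    subprocess.run(["cjxl", current, jxl, "-d", "1.0"], check=True)
    subprocess.run(["djxl", jxl, png], check=True)
    # Score every generation against the original (not the previous generation),
    # so drift that accumulates over generations shows up in the metric.
    score = subprocess.run(["ssimulacra2", ORIGINAL, png],
                           capture_output=True, text=True, check=True).stdout.strip()
    print(f"generation {gen}: ssimulacra2 = {score}")
    current = png
```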
|
|
|
w
|
2024-01-18 11:35:00
|
what is the point of gaborish again
|
|
|
_wb_
|
2024-01-18 11:43:28
|
It's basically a way to get the advantages of a lapped transform without doing a lapped transform: encoder side it applies a kind of sharpening before the DCT; decoder side it applies a kind of blurring after the inverse DCT that undoes the sharpening the encoder did and also reduces blocking and DCT artifacts.
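A toy illustration of the "enc filter constrained to be the inverse of the dec filter" idea (1D, circular convolution, made-up kernel values rather than the actual gaborish weights):
```python
import numpy as np

# Toy decoder-side smoothing kernel (illustrative, not libjxl's gaborish coefficients).
dec_blur = np.array([0.1, 0.8, 0.1])

signal = np.random.rand(256)

# Encoder-side "sharpening" built as the exact inverse of the decoder blur,
# computed in the frequency domain for simplicity.
H = np.fft.rfft(dec_blur, n=signal.size)
sharpened = np.fft.irfft(np.fft.rfft(signal) / H, n=signal.size)

# Decoder applies the blur again: the round trip is the identity up to float error.
roundtrip = np.fft.irfft(np.fft.rfft(sharpened) * H, n=signal.size)
print(np.max(np.abs(roundtrip - signal)))  # ~1e-15
```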
|
|
|
|
afed
|
|
afed
very visible blocking in red (enc-dec-jpegli-bd16 compared to mozjpeg)
<https://unsplash.com/photos/eLHsbiCipc8>
resized from the original size to 1920x2688
jpegli jpeg q80 + png (with jpegli-bd16-dec)
|
|
2024-01-18 01:16:25
|
also jpegli block artifacts are still there in the latest version
something similar with generation loss blockiness
|
|
|
_wb_
|
2024-01-18 01:24:14
|
is this XYB jpegli or YCbCr? Either way, I wonder where the blockiness is coming from, I would assume this is something that should be avoidable...
|
|
|
|
afed
|
2024-01-18 01:24:56
|
both (but less for xyb)
|
|
|
Jyrki Alakuijala
|
2024-01-18 03:11:42
|
we need to improve on this in cjxl and cjpegli
|
|
2024-01-18 03:13:01
|
we will think what to do -- one possibility is to optimize the quantization heuristics exactly for the generation loss (as a mixed objective, partially for image quality, partially for generation loss)
|
|
2024-01-18 03:13:33
|
but I like the approach where we first acknowledge the problem and stay open to it for a while before deciding how to react
|
|
2024-01-18 03:13:45
|
do we have an open issue for this in the github?
|
|
|
Oleksii Matiash
|
|
Jyrki Alakuijala
we will think what to do -- one possibility is to optimize the quantization heuristics exactly for the generation loss (as a mixed objective, partially for image quality, partially for generation loss)
|
|
2024-01-18 03:53:49
|
I believe this (optimization for the generation loss) should be top priority given how often and how many times pictures are recompressed on the Internet
|
|
|
damian101
|
|
eddie.zato
Generation loss is still fun <:CatSmile:805382488293244929>
|
|
2024-01-18 05:29:12
|
why does cjpegli have worse generation loss than mozjpeg here?
|
|
|
Traneptora
|
2024-01-18 10:20:40
|
jxl is supposed to have convergent generation loss, iirc
|
|
|
Jyrki Alakuijala
|
|
why does cjpegli have worse generation loss than mozjpeg here?
|
|
2024-01-19 01:36:37
|
most likely because of the variable dead zone quantization
|
|
|
_wb_
|
2024-02-01 04:52:17
|
https://www.w3.org/Graphics/Color/Workshop/slides/talk/lilley this talk shows some nice plots of how uniform various spaces are w.r.t. DeltaE 2000
|
|
2024-02-01 04:52:25
|
|
|
2024-02-01 04:53:01
|
would be interesting to see what a plot like that looks like for XYB
|
|
|
yoochan
|
2024-02-01 05:57:05
|
On a similar subject i stumbled upon this fancy oklch color picker https://oklch.com/#70,0.1,2,90.17
|
|
|
_wb_
would be interesting to see what a plot like that looks like for XYB
|
|
2024-02-01 06:11:12
|
Perhaps there is some code which could be scavenged from https://bottosson.github.io/posts/oklab/ as he displays similar plots
|
|
|
monad
|
2024-02-05 05:02:29
|
What is the robust way to aggregate timings of a single command over different images? sum(pixels)/sum(seconds) has a downside when applied to timings below the precision of measurement, but is median(pixels)/median(seconds) better?
|
|
2024-02-05 05:04:14
|
Or rather median([pixels/seconds for each image])
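A toy comparison of the candidate aggregations (hypothetical numbers), showing how differently they weight the images:
```python
import statistics

# Hypothetical measurements: megapixels and encode time per image.
pixels  = [16.0, 0.25, 9.0, 0.5]
seconds = [16.0, 0.025, 8.0, 0.060]

rates = [p / s for p, s in zip(pixels, seconds)]   # MP/s per image

pooled = sum(pixels) / sum(seconds)      # weights each image by its size
mean_rate = statistics.mean(rates)       # weights each image equally
median_rate = statistics.median(rates)   # robust to a few noisy timings

print(f"pooled={pooled:.2f}  mean={mean_rate:.2f}  median={median_rate:.2f} MP/s")
```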
|
|
|
Traneptora
|
|
monad
What is the robust way to aggregate timings of a single command over different images? sum(pixels)/sum(seconds) has a downside when applied to timings below the precision of measurement, but is median(pixels)/median(seconds) better?
|
|
2024-02-05 07:20:55
|
if you have the data for each one, why not just do pixels/second for each image, and then take the mean of those?
|
|
2024-02-05 07:22:50
|
total pixels / total seconds has the effect of making small images not count very much toward the average
|
|
2024-02-05 07:23:25
|
let's say I have a large image and a small image
|
|
2024-02-05 07:23:35
|
the large image encodes at 1 MP/s, and the small one is much faster, 10 MP/s
|
|
2024-02-05 07:23:52
|
if you total the pixels and divide by total time, that 10 MP/s will barely contribute to the average at all
|
|
|
_wb_
|
2024-02-05 07:43:37
|
then again that's probably OK, since the measurement for the small image will likely be less accurate and include more stuff that isn't asymptotically important (depending on how you run the benchmark, you might be counting things like loading the binary and initializing some stuff as part of the enc/dec time)
|
|
|
monad
|
|
Traneptora
if you have the data for each one, why not just do pixels/second for each image, and then take the mean of those?
|
|
2024-02-05 10:16:10
|
Okay, then I have to learn how to incorporate samples of zero seconds.
|
|
|
_wb_
|
2024-02-05 04:21:36
|
`--num_reps=100` ๐
|
|
|
monad
|
2024-02-05 05:16:49
|
while elapsed_time < time_unit: _time command_
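A rough sketch of that loop (command and threshold are illustrative), so per-run times below timer precision get averaged over enough repetitions:
```python
import subprocess
import time

cmd = ["cjxl", "small.png", "out.jxl", "-d", "1.0"]  # illustrative command
min_total = 1.0  # keep repeating until at least one second has elapsed

reps, total = 0, 0.0
while total < min_total:
    start = time.perf_counter()
    subprocess.run(cmd, check=True, capture_output=True)
    total += time.perf_counter() - start
    reps += 1

print(f"{reps} runs, {total / reps:.4f} s per run on average")
```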
|
|
|
Traneptora
|
2024-02-05 05:19:49
|
num_reps makes more sense
|
|
2024-02-05 05:20:14
|
if your samples are too small then most of the overhead for a single execution is going to be unrelated to JXL
|
|
|
monad
|
2024-02-05 06:57:43
|
But it should be fair across encoders, and anyway the overhead is part of the practical implications of invoking the command. num_reps should be more representative of using the library directly.
|
|
|
_wb_
|
2024-02-05 07:24:04
|
In most use cases, you would use the library directly, not call cjxl/djxl on intermediate files...
|
|
|
monad
|
2024-02-05 07:41:28
|
I'm looking at transcoding existing files. Maybe it's useless to most people, but that's in line with everything else I make.
|
|
|
Traneptora
|
2024-02-05 08:22:54
|
are you producing benchmarks to measure solely how you use it? because the library is how most users will use it
|
|
|
_wb_
|
2024-02-07 08:46:13
|
The WebP/AVIF team has added a new benchmark tool to their page: https://storage.googleapis.com/demos.webmproject.org/webp/cmp/index.html
|
|
2024-02-07 08:47:45
|
They test at all qualities from 0 to 99 so results are a bit messy, especially the results they suggest to look at
|
|
2024-02-07 08:48:51
|
But it's a nice visualization tool that can be used to make relevant plots too, e.g. something like https://storage.googleapis.com/demos.webmproject.org/webp/cmp/2024_01_25/index.html?load=all.json#WebP+method+4-quality=90..90&matcher_ssim=off
|
|
2024-02-07 08:51:18
|
If you just aggregate over the entire range from q0 to q99, you end up giving a _lot_ of weight to very low quality settings that no sane person would use.
|
|
2024-02-07 08:54:22
|
I made a little animation to illustrate what happens if you compare at something equivalent to WebP q30 vs WebP q95 (which is not super high quality, due to obligatory 4:2:0 and limited range YCbCr which makes it often worse than q90 JPEG).
|
|
|
jonnyawsom3
|
2024-02-07 09:05:34
|
So around q70 JXL seems to be at a better ratio than AVIF, with the gap significantly increasing after 80 and 90
|
|
|
_wb_
|
2024-02-07 09:14:42
|
Yes, well it also depends on what speed you look at, even at q30, jxl still gives better bang for buck if you want fast encoding.
|
|
2024-02-07 09:19:35
|
I'm also not sure if the set of images they used for testing is very representative; I should check the full corpus (haven't found a convenient way to see all source images yet though). If you have few images with natural content like a human face, and many with hard straight diagonal lines (like modern, very 'clean' architecture), then AVIF will tend to look better since it's good at those kinds of images.
|
|
|
HCrikki
|
2024-02-07 09:39:30
|
It would be helpful to illustrate the kind of visual quality some of those numbers map to. I recall seeing in the past a low-filesize AVIF output that was of unacceptably low quality, comparable to bit-starved ancient JPEG
|
|
|
190n
|
|
_wb_
I made a little animation to illustrate what happens if you compare at something equivalent to WebP q30 vs WebP q95 (which is not super high quality, due to obligatory 4:2:0 and limited range YCbCr which makes it often worse than q90 JPEG).
|
|
2024-02-07 11:10:47
|
"good argument, unfortunately i have depicted AVIF as the soyjak and JXL as the chad"
|
|
|
Traneptora
|
|
190n
"good argument, unfortunately i have depicted AVIF as the soyjak and JXL as the chad"
|
|
2024-02-08 01:44:37
|
|
|
|
spider-mario
|
|
190n
|
2024-02-08 08:39:16
|
ofc once you get _really_ small, JXL starts winning again
|
|
|
yoochan
|
|
_wb_
I made a little animation to illustrate what happens if you compare at something equivalent to WebP q30 vs WebP q95 (which is not super high quality, due to obligatory 4:2:0 and limited range YCbCr which makes it often worse than q90 JPEG).
|
|
2024-02-08 08:52:36
|
how do you get the curve at webp q90 ? I struggle to select it. and in all cases avif seems to win... what a shitty presentation
|
|
|
|
veluca
|
2024-02-08 09:10:14
|
that's not the most usable UI I've seen in my life, no
|
|
|
_wb_
|
|
yoochan
how do you get the curve at webp q90 ? I struggle to select it. and in all cases avif seems to win... what a shitty presentation
|
|
2024-02-08 09:24:08
|
Easiest way imo is changing the url. https://storage.googleapis.com/demos.webmproject.org/webp/cmp/2024_01_25/index.html?load=all.json#WebP+method+4-quality=90..90&matcher_ssim=off
|
|
|
yoochan
|
2024-02-08 09:24:47
|
thanks ๐
|
|
|
_wb_
|
2024-02-08 09:28:29
|
the "presets" the interface suggests are mostly based on comparisons at qualities like this: https://storage.googleapis.com/demos.webmproject.org/webp/cmp/2024_01_25/visualizer.html?bimg=..%2Fclic_validation_2021_2022_2024%2Fimages%2Fcd272a9d4ae2d9eabbe58474facc4da00f280be40137597bf1e497f459eda284.png&btxt=original&rimg=encoded%2Fcd272a9d4ae2d9eabbe58474facc4da00f280be40137597bf1e497f459eda284.e6q016.avif&rtxt=AVIF+speed+6&limg=encoded%2Fcd272a9d4ae2d9eabbe58474facc4da00f280be40137597bf1e497f459eda284.e7q003.jxl.png<xt=JPEG+XL+effort+7
|
|
|
yoochan
|
2024-02-08 09:30:05
|
(your last link gives me an error)
|
|
|
_wb_
|
2024-02-08 09:30:22
|
that's a 10kb jxl and only a 6kb avif yet they have similar ssim and ssimulacra2 scores, so percentage-wise that image is a big win for avif
|
|
|
yoochan
(your last link gives me an error)
|
|
2024-02-08 09:30:57
|
strange, it works for me. what error?
|
|
|
yoochan
|
2024-02-08 09:31:37
|
'the image .... cannot be displayed because it contains errors' (under firefox)
|
|
2024-02-08 09:31:49
|
was it a jxl pic ?
|
|
2024-02-08 09:32:18
|
it might be an error of the jxl plugin
|
|
|
_wb_
|
2024-02-08 09:32:57
|
https://storage.googleapis.com/demos.webmproject.org/webp/cmp/2024_01_25/encoded/cd272a9d4ae2d9eabbe58474facc4da00f280be40137597bf1e497f459eda284.e6q016.avif
|
|
2024-02-08 09:33:05
|
no they show the jxl as a png
|
|
2024-02-08 09:33:16
|
only the avif is shown as an avif in that interface
|
|
2024-02-08 09:33:26
|
https://storage.googleapis.com/demos.webmproject.org/webp/cmp/clic_validation_2021_2022_2024/images/cd272a9d4ae2d9eabbe58474facc4da00f280be40137597bf1e497f459eda284.png is the original
|
|
|
yoochan
|
2024-02-08 09:34:38
|
the avif works ๐
|
|
2024-02-08 09:37:16
|
I'll do some benchmark_xl on this flower to understand better how a similar ssimulacra2 can be reached
|
|
|
_wb_
|
2024-02-08 09:37:53
|
anyway, if you compare mostly at such horrible qualities, avif is indeed good: being a video codec, it is designed to do something not too bad looking even at very low bitrates. This is not a quality anyone would want to use for a still image though.
|
|
|
yoochan
|
2024-02-08 09:40:04
|
(benchmark_xl with "jxl:d1.0:glacier" returns me a core dump with commit ae50ce4b)
|
|
|
_wb_
anyway, if you compare mostly at such horrible qualities, avif is indeed good: being a video codec, it is designed to do something not too bad looking even at very low bitrates. This is not a quality anyone would want to use for a still image though.
|
|
2024-02-08 09:40:55
|
I agree, at this resolution the image is ugly
|
|
|
|
veluca
|
|
yoochan
(benchmark_xl with "jxl:d1.0:glacier" returns me a core dump with commit ae50ce4b)
|
|
2024-02-08 09:44:21
|
file a bug?
|
|
|
yoochan
|
|
veluca
file a bug?
|
|
2024-02-08 09:44:59
|
I'm checking whether I messed something up, but I will
|
|
|
|
veluca
|
2024-02-08 09:45:16
|
thanks ๐
|
|
|
_wb_
|
2024-02-08 09:46:02
|
testing image codecs at such low qualities is like testing how well a car can drive underwater: sure, if you design for it, you can make a car drive underwater, and that's quite impressive and nice, but it still doesn't mean it's a very relevant thing to test for most people's needs
|
|
|
yoochan
|
2024-02-08 11:46:16
|
or off-road; i'm looking at you, sellers of urban 4WDs with a snorkel
|
|
2024-02-08 11:46:35
|
```Encoding kPixels Bytes BPP E MP/s D MP/s Max norm SSIMULACRA2 PSNR pnorm BPP*pnorm QABPP Bugs
------------------------------------------------------------------------------------------------------------------------------------------------
jxl:d1.0:lightning 393 131272 2.6707357 6.189 20.943 1.34717049 87.27117148 41.28 0.59798734 1.597066130660 3.598 0
```
|
|
2024-02-08 11:47:02
|
among the results returned by benchmark_xl, is one of them a butteraugli score ?
|
|
|
_wb_
|
2024-02-08 11:59:41
|
max norm is butteraugli, pnorm is the 3-norm butteraugli
|
|
|
yoochan
|
2024-02-08 12:00:09
|
thank you ๐
|
|
2024-02-08 12:37:10
|
another small question in benchmark_xl, how to set the effort of avif ? `--codec=avif:q85:???`
|
|
|
|
veluca
|
2024-02-08 12:38:37
|
s0/.../s9
|
|
|
yoochan
|
|
damian101
|
|
_wb_
I made a little animation to illustrate what happens if you compare at something equivalent to WebP q30 vs WebP q95 (which is not super high quality, due to obligatory 4:2:0 and limited range YCbCr which makes it often worse than q90 JPEG).
|
|
2024-02-08 12:52:24
|
what metric?
|
|
|
_wb_
|
2024-02-08 12:53:24
|
here the comparison is at similar ssimulacra2 score
|
|
|
damian101
|
|
yoochan
|
2024-02-08 12:54:29
|
I'm trying a new way to plot scores which will enable real comparison at similar quality... Hope it will be more readable
|
|
2024-02-08 01:13:37
|
the q parameter seems to have no impact on the bpp with avif... shouldn't it represent quality in 0..100 ? in `--codec=avif:q85:s0`
|
|
|
|
veluca
|
2024-02-08 01:14:30
|
you need to have avif >= 1.0.3 for it to do anything
|
|
2024-02-08 01:14:35
|
and git libjxl
|
|
|
yoochan
|
2024-02-08 01:15:08
|
oki, I pulled libjxl with all the submodules, avif is not included ?
|
|
|
|
veluca
|
2024-02-08 01:15:16
|
(that's when they added a `--quality` flag to avifenc, AFAIU)
|
|
|
yoochan
|
2024-02-08 01:15:16
|
is it the one from my system ?
|
|
|
|
veluca
|
2024-02-08 01:15:24
|
yeah I am pretty sure it's the system one
|
|
|
yoochan
|
2024-02-08 01:15:41
|
oki, I'll try to derail it to a new version
|
|
2024-02-08 03:42:50
|
is there a way to tell the libjxl cmake to pick the libavif available in the LD_LIBRARY_PATH instead of the one found in /lib/ ... ?
|
|
2024-02-08 03:45:13
|
this trick worked for all projects I compiled up to now... that's strange
|
|
2024-02-08 03:46:50
|
perhaps setting the PKG_CONFIG_PATH will help ๐
|
|
|
|
veluca
|
|
yoochan
perhaps setting the PKG_CONFIG_PATH will help ๐
|
|
2024-02-08 03:49:10
|
yeah something like that should work
|
|
2024-02-08 03:49:26
|
(you also need to point it to the correct libavif at *compile time*)
|
|
|
yoochan
|
2024-02-08 03:50:20
|
makes sense, but many projects link correctly only with the LD_LIBRARY_PATH, that's why I failed
|
|
2024-02-08 04:00:07
|
success !
|
|
|
_wb_
|
2024-02-10 03:22:31
|
The battle of the codecs goes on. I've got news from the Pareto front regarding lossless compression.
|
|
|
yoochan
|
2024-02-10 03:23:07
|
do you have a link for better quality ?
|
|
|
_wb_
|
2024-02-10 03:23:40
|
For smallish images (smaller than 4 Mpx) not too much changes between libjxl 0.9 and libjxl 0.10
|
|
|
yoochan
|
2024-02-10 03:25:00
|
0.10 is out !?
|
|
|
_wb_
|
|
yoochan
do you have a link for better quality ?
|
|
2024-02-10 03:25:01
|
I'll eventually clean up stuff and share the google sheets link, but the screenshot should be OK (if you open it in browser, not the discord previews)
|
|
|
yoochan
|
2024-02-10 03:25:26
|
indeed, my bad ๐ thanks for sharing
|
|
|
_wb_
|
2024-02-10 03:25:30
|
0.10 is not yet out but one of the main changes, streaming encoding, has been implemented
|
|
2024-02-10 03:26:05
|
For large images the difference will be very noticeable:
|
|
|
yoochan
|
|
_wb_
|
2024-02-10 03:28:34
|
basically libjxl 0.10 beats libjxl 0.9 by a big margin here. On these images, the new e7 is faster than the old e4 and also 0.5 bpp smaller on average.
|
|
2024-02-10 03:29:47
|
For large non-photographic images (this is some set of manga), the same thing is true:
|
|
|
yoochan
|
2024-02-10 03:29:57
|
what are the cyan plots very close to the purple ones ?
|
|
|
_wb_
|
2024-02-10 03:31:29
|
(zooming in a bit on the left part of that previous plot, since avif is clearly very far from the Pareto front)
|
|
2024-02-10 03:33:09
|
the cyan is if you explicitly tell the encoder to use non-streaming mode; it then behaves more like the 0.9 encoder does (still slightly faster). But the default would be the turquoise points.
|
|
2024-02-10 03:34:46
|
so what you can see here for these non-photographic images is that libjxl 0.9 was beating webp but not by a big margin. Now, libjxl 0.10 is beating webp more substantially.
|
|
2024-02-10 03:35:30
|
(for photo, jxl was already beating webp substantially but now it beats it even harder)
|
|
|
yoochan
|
2024-02-10 03:35:58
|
I didn't follow the subject, is there a flag to explicitly require non-streaming mode ?
|
|
|
w
|
2024-02-10 04:03:31
|
is encode speed real time or cpu time
|
|
2024-02-10 04:04:15
|
i always complain about this
|
|
|
spider-mario
|
2024-02-10 04:05:46
|
I assume it wouldn't mention "8 threads" on the y axis if it were cpu time
|
|
|
w
|
2024-02-10 04:06:56
|
well sometimes it scales differently
|
|
|
_wb_
|
2024-02-10 04:16:08
|
It's real time.
|
|
2024-02-10 04:18:40
|
Basically before, lossless encoding was mostly single-threaded and now it properly parallelizes. But it also became more efficient because it uses less memory and uses it more locally so the speedup is actually more than 8x
|
|
2024-02-10 04:18:57
|
(when running with 8 threads)
|
|
|
w
|
2024-02-10 04:19:23
|
is it possible to have a max memory option?
|
|
|
|
afed
|
|
_wb_
(zooming in a bit on the left part of that previous plot, since avif is clearly very far from the Pareto front)
|
|
2024-02-10 04:19:35
|
also what about single-threaded mode for all codecs?
i think that might also be a valid claim, because it's possible to encode multiple images in parallel
|
|
2024-02-10 04:21:39
|
and the opposite, on some even higher-core cpu, with more threads for encoders and just on very large images
|
|
|
lonjil
|
2024-02-10 04:25:30
|
Even though I don't think QOI and friends are particularly useful, it might be good to include them for when this is posted publicly.
|
|
|
spider-mario
|
|
w
well sometimes it scales differently
|
|
2024-02-10 04:31:50
|
right, but it would be kind of strange to have just this one graph then
|
|
2024-02-10 04:32:04
|
my inference wasn't pure deduction
|
|
|
|
afed
|
|
lonjil
Even though I don't think QOI and friends are particularly useful, it might be good to include them for when this is posted publicly.
|
|
2024-02-10 04:32:27
|
then, instead of QOI alone it would be better to also use fpnge alongside QOI, just to show that it's not a better format even compared to the old PNG (if you use a different, faster encoder)
|
|
|
_wb_
|
2024-02-10 04:35:34
|
I used benchmark_xl, maybe we should add qoi and fpnge there...
|
|
|
lonjil
|
2024-02-10 04:36:46
|
It would be useful context, since a lot of people have heard about QOI and fpnge being very fast at their respective compression ratios.
Also, did someone make an improved version of QOI at some point? I feel like I saw something new in that area, but I don't recall what it was called.
|
|
|
|
afed
|
2024-02-10 04:37:35
|
<https://github.com/nigeltao/qoir>
|
|
|
lonjil
|
2024-02-10 04:39:24
|
thanks
|
|
2024-02-10 04:40:43
|
While looking I came across this https://github.com/jido/seqoia
|
|
|
|
afed
|
2024-02-10 04:42:06
|
yeah, there are many forks with some changes
|
|
|
_wb_
|
2024-02-10 05:09:29
|
Maybe I will just manually measure some of these
|
|
2024-02-10 05:10:09
|
At the other end, you could also add various png optimizers (this is just default libpng)
|
|
|
|
afed
|
2024-02-10 05:18:15
|
but, optimizers can usually only optimize pngs, so it's not a very fair comparison
or also it's basically the combined time spent on the first png encoding and then the time on optimization
for qoi forks I don't think it's useful, those are very task specific codecs
|
|
|
jonnyawsom3
|
|
_wb_
(zooming in a bit on the left part of that previous plot, since avif is clearly very far from the Pareto front)
|
|
2024-02-10 05:50:30
|
Was the new e1 hitting 130 MP/s? May be worth having a value on the axis where the highest result is
|
|
|
_wb_
|
2024-02-10 06:21:53
|
I got about 380 MPx/s for e1, didn't measure accurately though (just one encode per image)
|
|
2024-02-10 06:22:34
|
this is on my laptop, a macbook pro M3
|
|
|
jonnyawsom3
|
2024-02-10 06:25:38
|
Riight, I see now. Each 'block' on the chart is an order of magnitude increase with the intermediaries at 1/10ths
|
|
|
Traneptora
|
|
_wb_
Basically before, lossless encoding was mostly single-threaded and now it properly parallelizes. But it also became more efficient because it uses less memory and uses it more locally so the speedup is actually more than 8x
|
|
2024-02-10 06:54:51
|
isn't threadless encode speed more relevant? because batch converting can be done via processes etc.
|
|
|
lonjil
|
2024-02-10 07:02:04
|
Both are relevant. Especially the higher speeds are probably more relevant to "live" encoding than bulk encoding.
|
|
|
Traneptora
|
|
_wb_
|
2024-02-10 07:12:27
|
Depends on the use case. For saving an image in an editor, or maybe in a camera, threaded makes sense. For batch encoding, single-threaded is indeed more relevant.
|
|
|
Riight, I see now. Each 'block' on the chart is an order of magnitude increase with the intermediaries at 1/10ths
|
|
2024-02-10 07:13:43
|
Yeah you need a log scale for speed since otherwise anything slower than e1 just is at the bottom of the plot ๐
|
|
|
|
afed
|
|
_wb_
For large images the difference will be very noticeable:
|
|
2024-02-10 07:15:22
|
for webp it would be better to use
`-z <int> ............... activates lossless preset with given level in [0:fast, ..., 9:slowest]`
which enables lossless automatically and has a wider range
and `-z 0` is also pretty balanced
but, don't know if benchmark_xl has this
|
|
|
_wb_
|
2024-02-10 07:40:21
|
It only has m1-6. I guess we have webp experts in the room here. Does it make a difference to configure it with -z instead of -m? Besides having more steps of effort (but if that's the only difference, I don't really care, the range between m1 and m6 is not that huge anyway, compared to the range of efforts you have in libjxl and libaom)
|
|
|
username
|
|
_wb_
It only has m1-6. I guess we have webp experts in the room here. Does it make a difference to configure it with -z instead of -m? Besides having more steps of effort (but if that's the only difference, I don't really care, the range between m1 and m6 is not that huge anyway, compared to the range of efforts you have in libjxl and libaom)
|
|
2024-02-10 08:01:20
|
I'm not really exactly sure, though after looking around it seems like this is how the mapping between the two is done. The only problem is I don't really know how to read code that well, but hope this helps either way!
|
|
2024-02-10 08:03:20
|
oh I was looking at it wrong and just realized the table/array or whatever it's called changes both `-m` and `-q` based on `-z` from a preset selection (I thought it was doing some kinda weird scaling of the values because I didn't realize it had a value set for each of the 10/9 levels because my brain counted it wrong)
|
|
2024-02-10 08:14:51
|
also cwebp is weird because `-q`/quality doesn't always mean visual quality since in the case of lossless it directly relates to effort. I know that `-m 6` with `-q 99` won't activate the brute force "lossless cruncher" but `-m 6` with `-q 100` will
|
|
2024-02-10 08:20:19
|
there's also the whole thing of `-mt` (multithreading) being off by default, which makes sense if you plan to mass encode WebPs with each core working on one image; however, in almost all other cases it doesn't make sense. Also iirc cwebp can only use up to like 2 threads, and it only spins up the second thread for specific things, with the most impactful thing being the lossless cruncher, which outputs the same result but in almost half the time with the second thread
|
|
|
Orum
|
|
username
also cwebp is weird because `-q`/quality doesn't always mean visual quality since in the case of lossless it directly relates to effort. I know that `-m 6` with `-q 99` won't activate the brute force "lossless cruncher" but `-m 6` with `-q 100` will
|
|
2024-02-10 08:41:35
|
does `-z 9` set both `-m 6` and `-q 100` though?
|
|
|
username
|
|
Orum
does `-z 9` set both `-m 6` and `-q 100` though?
|
|
2024-02-10 08:48:12
|
yes, and the image in my first message shows what each level of `-z` hooks up to
|
|
|
Orum
|
2024-02-10 08:49:08
|
ohh, I see
|
|
2024-02-10 08:51:21
|
also `-mt` is of limited use in `cwebp`
|
|
|
username
|
2024-02-10 08:52:26
|
it is, but it makes a big difference with `-z 9` since the lossless cruncher can be multithreaded
|
|
|
Orum
|
2024-02-10 08:52:39
|
yeah, but it *only* helps with `-z 9`
|
|
2024-02-10 08:54:32
|
...and even then, only 2 threads, no more
|
|
|
username
|
2024-02-10 09:18:39
|
some stages of "analysis" can get done on the second thread but uhhh I'm not sure exactly how much of a difference it makes in the end and I don't know enough about cwebp to know when it even happens during encoding
|
|
|
Orum
|
2024-02-10 09:49:59
|
honestly the most annoying part of cwebp is that you can't count on higher `-z` levels being the same size or smaller
|
|
2024-02-10 09:50:43
|
I have images where `-z 3` is smaller than all higher levels <:SadOrange:806131742636507177>
|
|
|
_wb_
|
2024-02-10 10:00:44
|
This can happen also in libjxl. It's quite hard to avoid, unless you make each higher effort also try all lower efforts.
|
|
|
jonnyawsom3
|
|
username
it is but It makes a big difference with `-z 9` since the lossless cruncher can be multithreaded
|
|
2024-02-10 10:03:16
|
Similar to e10 using 100% CPU on brute force
|
|
|
Orum
|
|
_wb_
This can happen also in libjxl. It's quite hard to avoid, unless you make each higher effort also try all lower efforts.
|
|
2024-02-10 10:58:15
|
well I think it's mostly an issue of "how often and how bad" than anything
|
|
2024-02-10 10:59:14
|
if it only occurs rarely (like, 1 in 1000) and it's not bad when it does occur (< 1% savings) it's not so much of an issue
|
|
2024-02-10 11:00:05
|
once 0.10 comes out I'm going to be doing a lot of benchmarks though, as I've got a lot of images I'd like to move over to <:JXL:805850130203934781>
|
|
|
jonnyawsom3
|
2024-02-10 11:20:36
|
The best way is to find out why it does so on higher efforts, such as this case https://discord.com/channels/794206087879852103/804324493420920833/1196974617982685318
|
|
2024-02-10 11:21:15
|
Halved at g2 and then doubled at g3
|
|
|
_wb_
|
2024-02-11 10:46:42
|
|
|
2024-02-11 10:51:40
|
Pareto front plot for lossy, where I adjusted the quality knob for each encoder+effort until the corpus average ssimulacra2 score was as close as possible to 85
|
|
2024-02-11 10:55:39
|
So basically avif s1 has the same compression performance as jxl e4 but is 100 times slower, and avif s0 more or less matches jxl e6 in compression and is also 100 times slower
|
|
2024-02-11 10:58:36
|
At reasonable speeds (avif s6+), it's a clear win for jxl
|
|
2024-02-11 11:00:28
|
Note also that lossy webp is obsoleted by jpegli, which is faster than any webp effort and also better than any webp effort.
|
|
2024-02-11 11:03:42
|
(this is for a specific quality point though, for lower quality it may be different)
|
|
|
yoochan
|
2024-02-11 11:08:34
|
Thank you for the graphs! I find it not easy to find convincing and unbiased representations; ideally the quality / speed / size tradeoff would be plotted in 3D
|
|
|
spider-mario
|
2024-02-11 11:34:52
|
is that xyb jpegli?
|
|
|
MSLP
|
2024-02-11 11:39:22
|
where on those charts would mozjpeg roughly be?
|
|
|
_wb_
|
|
spider-mario
is that xyb jpegli?
|
|
2024-02-11 12:13:04
|
I dunno, what does benchmark_xl do by default when you select jpeg:enc_jpegli?
|
|
|
MSLP
where on that charts mozjpeg would roughly be?
|
|
2024-02-11 12:14:30
|
I could test but for high quality it didn't improve much on libjpeg-turbo in previous experiments, just a lot slower...
|
|
|
Traneptora
|
|
_wb_
|
|
2024-02-11 01:28:16
|
what's the x axis? bpp?
|
|
|
_wb_
|
2024-02-11 01:28:31
|
oops I cut the label, yes, bpp
|
|
2024-02-11 01:29:21
|
also I mislabeled some of the avif points
|
|
2024-02-11 01:30:32
|
at s0, I could get away with q76 where at s9 it needed q82 to get the same target corpus-avg ssimulacra2 score of 85
|
|
2024-02-11 01:36:37
|
(jxl at e3/e4 seems to produce slightly higher ssimulacra2 scores than at higher effort settings when both are set to d1, but starting at e5 it's relatively consistent)
|
|
|
Traneptora
|
2024-02-11 01:37:18
|
less consistent quality, makes sense tbh
|
|
|
_wb_
|
2024-02-11 01:49:48
|
i didn't try to measure consistency of quality _across_ images here, this is just aligned on the corpus average ssimu2 score, which actually favors inconsistent codecs like webp and avif since basically while their average score is the same as jxl's (around 85 here), they reach it by getting a score of 90+ on the easy images and a much lower score on the hard images, so they save lots of bytes on the hard images (by not delivering the desired quality)
|
|
2024-02-11 01:57:22
|
oops again, those s0 points are not for the exact same corpus (accidentally included some other images)
|
|
2024-02-11 01:57:51
|
here's the correct plot
|
|
2024-02-11 02:00:12
|
same info in table form:
|
|
2024-02-11 02:03:50
|
table also shows decode speed and the actual metric scores; the q-settings were chosen to get close to ssimulacra2=85 but there's of course still some variation (in jxl in principle I could get rid of that by tweaking the distance, in most others the q-settings have integer steps so you can't get arbitrarily close)
|
|
2024-02-11 02:12:37
|
this is using 8 threads on a macbook M3, and speeds were not measured accurately (just a single iteration)
|
|
2024-02-11 02:13:32
|
(since I'm mostly interested in orders of magnitude, measuring very accurately is not really needed imo)
|
|
|
spider-mario
|
|
_wb_
i didn't try to measure consistency of quality _across_ images here, this is just aligned on the corpus average ssimu2 score, which actually favors inconsistent codecs like webp and avif since basically while their average score is the same as jxl's (around 85 here), they reach it by getting a score of 90+ on the easy images and a much lower score on the hard images, so they save lots of bytes on the hard images (by not delivering the desired quality)
|
|
2024-02-11 02:15:20
|
so, basically, we have:
- same quality setting for all images, picking the one that yields the desired ssimulacra2 score on average (favours inconsistent codecs)
but in principle, we could also do one of these:
- individual quality setting for each image, so that each image has the desired ssimulacra2 score (corrects for inconsistent codecs)
- same quality setting for all images, but pick such that the _worst_ ssimulacra2 is the desired one (_penalises_ inconsistent codecs)
|
|
|
_wb_
|
2024-02-11 02:15:41
|
I don't know how the chrome team was benchmarking that avif decode speed was so great compared to jxl decode speed, but the results I'm getting seem to show something different
|
|
|
spider-mario
so, basically, we have:
- same quality setting for all images, picking the one that yields the desired ssimulacra2 score on average (favours inconsistent codecs)
but in principle, we could also do one of these:
- individual quality setting for each image, so that each image has the desired ssimulacra2 score (corrects for inconsistent codecs)
- same quality setting for all images, but pick such that the _worst_ ssimulacra2 is the desired one (_penalises_ inconsistent codecs)
|
|
2024-02-11 02:19:19
|
That's right. Setting it individually per image is probably the most fair thing to do, but it's not a very realistic/typical way in which people use codecs. Often a single setting is determined based on looking what it does to a few images, and then that setting is used for all images.
So the third thing (aligning on worst case, or let's say on p10), which penalizes inconsistent codecs, is actually the most relevant thing imo, but here I'll do the first thing (aligning on average, which favors inconsistent codecs) just to show that even when trying to favor avif, it still doesn't look good
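The "individual quality setting for each image" option could be done with a per-image search; a sketch assuming a hypothetical encode_and_score(q) helper that encodes one image at integer quality q and returns its ssimulacra2 score (assumed monotone in q):
```python
def quality_for_target(encode_and_score, target=85.0, lo=1, hi=100, tol=0.25):
    """Smallest integer quality whose score reaches target - tol (binary search)."""
    best = hi
    while lo <= hi:
        q = (lo + hi) // 2
        if encode_and_score(q) >= target - tol:
            best = q      # good enough, try a cheaper setting
            hi = q - 1
        else:
            lo = q + 1    # not good enough, need higher quality
    return best
```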
|
|
2024-02-11 02:22:27
|
I'll try to make such plots also for some "medium quality" point (averaging at ssimulacra2=70) and one for "camera quality" (something like d0.5).
|
|
|
yoochan
|
|
_wb_
same info in table form:
|
|
2024-02-12 08:35:39
|
can you select MT or not for avif from benchmark_xl ?
|
|
|
_wb_
|
2024-02-12 09:05:10
|
yes, add something like `:log2_cols=2:log2_rows=2` to make it MT
|
|
2024-02-12 09:06:15
|
avif is a bit funky: you have to manually specify how it does tiling, and you need to encode MT in order to be able to decode MT
|
|
2024-02-12 09:08:13
|
I'm assuming these two parameters set like that mean that it splits the image in tiles such that there are 2^log2_cols tiles horizontally and 2^log2_rows tiles vertically, so in 16 parts which should be enough when doing MT with 8 threads
|
|
|
|
veluca
|
2024-02-12 09:08:23
|
and it makes quality/density worse of course
|
|
|
yoochan
|
|
_wb_
yes, add something like `:log2_cols=2:log2_rows=2` to make it MT
|
|
2024-02-12 09:09:15
|
Thanks ๐
i couldn't have guessed
|
|
|
_wb_
|
2024-02-12 09:28:27
|
|
|
2024-02-12 09:31:06
|
I do kind of like the calibration of this new avif quality scale, avif q50 (at e0) is actually a quality that I would call "medium quality", and it corresponds to jxl d2.6, webp q76, libjpeg-turbo q70
|
|
|
spider-mario
|
2024-02-12 09:37:28
|
oh, so if one is fine with needing about a minute instead of 10 seconds for a Facebook-size medium-quality image, AVIF is actually kind of competitive
|
|
|
_wb_
|
2024-02-12 09:39:04
|
yes, at the lower qualities avif is competitive in compression, if you have the cpu time to throw at it
|
|
2024-02-12 09:40:25
|
I dunno about others but for Cloudinary anything slower than default effort avif (s6) is too slow; the jump from s6 to s5 is quite steep, too
|
|
|
yoochan
|
|
_wb_
yes, at the lower qualities avif is competitive in compression, if you have the cpu time to throw at it
|
|
2024-02-12 10:02:17
|
At lower qualities, avif outperforms jxl for ssimulacra2 scores. But does it in subjective tests too? For ssimulacra2 scores in 60-70
|
|
|
_wb_
|
|
yoochan
At lower qualities, avif outperform jxl for ssimulacra2 scores. But does it in subjective tests too? For ssimulacra2 scores in 60-70
|
|
2024-02-12 10:04:01
|
if you're prepared to go to the extremely slow speeds, yes (we don't have a ton of data about these very slow speed settings since they're unusable to us, but the limited data we have does show that they are also subjectively indeed better)
|
|
|
yoochan
|
2024-02-12 10:05:50
|
But why! avif also uses some kind of DCT-based compression
|
|
|
_wb_
|
2024-02-12 10:07:59
|
basically the situation is like this:
- if you are fine with using a LOT of encode cpu time, **and** you want relatively low quality, **and** you don't need progressive decode, then AVIF can save you a few percent compared to JPEG XL
- in all other cases, JPEG XL matches or beats AVIF (by a big margin if you want reasonable speed **or** better than mediocre quality)
|
|
|
damian101
|
2024-02-12 07:10:24
|
Here is how I encode AVIF for highest efficiency at very high quality (at the expense of threading due to row-mt 0):
`avifenc -d 10 -a tune=ssim -a quant-b-adapt=1 -a enable-chroma-deltaq=1 -a deltaq-mode=2 -j 8 -a row-mt=0 -s 0 -a cq-level=14 --cicp 1/2/1`
|
|
|
yoochan
|
2024-02-12 07:17:01
|
What are the ssimulacra2 scores and the compression ratios for this?
|
|
|
damian101
|
2024-02-12 07:36:37
|
No idea...
Deltaq-mode 2 is actually usually slightly disliked by ssimulacra2, btw, unless it isn't, overall it definitely increases consistency, almost as well as deltaq-mode 3, but without making things blurry (which it definitely does together with tune ssim).
But quality consistency is the big issue with AVIF, and the one thing where JXL is always straight up superior. When doing automatic conversion to AVIF, you ideally want to do target quality encoding, performing multiple encodes and measuring the results. But when just comparing individual images, this holds up very well with JXL even at very high quality, but depends a lot on the specific content of course.
|
|
|
_wb_
|
2024-02-12 07:43:28
|
How long does it take to encode a 12 Mpx image with that setting? ๐
|
|
|
damian101
|
2024-02-12 07:44:01
|
well, time to find out I guess
|
|
|
Traneptora
|
2024-02-12 07:46:59
|
Why tune ssim?
|
|
2024-02-12 07:47:22
|
Also why cicp 1/2/1?
|
|
2024-02-12 07:48:06
|
That's bt709 matrix, unspecified primaries, bt709 trc
|
|
|
damian101
|
|
Traneptora
Why tune ssim?
|
|
2024-02-12 07:54:12
|
Always better.
|
|
|
Traneptora
Also why cicp 1/2/1?
|
|
2024-02-12 07:55:08
|
By default, avifenc uses bt.601 color matrix, which is dumb for content in sRGB/BT.709 gamut, as the BT.709 will just perform better.
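For reference, the two matrices being compared differ in their luma weights; a small full-range sketch using the standard Kr/Kb constants (ignoring chroma subsampling and range scaling, which the real conversion also handles):
```python
def rgb_to_ycbcr(r, g, b, kr, kb):
    # Generic full-range RGB -> YCbCr with luma coefficients kr/kb.
    y = kr * r + (1 - kr - kb) * g + kb * b
    cb = (b - y) / (2 * (1 - kb))
    cr = (r - y) / (2 * (1 - kr))
    return y, cb, cr

pure_red = (1.0, 0.0, 0.0)
print(rgb_to_ycbcr(*pure_red, kr=0.299, kb=0.114))    # BT.601 weights
print(rgb_to_ycbcr(*pure_red, kr=0.2126, kb=0.0722))  # BT.709 weights, matching sRGB/BT.709 primaries
```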
|
|
|
Traneptora
That's bt709 matrix, unspecified primaries, bt709 trc
|
|
2024-02-12 07:55:24
|
no
|
|
2024-02-12 07:55:51
|
that's bt.709 primaries, unspecified transfer, bt.709 matrix
|
|
|
Quackdoc
|
|
Traneptora
Also why cicp 1/2/1?
|
|
2024-02-12 08:04:18
|
while with ffmpeg cicp is MC/CP/TC, with libavif it's CP/TC/MC
|
|
2024-02-12 08:06:39
|
because for people who work on image and video stuff, agreeing with other projects would probably be fatal, better to just die I guess
|
|
|
damian101
|
2024-02-12 08:07:37
|
Usually the order that's actually used during encoding is chosen. Or sometimes decoding.
|
|
2024-02-12 08:08:09
|
And unless tonemapping is done, gamut reduction happens first, color matrix last during encoding.
|
|
|
_wb_
|
2024-02-12 08:08:45
|
Does avifenc convert arbitrary input to that space or is the only conversion it does rgb2yuv?
|
|
|
spider-mario
|
2024-02-12 08:08:53
|
I thought it was the latter
|
|
2024-02-12 08:09:03
|
i.e. no gamut reduction, just information about what the pixel data mean
|
|
|
_wb_
|
2024-02-12 08:09:39
|
So if you give it P3 input you are reinterpreting it as sRGB?
|
|
|
spider-mario
|
2024-02-12 08:09:46
|
that's my understanding of it
|
|
|
damian101
|
|
_wb_
Does avifenc convert arbitrary input to that space or is the only conversion it does rgb2yuv?
|
|
2024-02-12 08:09:48
|
only the specified matrix influences encoding, the rest is just specification of color metadata
|
|
|
_wb_
|
2024-02-12 08:10:14
|
Metadata that can be wrong, if the input is not using those primaries...
|
|
|
spider-mario
|
2024-02-12 08:10:18
|
although if the input has an ICC profile, it will _also_ include it unless you specify `--ignore-icc`
|
|
|
_wb_
|
|
spider-mario
|
2024-02-12 08:10:52
|
I'm not sure they specify which one takes precedence if both are present
|
|
2024-02-12 08:11:02
|
but to be on the safe side, whenever I specify --cicp, I also pass --ignore-icc
|
|
|
damian101
|
|
_wb_
Metadata that can be wrong, if the input is not using those primaries...
|
|
2024-02-12 08:11:17
|
Yes, that's why it normally shouldn't be specified.
Maybe I should make a feature request to ask if they can use an optimal color matrix by default for common color gamuts...
|
|
|
_wb_
|
2024-02-12 08:11:23
|
It's quite annoying how video folks basically just say "color is your problem, we just encode unknown sample data here"
|
|
|
spider-mario
|
2024-02-12 08:11:43
|
there's this joke that "NTSC" stands for "never the same color"
|
|
|
damian101
|
2024-02-12 08:12:08
|
well, avifenc does properly recognize and apply color metadata
|
|
2024-02-12 08:12:40
|
just always encoded in bt.601 color matrix by default
|
|
|
_wb_
|
2024-02-12 08:18:36
|
How do you indicate full range vs tv range?
|
|
|
|
afed
|
2024-02-12 08:19:24
|
`-r,--range RANGE : YUV range [limited or l, full or f]. (JPEG/PNG only, default: full; For y4m or stdin, range is retained)` <:Thonk:805904896879493180>
|
|
|
_wb_
|
2024-02-12 08:20:15
|
Ah ok it already defaults to full, that's good
|
|
|
damian101
|
2024-02-12 08:42:11
|
since when does cjxl not thread at effort 9 by default
|
|
2024-02-12 08:43:45
|
Encoding a 60MP image at effort 9 takes 200 seconds on a Ryzen 7800X3D...
|
|
|
190n
|
2024-02-12 08:45:36
|
how much ram
|
|
|
damian101
|
2024-02-12 08:45:44
|
effort 7 takes 4 seconds <:Thinkies:987903667388710962>
|
|
|
190n
how much ram
|
|
2024-02-12 08:45:48
|
64GB
|
|
|
190n
|
2024-02-12 08:46:10
|
that's how much it uses or how much you have?
|
|
|
damian101
|
|
190n
that's how much it uses or how much you have?
|
|
2024-02-12 08:46:23
|
how much I have
|
|
2024-02-12 08:46:28
|
it uses around 16GB or so
|
|
2024-02-12 08:46:46
|
effort 8 is single-threaded, too <:Thinkies:987903667388710962>
|
|
|
|
afed
|
2024-02-12 08:46:48
|
for lossy?
there is no real reason to use e9 for lossy
|
|
|
damian101
|
|
afed
for lossy?
there is no real reason to use e9 for lossy
|
|
2024-02-12 08:47:16
|
let me have fun ๐
|
|
2024-02-12 08:47:48
|
effort 7: 4s
effort 8: 46s
effort 9: 200s
|
|
2024-02-12 08:48:12
|
and at least effort 8 and 9 don't thread on this 60MP PNG image!!
|
|
|
|
afed
|
2024-02-12 08:52:26
|
5-6 is also typically not worse than 7 for the same size, but may not hold a given butteraugli quality as accurately
|
|
|
damian101
|
2024-02-12 08:52:41
|
effort 7 is already absurdly fast
|
|
|
|
afed
|
2024-02-12 08:53:27
|
and e9 can be worse subjectively, though maybe that's already been fixed
|
|
|
damian101
|
2024-02-12 08:54:07
|
e8 and e9 target a significantly different quality, I guess those are more consistent in quality?
|
|
2024-02-12 08:54:27
|
anyway, what's going on with the threading, this looks like a serious bug to me
|
|
|
_wb_
|
2024-02-12 08:56:54
|
Should be better in current git version
|
|
2024-02-12 08:57:25
|
But I don't think e8/e9 are that useful for lossy
|
|
|
damian101
|
|
_wb_
Should be better in current git version
|
|
2024-02-12 08:57:56
|
well, it's not like cjxl threads badly, it doesn't thread at all, exactly one thread the whole time...
|
|
|
|
afed
|
2024-02-12 08:58:09
|
not a bug, just some greedy methods are difficult to parallelize
|
|
|
damian101
|
2024-02-12 08:58:18
|
but it says it will use 16 threads, and it's a 16 thread machine
|
|
|
afed
not a bug, just some greedy methods are difficult to parallelize
|
|
2024-02-12 08:58:38
|
I am quite sure that effort 9 used to use more than 1 thread...
|
|
|
_wb_
|
2024-02-12 08:59:01
|
e8/e9 try too hard reaching a given butteraugli score imo. Optimizing too much for a metric is risky, no metric is perfect...
|
|
|
damian101
|
2024-02-12 08:59:43
|
Currently I'm easily beating cjxl effort 8 distance 1 with avifenc effort 3...
|
|
|
_wb_
|
2024-02-12 08:59:52
|
For me current git e9 does parallelize
|
|
|
damian101
|
2024-02-12 09:00:09
|
I'll send the source...
|
|
|
_wb_
|
|
_wb_
|
|
2024-02-12 09:00:45
|
I got almost 1 Mpx/s at lossy e9 with 8 threads
|
|
2024-02-12 09:00:53
|
Aaah wait
|
|
2024-02-12 09:01:02
|
You may have an image with lots of patches
|
|
2024-02-12 09:01:31
|
Patches can mess up speed by a lot
|
|
|
damian101
|
2024-02-12 09:05:02
|
what are patches...
|
|
2024-02-12 09:06:29
|
|
|
|
_wb_
|
2024-02-12 09:13:18
|
If that sky is solid white in some regions, patch heuristics will be wasting nonparallelized time on it
|
|
|
damian101
|
2024-02-12 09:13:19
|
efforts 8/9 do thread a little, actually, but only a fraction of the time...
|
|
|
_wb_
|
2024-02-12 09:13:36
|
Try with --patches 0 to see if that helps
|
|
|
damian101
|
|
_wb_
If that sky is solid white in some regions, patch heuristics will be wasting nonparallelized time on it
|
|
2024-02-12 09:13:43
|
I don't think it is, there is some visible noise throughout
|
|
|
_wb_
|
2024-02-12 09:14:09
|
Can't see it on my phone on that discord preview ๐
|
|
|
damian101
|
2024-02-12 09:14:41
|
well, I need to zoom in on my 4K monitor to see it, haha
|
|
|
_wb_
Try with --patches 0 to see if that helps
|
|
2024-02-12 09:16:51
|
definitely not significantly
|
|
2024-02-12 09:17:24
|
btw, I just updated jxl, and now memory consumption is a lot lower <:Thinkies:987903667388710962>
|
|
2024-02-12 09:27:07
|
effort 7 threads decently well
|
|
2024-02-12 09:27:27
|
and effort 8 and 9 probably too, for the part that effort 7 does as well
|
|
|
|
afed
|
2024-02-12 09:50:05
|
if streaming mode for lossy in current git is still disabled for e7 and slower, then for e6 memory consumption will be even lower and most likely better multithreading
|
|
|
yoochan
|
2024-02-13 08:25:04
|
how would you get the version of avif used by benchmark_xl (the one linked to ?), ldd gives me access to a path, but I don't know the best way to extract a version id from this
|
|
|
_wb_
|
2024-02-13 08:47:48
|
`avifenc --version`, assuming your avifenc is linking to the same libavif
|
|
|
yoochan
|
2024-02-13 08:48:51
|
not a bad idea
|
|
|
_wb_
https://storage.googleapis.com/demos.webmproject.org/webp/cmp/2024_01_25/encoded/cd272a9d4ae2d9eabbe58474facc4da00f280be40137597bf1e497f459eda284.e6q016.avif
|
|
2024-02-13 10:07:12
|
I got feedback from the author: https://github.com/webmproject/codec-compare/issues/3
|
|
|
_wb_
|
2024-02-13 10:36:07
|
"Not super crisp, but can be displayed as a background for example." โ if you really want a blurry mess to use as a background, just apply some gaussian blur (or some bilateral filter if you like to preserve edges) before encoding the image, and you'll see it encodes way better in any codec. Using image encoders to do the blurring for you is kind of silly.
|
|
|
yoochan
|
2024-02-13 10:38:14
|
agreed, or resize the original picture if you don't plan to look at it
|
|
|
jonnyawsom3
|
2024-02-13 10:39:51
|
Or use JXL art ;P
|
|
|
_wb_
|
2024-02-13 10:40:53
|
Though maybe this is something we should just start doing at ridiculous distances (say, d > 5) โ first apply some heavy EPF and then encode (as opposed to encoding the input as-is and only doing EPF decode-side). Should produce "better looking" images even if it's obviously a bad idea in terms of fidelity.
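The "filter first, then encode" idea is easy to emulate outside the encoder; a sketch using Pillow's Gaussian blur as a stand-in for a proper edge-preserving filter (file names, radius and distance are illustrative, and this is not libjxl's EPF):
```python
import subprocess
from PIL import Image, ImageFilter

img = Image.open("input.png")
# Smooth away detail that a very low-bitrate encode could never keep anyway.
smoothed = img.filter(ImageFilter.GaussianBlur(radius=2))
smoothed.save("smoothed.png")

# Then encode the pre-filtered image at the very low quality setting.
subprocess.run(["cjxl", "smoothed.png", "out.jxl", "-d", "6"], check=True)
```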
|
|
|
jonnyawsom3
|
2024-02-13 10:42:11
|
Reminded me of my thought regarding low quality jpeg inputs, whether EPF could be retroactively applied, but it seemed out of scope at the time
|
|
|
yoochan
|
2024-02-13 10:42:27
|
you want a "cheat mode" optimized only for benchmark ? like WV ?
|
|
|
jonnyawsom3
|
2024-02-13 10:43:28
|
WV sounds like WB's evil twin, who sacrifices preservation to win at all costs
|
|
|
yoochan
|
2024-02-13 10:43:47
|
could be interesting though, activatable only with a flag: you get aggressive noise filtering (like bilateral) before encoding
|
|
|
_wb_
|
2024-02-13 10:45:53
|
I consider d > 4 basically settings that are not relevant for real usage but only for benchmark/testing scenarios where people want to see what happens if you totally overcompress an image (and they somehow think this is useful information and says something about what happens at more reasonable qualities). So I don't really care if we do something funky at such distances.
|
|
|
jonnyawsom3
|
2024-02-13 10:49:31
|
If I recall the largest 256x256 VarDCT blocks aren't currently used yet either, although I don't know if they would have much of an effect
|
|
2024-02-13 10:53:17
|
(or rather 128x256 if I recall)
|
|
|
yoochan
|
|
_wb_
I consider d > 4 basically settings that are not relevant for real usage but only for benchmark/testing scenarios where people want to see what happens if you totally overcompress an image (and they somehow think this is useful information and says something about what happens at more reasonable qualities). So I don't really care if we do something funky at such distances.
|
|
2024-02-13 10:54:03
|
at which distance do the jxl and avif ssimulacra scores cross ?
|
|
|
username
|
|
If I recall the largest 256x256 VarDCT blocks aren't currently used yet either, although I don't know if they would have much of an effect
|
|
2024-02-13 10:55:54
|
seems like anything above 64x64 isn't used by libjxl's encoder currently https://docs.google.com/presentation/d/1LlmUR0Uoh4dgT3DjanLjhlXrk_5W2nJBDqDAMbhe8v8/edit#slide=id.gad3f818ca8_0_20
|
|
|
jonnyawsom3
|
2024-02-13 11:01:08
|
Even more room for improvement then
|
|
|
|
veluca
|
2024-02-13 11:31:39
|
> I noticed that your second plot only has 2521 comparisons, which may signal low confidence in the results due to the lack of data points.
that's... an interesting statement
|
|
2024-02-13 11:32:16
|
> I tried to minimize any bias in the default settings:
> - The full input quality setting ranges are used.
|
|
2024-02-13 11:32:20
|
so is that
|
|
2024-02-13 11:33:52
|
> > I guess < 0.1bpp can not seriously be used for display
>
> From the <0.1bpp plot:
> Top left point: bpp < 0.03
> Right-most point: bpp < 0.03
> Not super crisp, but can be displayed as a background for example.
|
|
|
veluca
> > I guess < 0.1bpp can not seriously be used for display
>
> From the <0.1bpp plot:
> Top left point: bpp < 0.03
> Right-most point: bpp < 0.03
> Not super crisp, but can be displayed as a background for example.
|
|
2024-02-13 11:34:02
|
...
|
|
|
jonnyawsom3
|
|
veluca
> I tried to minimize any bias in the default settings:
> - The full input quality setting ranges are used.
|
|
2024-02-13 12:28:15
|
So, we just cap the maximum distance in libjxl and problem solved, we win all benchmarks, hooray!
|
|
|
yoochan
|
2024-02-13 12:30:15
|
or, as wb suggested, we enable aggressive epf after d 4.0 in order to pump up the SSIM score (at the expense of fidelity)
|
|
|
_wb_
|
2024-02-13 12:35:03
|
haha just capping distance at d4 would be fun; it would mean there are just no data points to show for the very low qualities
|
|
2024-02-13 12:37:13
|
btw avif and especially webp are benefitting from the other side of that phenomenon: for some images they just cannot reach a certain score so the comparison tool has no match to show for that image, and it doesn't count, even if I would consider it a quite big win for jxl when this happens
|
|
|
username
|
2024-02-13 12:38:23
|
if funky stuff is changed/done to libjxl at higher distances to improve SSIM then it would probably be a good idea to comment the new behavior/code with something like "// Improve SSIM" or something so it's easier to find in the future.
|
|
|
yoochan
|
2024-02-13 12:39:07
|
and put it behind a flag in the command line (or a flag at compile time too :D)
|
|
|
username
|
|
yoochan
and put it behind a flag in the command line (or a flag at compile time too :D)
|
|
2024-02-13 12:40:18
|
as in enabled or disabled by default?
|
|
2024-02-13 12:40:26
|
that reminds me
|
|
|
yoochan
|
2024-02-13 12:41:07
|
like : `cjxl --ssim-cheat -d 5.0 tutu.png` and also `cmake -DENABLE_SSIM_CHEAT`
|
|
|
username
|
|
username
that reminds me
|
|
2024-02-13 12:44:01
|
(took me a second to find this) reminds me of this toggle that exists in Nvida's GPU video encoder https://docs.nvidia.com/video-technologies/video-codec-sdk/11.1/nvenc-video-encoder-api-prog-guide/index.html#spatial-aq
|
|
|
username
(took me a second to find this) reminds me of this toggle that exists in Nvida's GPU video encoder https://docs.nvidia.com/video-technologies/video-codec-sdk/11.1/nvenc-video-encoder-api-prog-guide/index.html#spatial-aq
|
|
2024-02-13 12:44:57
|
"```
Although spatial AQ improves the perceptible visual quality of the encoded video, the required bit redistribution results in PSNR drop in most of the cases. Therefore, during PSNR-based evaluation, this feature should be turned off.
```"
|
|
|
monad
|
2024-02-13 01:31:10
|
libjxl uses butteraugli for internal tuning, not ssimulacra 2.1
|
|
|
yoochan
|
2024-02-13 01:32:19
|
that's what I'm writing, I'll invite him here, easier to discuss
|
|
|
username
|
|
yoochan
that's what I'm writing, I'll invite him here, easier to discuss
|
|
2024-02-13 02:11:34
|
maybe change it a bit so they aren't *required* to find the link on another website. A rewording something like this might be better:
"( link can be found here: [jpegxl.info](https://jpegxl.info/), or here's a direct link if you prefer: https://discord.gg/DqkQgDRTFu )"
|
|
2024-02-13 02:12:16
|
oh it seems like they have seen your comment now
|
|
|
yoochan
|
|
username
oh it seems like they have seen your comment now
|
|
2024-02-13 02:15:25
|
that's what i did first but I didn't want the raw discord link to be published outside official pages... (this is a not very convincing argument, I'm not convinced myself :D)
|
|
|
username
|
2024-02-13 02:26:06
|
the link possibly getting picked up on github by bots or something I guess could be a worry. The thing is iirc the link has been posted into github comment sections before.
|
|
2024-02-13 02:27:55
|
github does let you edit comments but they have already seen it, so who knows if they will look at the comment again and be convinced to join for discussion
|
|
|
_wb_
|
2024-02-13 04:43:11
|
Interesting. For this set of not-so-large images (1 Mpx each), avif beats heic:
|
|
2024-02-13 04:44:12
|
While for this set of larger images (11 Mpx on average), heic beats avif:
|
|
|
damian101
|
2024-02-13 04:49:22
|
how...
|
|
|
|
afed
|
2024-02-13 04:53:50
|
if it's x265 for heic it's not surprising, for higher qualities x265 is still better than any av1 encoder
and images usually require much higher quality than video
|
|
|
_wb_
|
2024-02-13 04:59:29
|
this is libheif which indeed uses x265 for heic
|
|
2024-02-13 05:02:15
|
and yes this is at ssimulacra around 85 (on average) which is a rather high quality, around d1
|
|
|
damian101
|
2024-02-13 05:04:39
|
maybe the nature of the large and small images is just different?
|
|
|
_wb_
|
2024-02-13 05:52:57
|
Sure, the large ones have more camera noise and less entropy per pixel, the small ones are downscales from high res photos so they have less noise but more entropy per pixel.
|
|
2024-02-13 05:58:58
|
Also I guess that Daala set was heavily used while designing Daala and av1, so maybe avif is performing a bit better than expected on that corpus
|
|
|
yoochan
|
2024-02-13 06:00:12
|
The famous "I didn't test it on something else" bias
|
|
|
_wb_
Sure, the large ones have more camera noise and less entropy per pixel, the small ones are downscales from high res photos so they have less noise but more entropy per pixel.
|
|
2024-02-13 06:04:05
|
Interesting... What would give the best result in terms of size/quality ratio: reduce a photo by a factor of 2, encode it in high quality and display it as-is? Or encode the original file, with a quality on par with the first case when viewed downscaled by a factor of 2?
|
|
|
damian101
|
2024-02-13 06:09:16
|
it's complicated...
|
|
|
_wb_
|
2024-02-13 06:09:25
|
Agreed
|
|
|
damian101
|
|
yoochan
Interesting... What would give the best result in terms of size/quality ratio: reduce a photo by a factor of 2, encode it in high quality and display it as-is? Or encode the original file, with a quality on par with the first case when viewed downscaled by a factor of 2?
|
|
2024-02-13 06:13:27
|
You usually do not want to encode at resolutions higher than target resolution, because detail will be preserved and never displayed.
|
|
2024-02-13 06:14:05
|
wasting a lot of bitrate
|
|
2024-02-13 06:14:10
|
or file size, here
|
|
|
spider-mario
|
|
yoochan
Interesting... What would give the best result in terms of size/quality ratio: reduce a photo by a factor of 2, encode it in high quality and display it as-is? Or encode the original file, with a quality on par with the first case when viewed downscaled by a factor of 2?
|
|
2024-02-13 06:14:27
|
that tends to depend on the bitrate you target
|
|
|
damian101
|
2024-02-13 06:14:35
|
and the format
|
|
2024-02-13 06:14:54
|
and the downscaling method
|
|
|
_wb_
|
2024-02-13 06:15:50
|
Generally I wouldn't downscale as long as you want to remain in the "reasonable quality" range.
For web delivery of course you should downscale to fit the layout of the page (taking into account that a css pixel might be more than 1 device pixel, though)
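To make that trade-off concrete: at a fixed file size, a 2x downscale leaves 4x fewer pixels, so each remaining pixel gets 4x the bit budget. A minimal back-of-envelope sketch with made-up numbers (not tied to any benchmark in this thread):
```
// Back-of-envelope only: whether the larger per-pixel budget at the lower
// resolution wins depends on how much detail the downscale throws away and
// how well the codec spends it, which is why the answer depends on bitrate,
// format, and scaler.
#include <cstdio>

int main() {
  const double width = 4000.0, height = 3000.0;  // hypothetical original
  const double file_bytes = 1.5e6;               // hypothetical 1.5 MB budget
  const double bpp_full = file_bytes * 8.0 / (width * height);
  const double bpp_half = file_bytes * 8.0 / ((width / 2.0) * (height / 2.0));
  std::printf("full res: %.3f bpp, 2x downscaled: %.3f bpp\n", bpp_full, bpp_half);
  return 0;
}
```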
|
|
|
yoochan
|
2024-02-15 08:54:35
|
if distance is 0.01, 0.02 or 0.03, cjxl gives an image bigger than the original png. And there's a negative correlation between effort and compression
|
|
2024-02-15 08:54:46
|
is the distance rounded internally? to how many decimals?
|
|
|
_wb_
|
2024-02-15 09:04:17
|
Distances below 0.05 probably don't make sense on 8-bit input, and probably a lot of encoder heuristics are not well-tuned for distances below 0.3 or so
|
|
|
yoochan
|
2024-02-15 09:09:29
|
thanks, I'll take 0.3 as a lower limit
|
|
|
_wb_
|
2024-02-15 09:19:10
|
0.1 should still be ok, but I wouldn't go lower
|
|
|
jonnyawsom3
|
2024-02-15 12:48:23
|
I only just thought, does streaming mode have any effect on e10? (e11 now)
Had the old idea of running it on a small image, saving the parameters it finds and then applying to other/larger similar images. Although that would require PRs for the parameter saving
|
|
|
MSLP
|
2024-02-15 04:18:06
|
I too like the idea of for example "cjxl -v -e 10" printing the selected best parameters
|
|
|
_wb_
|
2024-02-15 04:30:44
|
That would be nice, but I guess we'll need some API for verbosity of libjxl. Currently libjxl doesn't print anything to stdout except when compile-time verbosity is set to something nonzero, and then it just always prints stuff. Basically we don't want to ship (all) debug output strings in release builds, but we could still do something like having a runtime verbosity in addition to the compile-time verbosity.
|
|
2024-02-15 04:32:52
|
The cleaner way would maybe be to not let libjxl print stuff to stdout itself, but let it return some string as part of the return status or something.
|
|
|
|
afed
|
2024-02-15 04:33:57
|
maybe at least in benchmark_xl
|
|
|
Cacodemon345
|
|
_wb_
The cleaner way would maybe be to not let libjxl print stuff to stdout itself, but let it return some string as part of the return status or something.
|
|
2024-02-16 06:40:29
|
Setting a custom logger function would be better. Application remains in full control.
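As a minimal sketch of what that could look like (hypothetical names, not the actual libjxl API): the library routes every diagnostic through a caller-supplied callback and stays silent when none is set.
```
// Hypothetical sketch only; none of these names exist in the real libjxl API.
#include <cstdio>

typedef void (*jxl_log_fn)(int level, const char* message, void* user_data);

struct EncoderSettings {
  jxl_log_fn logger = nullptr;   // if null, the library prints nothing
  void* logger_user_data = nullptr;
  int verbosity = 0;             // runtime threshold, independent of build flags
};

// Library side: route diagnostics through the callback instead of stdout.
void EmitLog(const EncoderSettings& s, int level, const char* msg) {
  if (s.logger && level <= s.verbosity) s.logger(level, msg, s.logger_user_data);
}

// Application side: keeps full control over where messages go.
void MyLogger(int level, const char* message, void*) {
  std::fprintf(stderr, "[jxl:%d] %s\n", level, message);
}

int main() {
  EncoderSettings settings;
  settings.logger = MyLogger;
  settings.verbosity = 2;
  EmitLog(settings, 1, "selected effort-10 parameters: (example)");
  return 0;
}
```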
|
|
|
yoochan
|
|
_wb_
The cleaner way would maybe be to not let libjxl print stuff to stdout itself, but let it return some string as part of the return status or something.
|
|
2024-02-16 07:33:47
|
Or a dedicated structure
|
|
|
monad
|
|
I only just thought, does streaming mode have any effect on e10? (e11 now)
Had the old idea of running it on a small image, saving the parameters it finds and then applying to other/larger similar images. Although that would require PRs for the parameter saving
|
|
2024-02-16 04:41:06
|
Most combinations of settings are useless, meaning e11 does a lot of work for nothing. It can also find obscure configurations (particularly specific predictors) which work for a single image, but don't generalize. Larger images prefer g3, smaller images less so. Patches are generally bad.
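(For what it's worth, the settings e11 searches over can mostly be pinned by hand on top of e9, e.g. something like `cjxl -e 9 -g 3 --patches 0 in.png out.jxl` for a large non-photo image; the flag spellings here are from memory of cjxl's `--help`, so treat the exact names as an assumption and check them locally.)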
|
|
|
fab
|
|
|
|
2024-02-19 02:51:36
|
this is not the type of look i did pioirirrty on my phone
|
|
|
VcSaJen
|
2024-02-21 12:36:38
|
I wonder about how things have evolved over the years 2021-2024. Is AVIF getting closer to JPEG XL? Or has JPEG XL further increased the distance?
|
|
|
jonnyawsom3
|
2024-02-21 09:41:37
|
Did a few quick tests on an 8K screenshot with/without streaming mode on a Ryzen 7 1700
Starting with e1, performance was the same between 0.9.1 and 0.10.0
```
JPEG XL encoder v0.10.0 d990a37 [AVX2,SSE2]
Encoding [Modular, lossless, effort: 1]
Compressed to 34592.2 kB (8.341 bpp).
7680 x 4320, 103.346 MP/s [103.35, 103.35], 1 reps, 16 threads.
PeakWorkingSetSize: 324.6 MiB
PeakPagefileUsage: 416.6 MiB
Wall time: 0 days, 00:00:00.423 (0.42 seconds)
User time: 0 days, 00:00:00.109 (0.11 seconds)
Kernel time: 0 days, 00:00:02.093 (2.09 seconds)
```
But Streaming input practically halved memory usage
```
JPEG XL encoder v0.10.0 d990a37 [AVX2,SSE2]
Encoding [Modular, lossless, effort: 1]
Compressed to 35417.2 kB (8.540 bpp).
7680 x 4320, 102.867 MP/s [102.87, 102.87], 1 reps, 16 threads.
PeakWorkingSetSize: 230.4 MiB
PeakPagefileUsage: 227.3 MiB
Wall time: 0 days, 00:00:00.349 (0.35 seconds)
User time: 0 days, 00:00:00.156 (0.16 seconds)
Kernel time: 0 days, 00:00:02.000 (2.00 seconds)
```
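(The bpp figures in these dumps follow directly from the printed size and dimensions, with kB apparently meaning 1000 bytes; a quick check against the first dump:)
```
// Quick consistency check of the reported bpp: bits / pixels.
// Numbers copied from the first dump above (34592.2 kB, 7680 x 4320).
#include <cstdio>

int main() {
  const double bytes = 34592.2 * 1000.0;
  const double pixels = 7680.0 * 4320.0;
  std::printf("%.3f bpp\n", bytes * 8.0 / pixels);  // prints 8.341, matching the log
  return 0;
}
```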
|
|
2024-02-21 09:44:24
|
Moving on to e3
```
JPEG XL encoder v0.9.1 b8ceae3 [AVX2,SSE4,SSSE3,SSE2]
Encoding [Modular, lossless, effort: 3]
Compressed to 26824.0 kB (6.468 bpp).
7680 x 4320, 15.367 MP/s [15.37, 15.37], 1 reps, 16 threads.
PeakWorkingSetSize: 2.173 GiB
PeakPagefileUsage: 2.529 GiB
Wall time: 0 days, 00:00:02.258 (2.26 seconds)
User time: 0 days, 00:00:03.234 (3.23 seconds)
Kernel time: 0 days, 00:00:12.031 (12.03 seconds)
```
Now 0.10.0 makes a difference without Streaming input
```
JPEG XL encoder v0.10.0 d990a37 [AVX2,SSE2]
Encoding [Modular, lossless, effort: 3]
Compressed to 26714.8 kB (6.442 bpp).
7680 x 4320, 20.714 MP/s [20.71, 20.71], 1 reps, 16 threads.
PeakWorkingSetSize: 469 MiB
PeakPagefileUsage: 525.6 MiB
Wall time: 0 days, 00:00:01.704 (1.70 seconds)
User time: 0 days, 00:00:04.906 (4.91 seconds)
Kernel time: 0 days, 00:00:13.250 (13.25 seconds)
```
Streaming input makes roughly the same dent in memory usage as on e1
```
JPEG XL encoder v0.10.0 d990a37 [AVX2,SSE2]
Encoding [Modular, lossless, effort: 3]
Compressed to 26714.8 kB (6.442 bpp).
7680 x 4320, 20.687 MP/s [20.69, 20.69], 1 reps, 16 threads.
PeakWorkingSetSize: 373.9 MiB
PeakPagefileUsage: 333.9 MiB
Wall time: 0 days, 00:00:01.626 (1.63 seconds)
User time: 0 days, 00:00:04.687 (4.69 seconds)
Kernel time: 0 days, 00:00:12.406 (12.41 seconds)
```
|
|
2024-02-21 09:47:27
|
And now for e7, AKA default lossless
```
JPEG XL encoder v0.9.1 b8ceae3 [AVX2,SSE4,SSSE3,SSE2]
Encoding [Modular, lossless, effort: 7]
Compressed to 22487.1 kB (5.422 bpp).
7680 x 4320, 0.277 MP/s [0.28, 0.28], 1 reps, 16 threads.
PeakWorkingSetSize: 2.126 GiB
PeakPagefileUsage: 2.475 GiB
Wall time: 0 days, 00:01:59.703 (119.70 seconds)
User time: 0 days, 00:00:40.312 (40.31 seconds)
Kernel time: 0 days, 00:02:00.109 (120.11 seconds)
```
Much larger impact
```
JPEG XL encoder v0.10.0 d990a37 [AVX2,SSE2]
Encoding [Modular, lossless, effort: 7]
Compressed to 22456.5 kB (5.415 bpp).
7680 x 4320, 4.288 MP/s [4.29, 4.29], 1 reps, 16 threads.
PeakWorkingSetSize: 470.8 MiB
PeakPagefileUsage: 519.6 MiB
Wall time: 0 days, 00:00:07.837 (7.84 seconds)
User time: 0 days, 00:00:04.375 (4.38 seconds)
Kernel time: 0 days, 00:01:34.843 (94.84 seconds)
```
Once again, Streamed input is a fixed reduction in memory
```
JPEG XL encoder v0.10.0 d990a37 [AVX2,SSE2]
Encoding [Modular, lossless, effort: 7]
Compressed to 22456.5 kB (5.415 bpp).
7680 x 4320, 4.217 MP/s [4.22, 4.22], 1 reps, 16 threads.
PeakWorkingSetSize: 377.7 MiB
PeakPagefileUsage: 337.5 MiB
Wall time: 0 days, 00:00:07.890 (7.89 seconds)
User time: 0 days, 00:00:04.843 (4.84 seconds)
Kernel time: 0 days, 00:01:34.171 (94.17 seconds)
```
|
|
|
|
veluca
|
2024-02-21 09:59:44
|
I'm a bit surprised that bitrate went down... not complaining ofc
|
|
2024-02-21 10:00:04
|
also surprised that you have a 10x reduction in *user* time
|
|
|
jonnyawsom3
|
2024-02-21 10:10:49
|
Running e9 at the moment, naturally taking half my lifetime on 0.9.1
|
|
2024-02-21 10:49:58
|
40 minutes on 0.9.1 for e9 (almost exactly 2500 seconds)
```
JPEG XL encoder v0.9.1 b8ceae3 [AVX2,SSE4,SSSE3,SSE2]
Encoding [Modular, lossless, effort: 9]
Compressed to 21333.5 kB (5.144 bpp).
7680 x 4320, 0.013 MP/s [0.01, 0.01], 1 reps, 16 threads.
PeakWorkingSetSize: 4.069 GiB
PeakPagefileUsage: 4.114 GiB
Wall time: 0 days, 00:41:39.986 (2499.99 seconds)
User time: 0 days, 00:07:54.734 (474.73 seconds)
Kernel time: 0 days, 00:35:54.125 (2154.12 seconds)
```
Naturally a massive improvement for 0.10.0
```
JPEG XL encoder v0.10.0 d990a37 [AVX2,SSE2]
Encoding [Modular, lossless, effort: 9]
Compressed to 21367.2 kB (5.152 bpp).
7680 x 4320, 0.209 MP/s [0.21, 0.21], 1 reps, 16 threads.
PeakWorkingSetSize: 551 MiB
PeakPagefileUsage: 602.7 MiB
Wall time: 0 days, 00:02:39.119 (159.12 seconds)
User time: 0 days, 00:00:08.218 (8.22 seconds)
Kernel time: 0 days, 00:33:38.203 (2018.20 seconds)
```
And Streaming input keeping it under half a GB of memory
```
JPEG XL encoder v0.10.0 d990a37 [AVX2,SSE2]
Encoding [Modular, lossless, effort: 9]
Compressed to 21367.2 kB (5.152 bpp).
7680 x 4320, 0.206 MP/s [0.21, 0.21], 1 reps, 16 threads.
PeakWorkingSetSize: 461.5 MiB
PeakPagefileUsage: 411.7 MiB
Wall time: 0 days, 00:02:40.763 (160.76 seconds)
User time: 0 days, 00:00:07.656 (7.66 seconds)
Kernel time: 0 days, 00:33:43.906 (2023.91 seconds)
```
|
|
|
veluca
I'm a bit surprised that bitrate went down... not complaining ofc
|
|
2024-02-21 10:50:41
|
Interestingly bitrate went back up on e9
|
|
|
|
veluca
|
2024-02-21 10:51:43
|
there's something very broken in the threading implementation on Windows, btw - those Wall/User times don't make any sense
|
|
|
Interestingly bitrate went back up on e9
|
|
2024-02-21 10:52:05
|
that I am less surprised about
|
|
|
jonnyawsom3
|
|
veluca
there's something very broken in the threading implementation on Windows, btw - those Wall/User times don't make any sense
|
|
2024-02-21 10:59:36
|
I got those times using this, so I wouldn't be surprised if they're inaccurate compared to native Linux
https://github.com/cbielow/wintime
|
|
2024-02-21 11:07:41
|
Here's the start and end times of 0.9 vs 0.10, removed from those results due to Discord's character limit
e1
```
0.9
Creation time 2024/02/21 09:12:46.207
Exit time 2024/02/21 09:12:46.831
0.10
Creation time 2024/02/21 09:29:14.263
Exit time 2024/02/21 09:29:14.687```
e3
```
0.9
Creation time 2024/02/21 09:14:50.419
Exit time 2024/02/21 09:14:52.677
0.10
Creation time 2024/02/21 09:15:02.020
Exit time 2024/02/21 09:15:03.724
```
e7
```
0.9
Creation time 2024/02/21 09:15:56.284
Exit time 2024/02/21 09:17:55.987
0.10
Creation time 2024/02/21 09:18:16.169
Exit time 2024/02/21 09:18:24.007
```
e9
```
0.9
Creation time 2024/02/21 09:48:07.953
Exit time 2024/02/21 10:29:47.940
0.10
Creation time 2024/02/21 10:35:39.433
Exit time 2024/02/21 10:38:18.553
```
|
|
2024-02-21 11:48:25
|
And a quick VarDCT test for good measure
```
JPEG XL encoder v0.9.1 b8ceae3 [AVX2,SSE4,SSSE3,SSE2]
Encoding [VarDCT, d1.000, effort: 7]
Compressed to 4613.3 kB (1.112 bpp).
7680 x 4320, 1.325 MP/s [1.33, 1.33], 1 reps, 16 threads.
PageFaultCount: 1153320
PeakWorkingSetSize: 1.631 GiB
QuotaPeakPagedPoolUsage: 35.53 KiB
QuotaPeakNonPagedPoolUsage: 80.48 KiB
PeakPagefileUsage: 2.319 GiB
Creation time 2024/02/21 11:45:15.585
Exit time 2024/02/21 11:45:40.726
Wall time: 0 days, 00:00:25.141 (25.14 seconds)
User time: 0 days, 00:01:05.703 (65.70 seconds)
Kernel time: 0 days, 00:01:43.859 (103.86 seconds)
```
Slightly slower yet more efficient
```
JPEG XL encoder v0.10.0 d990a37 [AVX2,SSE2]
Encoding [VarDCT, d1.000, effort: 7]
Compressed to 4273.9 kB (1.031 bpp).
7680 x 4320, 1.193 MP/s [1.19, 1.19], 1 reps, 16 threads.
PageFaultCount: 949312
PeakWorkingSetSize: 463.8 MiB
QuotaPeakPagedPoolUsage: 35.23 KiB
QuotaPeakNonPagedPoolUsage: 20.45 KiB
PeakPagefileUsage: 569.3 MiB
Creation time 2024/02/21 11:45:56.165
Exit time 2024/02/21 11:46:24.065
Wall time: 0 days, 00:00:27.900 (27.90 seconds)
User time: 0 days, 00:01:19.593 (79.59 seconds)
Kernel time: 0 days, 00:01:53.531 (113.53 seconds)
```
And Streaming input once again keeping it under half a GB
```
JPEG XL encoder v0.10.0 d990a37 [AVX2,SSE2]
Encoding [VarDCT, d1.000, effort: 7]
Compressed to 4273.9 kB (1.031 bpp).
7680 x 4320, 1.160 MP/s [1.16, 1.16], 1 reps, 16 threads.
PageFaultCount: 1054309
PeakWorkingSetSize: 368.1 MiB
QuotaPeakPagedPoolUsage: 225.1 KiB
QuotaPeakNonPagedPoolUsage: 20.31 KiB
PeakPagefileUsage: 378 MiB
Creation time 2024/02/21 11:51:11.688
Exit time 2024/02/21 11:51:40.306
Wall time: 0 days, 00:00:28.618 (28.62 seconds)
User time: 0 days, 00:01:16.687 (76.69 seconds)
Kernel time: 0 days, 00:01:53.703 (113.70 seconds)
```
|
|
|
|
veluca
|
2024-02-21 12:01:25
|
I had already observed pretty bad thread scaling from windows builds yesterday
|
|
2024-02-21 12:01:47
|
so this is not surprising
|
|
|
jonnyawsom3
|
2024-02-21 12:01:49
|
e9 lossy hit 8GB of RAM for around 20 seconds before I killed the process, streaming input resulted in this
```
JPEG XL encoder v0.10.0 d990a37 [AVX2,SSE2]
Encoding [VarDCT, d1.000, effort: 9]
Compressed to 4272.0 kB (1.030 bpp).
7680 x 4320, 1.125 MP/s [1.13, 1.13], 1 reps, 16 threads.
PageFaultCount: 996673
PeakWorkingSetSize: 432.4 MiB
QuotaPeakPagedPoolUsage: 225.1 KiB
QuotaPeakNonPagedPoolUsage: 20.71 KiB
PeakPagefileUsage: 436.3 MiB
Creation time 2024/02/21 11:59:36.841
Exit time 2024/02/21 12:00:06.357
Wall time: 0 days, 00:00:29.515 (29.52 seconds)
User time: 0 days, 00:01:19.984 (79.98 seconds)
Kernel time: 0 days, 00:01:57.781 (117.78 seconds)
```
|
|
|
veluca
I had already observed pretty bad thread scaling from windows builds yesterday
|
|
2024-02-21 12:02:31
|
In all lossless cases it actually scaled up to 100% CPU usage, although I assume you mean the speed/thread count increase
|
|
|
|
veluca
|
|
e9 lossy hit 8GB of RAM for around 20 seconds before I killed the process, streaming input resulted in this
```
JPEG XL encoder v0.10.0 d990a37 [AVX2,SSE2]
Encoding [VarDCT, d1.000, effort: 9]
Compressed to 4272.0 kB (1.030 bpp).
7680 x 4320, 1.125 MP/s [1.13, 1.13], 1 reps, 16 threads.
PageFaultCount: 996673
PeakWorkingSetSize: 432.4 MiB
QuotaPeakPagedPoolUsage: 225.1 KiB
QuotaPeakNonPagedPoolUsage: 20.71 KiB
PeakPagefileUsage: 436.3 MiB
Creation time 2024/02/21 11:59:36.841
Exit time 2024/02/21 12:00:06.357
Wall time: 0 days, 00:00:29.515 (29.52 seconds)
User time: 0 days, 00:01:19.984 (79.98 seconds)
Kernel time: 0 days, 00:01:57.781 (117.78 seconds)
```
|
|
2024-02-21 12:02:43
|
... I am not sure why it worked at all
|
|
|
In all lossless cases it actually scaled up to 100% CPU usage, although I assume you mean the speed/thread count increase
|
|
2024-02-21 12:03:09
|
yup
|
|
2024-02-21 12:03:19
|
could just be that the speed measurement is broken, of course
|
|
|
jonnyawsom3
|
|
veluca
... I am not sure why it worked at all
|
|
2024-02-21 12:11:53
|
Well whatever happened, it seemed to work pretty well haha
|
|
2024-02-21 12:12:01
|
Ohh, wait...
|
|
2024-02-21 12:13:58
|
No, nevermind... I was thinking maybe patches and streaming disabling that, but streaming input is separate so they're disabled anyway.... I think....
|
|
2024-02-21 01:16:56
|
I'm mostly looking forward to 0.10.0 getting into Squoosh, then I can test on my phone too without immediately hitting an out of memory error
|
|
|
Kremzli
|
2024-02-21 01:20:03
|
Do they even maintain that? It's got so many PRs
|
|
|
jonnyawsom3
|
2024-02-21 01:23:58
|
<@228116142185512960> might be able to shed some light on that
|
|
|
sklwmp
|
2024-02-22 04:45:06
|
from 2:30 to 30 seconds with cjxl v0.10.0, pretty significant improvements <:PepeOK:805388754545934396>
|
|
|
eddie.zato
|
2024-02-23 03:48:42
|
Ok. That's really impressive for the modular. <:Hypers:808826266060193874>
```
PS > [PSCustomObject]@{v092_s = (Measure-Command { v092/cjxl 0.jpg 092j.jxl }).TotalSeconds; v0100_s = (Measure-Command { v0100/cjxl 0.jpg 0100j.jxl }).TotalSeconds } | ft
JPEG XL encoder v0.9.2 41b8cda [AVX2]
Encoding [JPEG, lossless transcode, effort: 7]
Compressed to 13369.8 kB including container
JPEG XL encoder v0.10.0 19bcd82 [AVX2]
Encoding [JPEG, lossless transcode, effort: 7]
Compressed to 13369.8 kB including container
v092_s v0100_s
------ -------
0,96 1,07
PS > [PSCustomObject]@{v092_s = (Measure-Command { v092/cjxl 0.png 092j.jxl }).TotalSeconds; v0100_s = (Measure-Command { v0100/cjxl 0.png 0100j.jxl }).TotalSeconds } | ft
JPEG XL encoder v0.9.2 41b8cda [AVX2]
Encoding [VarDCT, d1.000, effort: 7]
Compressed to 7017.0 kB (1.128 bpp).
5760 x 8640, 2.985 MP/s [2.99, 2.99], 1 reps, 16 threads.
JPEG XL encoder v0.10.0 19bcd82 [AVX2]
Encoding [VarDCT, d1.000, effort: 7]
Compressed to 6963.2 kB (1.119 bpp).
5760 x 8640, 3.349 MP/s [3.35, 3.35], 1 reps, 16 threads.
v092_s v0100_s
------ -------
17,29 15,51
PS > [PSCustomObject]@{v092_s = (Measure-Command { v092/cjxl 0.png -m 1 -d 0 092j.jxl }).TotalSeconds; v0100_s = (Measure-Command { v0100/cjxl 0.png -m 1 -d 0 0100j.jxl }).TotalSeconds } | ft
JPEG XL encoder v0.9.2 41b8cda [AVX2]
Encoding [Modular, lossless, effort: 7]
Compressed to 36444.0 kB (5.858 bpp).
5760 x 8640, 0.423 MP/s [0.42, 0.42], 1 reps, 16 threads.
JPEG XL encoder v0.10.0 19bcd82 [AVX2]
Encoding [Modular, lossless, effort: 7]
Compressed to 36064.1 kB (5.797 bpp).
5760 x 8640, 15.967 MP/s [15.97, 15.97], 1 reps, 16 threads.
v092_s v0100_s
------ -------
118,39 3,92
```
|
|
|
Nyao-chan
|
|
Moving on to e3
```
JPEG XL encoder v0.9.1 b8ceae3 [AVX2,SSE4,SSSE3,SSE2]
Encoding [Modular, lossless, effort: 3]
Compressed to 26824.0 kB (6.468 bpp).
7680 x 4320, 15.367 MP/s [15.37, 15.37], 1 reps, 16 threads.
PeakWorkingSetSize: 2.173 GiB
PeakPagefileUsage: 2.529 GiB
Wall time: 0 days, 00:00:02.258 (2.26 seconds)
User time: 0 days, 00:00:03.234 (3.23 seconds)
Kernel time: 0 days, 00:00:12.031 (12.03 seconds)
```
Now 0.10.0 makes a difference without Streaming input
```
JPEG XL encoder v0.10.0 d990a37 [AVX2,SSE2]
Encoding [Modular, lossless, effort: 3]
Compressed to 26714.8 kB (6.442 bpp).
7680 x 4320, 20.714 MP/s [20.71, 20.71], 1 reps, 16 threads.
PeakWorkingSetSize: 469 MiB
PeakPagefileUsage: 525.6 MiB
Wall time: 0 days, 00:00:01.704 (1.70 seconds)
User time: 0 days, 00:00:04.906 (4.91 seconds)
Kernel time: 0 days, 00:00:13.250 (13.25 seconds)
```
Streaming input makes roughly the same dent in memory usage as on e1
```
JPEG XL encoder v0.10.0 d990a37 [AVX2,SSE2]
Encoding [Modular, lossless, effort: 3]
Compressed to 26714.8 kB (6.442 bpp).
7680 x 4320, 20.687 MP/s [20.69, 20.69], 1 reps, 16 threads.
PeakWorkingSetSize: 373.9 MiB
PeakPagefileUsage: 333.9 MiB
Wall time: 0 days, 00:00:01.626 (1.63 seconds)
User time: 0 days, 00:00:04.687 (4.69 seconds)
Kernel time: 0 days, 00:00:12.406 (12.41 seconds)
```
|
|
2024-02-23 04:07:26
|
how did you toggle streaming?
|
|
|
jonnyawsom3
|
|
Nyao-chan
how did you toggle streaming?
|
|
2024-02-23 04:08:08
|
Streaming input is separate from streaming encoding
|
|
2024-02-23 04:08:40
|
Although I mostly used the previous release (0.9) to measure with/without streaming encoding
|
|
|
Nyao-chan
|
2024-02-23 04:10:11
|
But you have 2 runs for 0.10.0, with and without streaming. I thought there are no flags to toggle it
|
|
2024-02-23 04:10:57
|
or is streaming input turned off by default and the flag enables it?
|
|
|
_wb_
|
2024-02-23 04:24:52
|
you can do `cjxl --streaming_input` but it requires ppm input atm; that will mostly impact memory though, not so much speed
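(For example, something like `magick input.png input.ppm` followed by `cjxl --streaming_input input.ppm out.jxl`; the ImageMagick step and the exact flag placement are assumptions, double-check against `cjxl --help`.)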
|
|
|
jonnyawsom3
|
|
Nyao-chan
But you have 2 runs for 0.10.0, with and without streaming. I thought there are no flags to toggle it
|
|
2024-02-23 04:59:27
|
> Streaming input is separate from streaming encoding
|
|
2024-02-23 05:00:24
|
All it does is feed the input image in, a few pixels at a time; the only difference in those runs was around 200MB of RAM. The ***actual*** streaming was 0.9 versus either of the 0.10 runs
|
|
|
Orum
|
2024-02-27 09:23:07
|
so, it looks like `e 7` is the fastest cjxl speed that (on average) beats cwebp's `z 9`:
``` eal rbp rrt lvl cpu mem
1 cjxl v0.10.0 1 1.1997896 0.0371686 1 Inf 30657.19
2 cjxl v0.10.0 2 0.9755219 0.3651210 2 17.3375226 244023.45
3 cjxl v0.10.0 3 1.0254203 0.5425397 3 19.2356366 243855.98
4 cjxl v0.10.0 4 0.8880205 1.1001987 4 21.2395977 274834.79
5 cjxl v0.10.0 5 0.8284132 1.1617858 5 20.9716307 267952.03
6 cjxl v0.10.0 6 0.8102338 1.4223768 6 21.1074809 274365.39
7 cjxl v0.10.0 7 0.7574005 1.8606330 7 21.2205659 283921.00
8 cjxl v0.10.0 8 0.7391769 5.4232358 8 22.1395608 536870.36
9 cjxl v0.10.0 9 0.7178132 24.2512006 9 22.4154769 633623.88
10 cjxl v0.10.0 10 0.7209015 26.4382343 10 22.3001730 634870.07
11 cwebp 1.3.2 0 1.0000000 1.0000000 0 0.9691077 151040.92
12 cwebp 1.3.2 1 0.8719663 7.1135803 1 0.9942246 208628.52
13 cwebp 1.3.2 2 0.8479628 8.1897254 2 0.9948469 221764.80
14 cwebp 1.3.2 3 0.8219334 10.8388722 3 0.9956001 224679.29
15 cwebp 1.3.2 4 0.8211225 11.3688304 4 0.9960652 223148.79
16 cwebp 1.3.2 5 0.8190413 10.9181859 5 0.9961888 221839.46
17 cwebp 1.3.2 6 0.8163690 14.8374114 6 0.9964598 219571.72
18 cwebp 1.3.2 7 0.8149141 17.9221753 7 0.9971695 218806.12
19 cwebp 1.3.2 8 0.8082165 23.0884559 8 0.9975708 262045.31
20 cwebp 1.3.2 9 0.7912807 74.0762189 9 1.8664396 519069.50```
|
|
2024-02-27 09:24:19
|
which is good, because after that memory use goes up massively in cjxl, but `e 7` is still comparable to `z 8` (in mem use)
|
|
|
|
veluca
|
2024-02-27 09:24:43
|
what images (and OS/CPU) are you using?
|
|
2024-02-27 09:25:02
|
in particular that "cpu" column looks a bit weird
|
|
2024-02-27 09:25:36
|
ah, it's a "number of CPUs used"
|
|
|
Orum
|
2024-02-27 09:25:38
|
these are lossless screenshots I've taken from various games, running linux on a 7950X3D
|
|
|
|
veluca
|
2024-02-27 09:25:38
|
ok, I see
|
|
|
Orum
|
2024-02-27 09:26:43
|
it's a bit weird that the rbp is higher for effort 3 than 2... <:WhatThe:806133036059197491>
|
|
|
|
veluca
|
2024-02-27 09:27:29
|
that's nonphoto for you
|
|
2024-02-27 09:27:45
|
-e3 is tuned for photographic content
|
|
|
Orum
|
2024-02-27 09:27:53
|
okay, fair enough
|
|
|
username
|
2024-02-27 09:28:22
|
did you run cwebp with or without `-mt`? because `-z 9` benefits speed-wise from it and produces the same final output
|
|
|
Orum
|
2024-02-27 09:29:02
|
all of those are with `-mt` (you wouldn't see 1.8 CPU use with `-z 9` without it)
|
|
2024-02-27 09:31:29
|
it's kind of crazy how much faster `e 10` is compared to `z 9` though, but it shows the power of good MT <:YEP:808828808127971399>
|
|
2024-02-27 09:32:31
|
anyway, I'll have some graphs in a moment, and some other stats that I need to investigate manually
|
|
2024-02-27 09:35:18
|
yeah, `e 1` is so fast that `time` doesn't really provide enough resolution of real time used to get a useful measurement, but everything else looks usable:
|
|
2024-02-27 09:37:36
|
being too fast is a good problem to have though <:Hypers:808826266060193874>
|
|
2024-02-27 09:41:05
|
peak memory use looks fairly comparable at most levels too, though cjxl's data is much more tightly grouped at some levels
|
|
|
|
veluca
|
2024-02-27 09:46:14
|
you can use `--num_reps`
|
|
|
Orum
|
2024-02-27 09:48:53
|
that's a lot more effort to script (if I just use it for `e 1`) or hours of additional testing (if I don't); but I think we can all agree that `e 1` lossless is "pretty damn fast" and certainly blows the pants off of cwebp `z 0`:
|
|
2024-02-27 09:49:25
|
of course, still take the e 1 numbers there with a boulder of salt as several of the images had a real time of '0'
|
|
2024-02-27 09:51:47
|
really interesting how `e 9` on average is smaller than `e 10`, but I assume that's again because the presets are tuned to photo?
|
|
|
|
veluca
|
2024-02-27 09:52:21
|
I'm not sure actually...
|
|
2024-02-27 09:52:41
|
how similar are different regions of your images?
|
|
|
Orum
|
2024-02-27 09:53:34
|
uhhh, that's extremely tough to say...
|
|
2024-02-27 09:54:02
|
it's a rather eclectic collection of screenshots, from > 100 different games
|
|
|
Nyao-chan
|
|
Orum
so, it looks like `e 7` is the fastest cjxl speed that (on average) beats cwebp's `z 9`:
``` eal rbp rrt lvl cpu mem
1 cjxl v0.10.0 1 1.1997896 0.0371686 1 Inf 30657.19
2 cjxl v0.10.0 2 0.9755219 0.3651210 2 17.3375226 244023.45
3 cjxl v0.10.0 3 1.0254203 0.5425397 3 19.2356366 243855.98
4 cjxl v0.10.0 4 0.8880205 1.1001987 4 21.2395977 274834.79
5 cjxl v0.10.0 5 0.8284132 1.1617858 5 20.9716307 267952.03
6 cjxl v0.10.0 6 0.8102338 1.4223768 6 21.1074809 274365.39
7 cjxl v0.10.0 7 0.7574005 1.8606330 7 21.2205659 283921.00
8 cjxl v0.10.0 8 0.7391769 5.4232358 8 22.1395608 536870.36
9 cjxl v0.10.0 9 0.7178132 24.2512006 9 22.4154769 633623.88
10 cjxl v0.10.0 10 0.7209015 26.4382343 10 22.3001730 634870.07
11 cwebp 1.3.2 0 1.0000000 1.0000000 0 0.9691077 151040.92
12 cwebp 1.3.2 1 0.8719663 7.1135803 1 0.9942246 208628.52
13 cwebp 1.3.2 2 0.8479628 8.1897254 2 0.9948469 221764.80
14 cwebp 1.3.2 3 0.8219334 10.8388722 3 0.9956001 224679.29
15 cwebp 1.3.2 4 0.8211225 11.3688304 4 0.9960652 223148.79
16 cwebp 1.3.2 5 0.8190413 10.9181859 5 0.9961888 221839.46
17 cwebp 1.3.2 6 0.8163690 14.8374114 6 0.9964598 219571.72
18 cwebp 1.3.2 7 0.8149141 17.9221753 7 0.9971695 218806.12
19 cwebp 1.3.2 8 0.8082165 23.0884559 8 0.9975708 262045.31
20 cwebp 1.3.2 9 0.7912807 74.0762189 9 1.8664396 519069.50```
|
|
2024-02-27 10:30:23
|
how are you getting multi threading on `-e 10`?
|
|
|
Orum
|
2024-02-27 10:31:27
|
use both `--streaming_input` and `--streaming_output`
|
|
|
|
afed
|
2024-02-27 10:33:09
|
but it's basically `-e 9`
|
|
|
Nyao-chan
|
2024-02-27 10:33:36
|
that's kind of weird, since `-e 10` uses global optimisations, should it work with that?
|
|
|
Orum
|
2024-02-27 10:34:04
|
well in my case it was (usually) worse than e 9
|
|
2024-02-27 10:34:41
|
so maybe something isn't working properly?
|
|
|
Nyao-chan
|
2024-02-27 10:35:00
|
how much worse, in kB?
|
|
|
|
afed
|
2024-02-27 10:36:39
|
`-e 9` is `-e 10` without streaming, though there were some changes later on
|
|
|
Nyao-chan
|
2024-02-27 10:37:20
|
I've noticed it's worse by around 100B with `--patches 1` so I wonder if that's what you are seeing
|
|
|
Orum
|
|
Nyao-chan
how much worse, in kB?
|
|
2024-02-27 10:37:59
|
uhh, not sure exactly; I tend to look at differences in %, not KB
|
|
|
Nyao-chan
|
|
Nyao-chan
I've noticed it's worse by around 100B with `--patches 1` so I wonder if that's what you are seeing
|
|
2024-02-27 10:40:50
|
and that was with a 1500x2250 image.
on 10240x5760 it's 25 kB smaller (around 0.5%)
|
|
|
|
afed
|
|
afed
`-e 9` is `-e 10` without streaming, though there were some changes later on
|
|
2024-02-27 10:48:04
|
probably `--patches` is the difference,
because it's not optimized for multithreading/streaming?
|
|
|
Nyao-chan
|
2024-02-27 10:51:19
|
and the variable predictor will also be off, already merged
|
|
|
|
afed
|
2024-02-27 10:53:16
|
yeah, so `-e 9` in v0.10.1 will be much faster, but somewhat worse
|
|
|
Orum
|
2024-02-27 10:58:54
|
how will it compare to `e 8` then?
|
|
|
|
afed
|
2024-02-27 10:59:42
|
https://github.com/libjxl/libjxl/issues/3323
|
|
|
Orum
|
2024-02-27 11:01:04
|
`num_threads 0` <:monkaMega:809252622900789269>
|
|
|
|
afed
|
2024-02-27 11:03:43
|
for threading it's the same, because with streaming it's basically independent encoding of individual blocks
but it's only for one manga image, so maybe for other images it will be different <:PepeSad:815718285877444619>
|
|
|
Orum
|
2024-02-27 11:03:47
|
anyway some separation between `e 9` and `e 10` is welcome as they're very close in speed right now
|
|
2024-02-27 11:04:54
|
as long as it doesn't overlap `e 8` too much <:KekDog:805390049033191445>
|
|
|
Nyao-chan
|
2024-02-27 11:05:28
|
I run everything single threaded because I thread per file. It's still faster
|
|
|
Orum
|
2024-02-27 11:05:46
|
the main reason I don't do that is memory use
|
|
|
Nyao-chan
|
|
afed
for threading it's the same, because with streaming it's basically independent encoding of individual blocks
but it's only for one manga image, so maybe for other images it will be different <:PepeSad:815718285877444619>
|
|
2024-02-27 11:05:54
|
If you have a good corpus, tell me
|
|
|
|
afed
|
|
Nyao-chan
I run everything single threaded because I thread per file. It's still faster
|
|
2024-02-27 11:06:30
|
yeah, I also rarely need multithreading for a single image
|
|
|
Nyao-chan
|
|
Orum
the main reason I don't do that is memory use
|
|
2024-02-27 11:06:36
|
even with 3000x4500 I can encode 16 images on 32 GB memory, though you have many more threads
|
|
|
Orum
|
2024-02-27 11:07:03
|
I'm usually working with 8K images, and sometimes even larger than that
|
|
|
Nyao-chan
|
2024-02-27 11:07:38
|
Yeah, when I encoded 10k I think 4 was the limit
|
|
|
Quackdoc
|
|
Orum
the main reason I don't do that is memory use
|
|
2024-02-27 11:07:44
|
with the new memory gains this might not be an issue anymore lol
|
|
|
Orum
|
2024-02-27 11:09:10
|
well it's less of an issue, but honestly I can fully utilize my CPU with only 2 simultaneous <:JXL:805850130203934781> encodes now, so that has a bigger effect on reducing memory than all the other optimizations combined (compared to running ~16+ encodes in the past, which wasn't even possible without running out of RAM)
|
|
|
|
afed
|
|
afed
yeah, I also rarely need multithreading for a single image
|
|
2024-02-27 11:11:58
|
and for the best compression streaming still has an efficiency loss, except for a few cases where some algorithms don't work properly on the whole image
|
|
2024-02-27 11:17:50
|
about `-e 3`, it's improved with palette detection but can still be worse than `-e 2` for non-photos
https://canary.discord.com/channels/794206087879852103/804324493420920833/1118243657456304189
|
|
2024-02-27 11:23:21
|
and it would be nice to have support for png streaming input; it's not as much gain as internal streaming, but it still saves some extra memory
haven't checked, but does jpeg have streaming input?
|
|
|
Nyao-chan
|
2024-02-27 11:26:11
|
not according to the help message about the flag. but idk
|
|
2024-02-27 11:29:32
|
doesn't it reorder blocks and recompress Huffman coding in jpeg? would it make sense to stream at all?
|
|
|
|
afed
|
2024-02-27 11:31:05
|
for lossy recompression at least
|
|
|
Orum
|
2024-02-27 12:38:37
|
if anyone is interested, here are the sizes of the images at all compression levels (this doesn't have the timing or memory usage data, so if you want that too just ask)
|
|
|
yoochan
|
2024-02-27 12:51:30
|
could you share the images? or their url?
|
|
|
Orum
|
2024-02-27 12:54:38
|
right now they're all just on my NAS but I can upload if you're interested
|
|
|
yoochan
|
2024-02-27 12:55:13
|
what's the size of the initial bundle?
|
|
|
Orum
|
2024-02-27 12:55:33
|
1.9 GiB RN
|
|
2024-02-27 12:55:54
|
...which will take me quite some time to upload with how slow my upload speed is <:monkaMega:809252622900789269>
|
|
|
yoochan
|
2024-02-27 12:57:21
|
yep, don't. you screenshotted them yourself?
|
|
|
|
veluca
|
2024-02-27 07:28:38
|
latest commit on `main` should massively speed up lossy multithreading on windows
|
|
2024-02-27 07:29:00
|
(we found a tiny little mistake in an old commit that slowed things down on windows by *a lot*)
|
|
|
spider-mario
|
2024-02-27 07:32:00
|
(old being ~October, if I recall correctly?)
|
|
|
jonnyawsom3
|
2024-02-28 02:49:16
|
Got any numbers?
|
|
|
Traneptora
|
2024-02-28 04:03:32
|
-October being the gcc optimization level of "ctober"
|
|
|
jonnyawsom3
|
|
veluca
latest commit on `main` should massively speed up lossy multithreading on windows
|
|
2024-02-28 02:20:17
|
```JPEG XL encoder v0.10.0 19bcd82 [AVX2,SSE2]
Encoding [VarDCT, d1.000, effort: 7]
Compressed to 509.4 kB (0.491 bpp).
3840 x 2160, geomean: 1.199 MP/s [1.16, 1.23], 5 reps, 16 threads.```
```JPEG XL encoder v0.10.0 3d75236 [AVX2,SSE2]
Encoding [VarDCT, d1.000, effort: 7]
Compressed to 509.4 kB (0.491 bpp).
3840 x 2160, geomean: 8.977 MP/s [8.76, 9.08], 5 reps, 16 threads.```
7x improvement and from around 60% CPU usage to around 100%
|
|
|
|
veluca
|
2024-02-28 02:23:05
|
yep
|
|
2024-02-28 02:23:15
|
that's more or less what I'd expect
|
|
|
jonnyawsom3
|
2024-02-28 02:28:55
|
Did a random test on lossless since there was mention of trying to match speed a while ago
```JPEG XL encoder v0.10.0 3d75236 [AVX2,SSE2]
Encoding [Modular, lossless, effort: 5]
Compressed to 5508.2 kB (5.313 bpp).
3840 x 2160, geomean: 8.911 MP/s [8.75, 8.93], 5 reps, 16 threads.```
Apparently in this instance `-e 5` lossless almost perfectly matches lossy in speed, at around 10x the filesize
|
|