|
Orum
|
2025-11-22 05:41:44
|
seems to just loop it forever |
|
|
RaveSteel
|
2025-11-22 05:44:46
|
what's the command and flags you're using? |
|
2025-11-22 05:46:11
|
it works well in my experience |
|
|
Orum
|
2025-11-22 05:46:59
|
I just gave up and fed it an image sequence... though that seems to be having issues too |
|
2025-11-22 05:47:09
|
I have 3 images, and it's only adding two of them |
|
2025-11-22 05:48:34
|
screw it, I'll do it in avisynth |
|
|
AccessViolation_
|
2025-11-22 07:17:45
|
could that be because the apng itself is in infinite loop mode |
|
|
Orum
|
2025-11-22 07:18:03
|
even if it is that's really dumb |
|
2025-11-22 07:19:02
|
I wonder if imagemagick can make an animated <:JXL:805850130203934781>... |
|
|
pshufb
|
2025-11-22 08:50:27
|
No matches for this in Discord search: https://www.mdpi.com/2079-9292/14/20/4116
Was a PR ever made? |
|
|
AccessViolation_
|
2025-11-22 08:57:12
|
it seems like this would involve modifying the format itself. the format has been frozen so changes like these can't really be made at this stage |
|
2025-11-22 09:02:56
|
maybe there's some novel techniques mentioned that can still be applied to the encoder though, I'm giving it a read |
|
|
jonnyawsom3
|
2025-11-22 09:12:09
|
Yeah, changing a predictor would break compatibility with all images, and it's only for a 3% gain. We can do that with higher settings/smarter heuristics |
|
|
AccessViolation_
|
2025-11-22 09:45:22
|
> It is worth mentioning that the novelty of this work lies not in inventing new predictors or GA mechanisms, but instead in the systematic integration of additional predictors and the adaptive GA-based weight optimization within JPEG XL’s predictive framework. This integration demonstrates performance gains while maintaining full codec compatibility.
huh? |
|
2025-11-22 09:55:18
|
> Central to our research is JPEG XL’s weighted average predictor, also known as the
> error-correcting predictor
you mean the *weighted* or *self-correcting* predictor? |
|
2025-11-22 09:57:06
|
> JPEG XL, standardized in 2023
you mean 2021/2022? |
|
2025-11-22 10:03:57
|
I mean, it's whatever; they want to demonstrate their thing instead of writing about the history |
|
2025-11-22 10:09:05
|
this paper: |
|
2025-11-22 10:09:08
|
jon's *The Paper*: |
|
|
gljames24
|
2025-11-22 10:20:40
|
Chromium just reassigned it! |
|
|
AccessViolation_
|
2025-11-22 10:20:56
|
wait, are they conflating JPEG XL and libjxl? |
|
|
gljames24
|
2025-11-22 10:21:13
|
|
|
2025-11-22 10:21:43
|
|
|
|
AccessViolation_
|
2025-11-22 10:22:40
|
we had a little celebration here when it was announced :)
https://discord.com/channels/794206087879852103/803574970180829194/1441519485583360020 |
|
2025-11-22 10:25:06
|
ohhh, "built into the standard implementation" meant "built into libjxl", not "specified by the standard"... |
|
2025-11-22 10:35:43
|
no it doesn't! |
|
2025-11-22 10:35:49
|
this paper is such a rollercoaster |
|
2025-11-22 10:35:56
|
highly recommend |
|
|
jonnyawsom3
|
2025-11-22 10:51:07
|
I think... I understand. They're adding 'artificial' predictors in the MA tree, using the Weighted error for decisions? |
|
|
AccessViolation_
|
2025-11-22 11:32:16
|
lol I'm afraid not |
|
2025-11-22 11:49:26
|
notably, this was in the mode where they only used the weighted predictor. improvements are much smaller when the MA tree is allowed to decide when to use which predictor |
|
2025-11-22 11:52:55
|
you could argue they're replacing the expressiveness of the MA tree with complexity in the weighted predictor in that case |
|
2025-11-23 12:11:08
|
snarky commentary aside, the experiments themselves are pretty neat. now we know the effects that more sub-predictors in the weighted predictor would have
it's also not like I've contributed anything close to this to the community so who am I to criticize anyone |
|
|
Mike
|
2025-11-23 12:12:57
|
*tch* welp, I blame you <@384009621519597581> for I am now here looking for JXL image generation references 😛 |
|
|
AccessViolation_
|
2025-11-23 12:14:42
|
there's lots to learn! |
|
|
Mike
|
2025-11-23 12:19:33
|
I stumbled across a complaint that JXL did a bad job losslessly compressing an old pixel art game's maps, and I guess I'm here to see if I can encode harsh aliased blocky stuff, then tiles and maps |
|
|
AccessViolation_
|
2025-11-23 12:21:42
|
oh that sounds fun |
|
2025-11-23 12:23:32
|
there are some people here trying to optimize cases where the JXL lossless encoder does worse than certain other formats |
|
2025-11-23 12:26:09
|
now I'm curious what that pixel art looks like 👀 |
|
|
Mike
|
2025-11-23 12:26:39
|
https://github.com/libjxl/libjxl/issues/4150#issuecomment-2735394561 |
|
2025-11-23 12:28:00
|
In essence, 4 bit color, lots of blatantly repeating patterns |
|
|
AccessViolation_
|
2025-11-23 12:29:27
|
|
|
2025-11-23 12:29:28
|
cases like these unfortunately are not detected and replaced by patches currently |
|
|
Mike
|
2025-11-23 12:29:30
|
So I'm looking at that thinking "why not just make the JXL store it similarly to how the images might have been stored/rendered in video memory on an old gaming device, i.e. as tiles and a map" |
|
|
AccessViolation_
|
2025-11-23 12:30:45
|
certainly possible. I had a similar idea of writing a screenshot feature in an NES emulator that would extract the sprites from memory and encode them directly instead of encoding from the final pixel buffer |
|
2025-11-23 12:32:11
|
the ability for libjxl to deduplicate sprites/tiles is currently limited to high-contrast features on low-contrast backgrounds. basically, it's only tuned for deduplicating letters in text |
|
|
Mike
|
2025-11-23 12:32:34
|
Yeah, if you were feeling really fancy you could probably fully reproduce the idea of layers and sprites, but I'm thinking you could get far with just a general purpose tiler where the tiles with sprites on them just become a few unique tiles. |
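A minimal sketch of that general-purpose tiler idea, just to make it concrete (plain NumPy, made-up helper name, nothing libjxl actually exposes):

```python
import numpy as np

def tile_map(img, ts=8):
    """Split an image into ts x ts tiles, deduplicate identical tiles,
    and return (unique_tiles, index_map) -- a minimal texture-atlas idea."""
    h, w = img.shape[:2]
    assert h % ts == 0 and w % ts == 0, "pad the image to a tile multiple first"
    tiles, index, seen = [], [], {}
    for ty in range(0, h, ts):
        row = []
        for tx in range(0, w, ts):
            t = img[ty:ty+ts, tx:tx+ts]
            key = t.tobytes()
            if key not in seen:
                seen[key] = len(tiles)   # first time we see this tile
                tiles.append(t)
            row.append(seen[key])
        index.append(row)
    return np.stack(tiles), np.array(index)

# a 16x16 image made of two repeating 8x8 tiles
a = np.zeros((8, 8), np.uint8)
b = np.full((8, 8), 255, np.uint8)
img = np.block([[a, b], [b, a]])
tiles, idx = tile_map(img)
print(tiles.shape[0])  # 2 unique tiles
print(idx.tolist())    # [[0, 1], [1, 0]]
```

Every repeated tile costs nothing beyond its index in the map. |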
|
2025-11-23 12:34:34
|
But yeah, I'm thinking there may be something to the "JXL-art" approach, a tree (?) generator where unique tiles become tree branches |
|
2025-11-23 12:35:41
|
In theory, they should be able to shrink huge worldmap images |
|
2025-11-23 12:36:19
|
Stuff like this: https://dragon-quest.org/w/index.php?title=Dragon_Quest_I_Maps&mobileaction=toggle_view_mobile#/media/File%3AOverworld_DQ_NES.png |
|
2025-11-23 12:37:24
|
... Pretend that links to the raw map file, not a 1024x1024 resized version |
|
|
AccessViolation_
|
2025-11-23 12:37:41
|
I have no doubt these could become delightfully tiny |
|
2025-11-23 12:38:09
|
I read your github comment and there are some limitations. for example you're not allowed to transform sprites. just put them at a certain position |
|
|
Mike
|
2025-11-23 12:38:43
|
yeah, I'm not expecting the full reproduction of old gaming hardware |
|
2025-11-23 12:39:55
|
In practice, video memory had to fit things like numbers and fonts too, so storing a few flipped tiles should still keep usage low. I'm just mentioning it as something they did support. |
|
2025-11-23 12:40:46
|
I mean there may also be a way to abuse the idea of interpolation between colors to support flipping. |
|
2025-11-23 12:41:22
|
assuming each tile is its own unique set of "ifs" |
|
2025-11-23 12:42:03
|
But I'm getting ahead of myself. |
|
|
AccessViolation_
|
2025-11-23 12:45:27
|
if a mirrored sprite is only slightly different from the normal sprite (i.e. the sprite itself is almost symmetrical) it might also be cheap enough to encode the sprite in subtract mode rather than replace mode. that means a bit of data to 'correct' the pixels of the sprite that don't match will also be stored |
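Roughly the tradeoff, with toy numbers (this is just the arithmetic, not actual patch signaling):

```python
import numpy as np

# An almost-symmetrical 4x4 sprite: mirroring it and storing only the
# correction (residual) can be cheaper than storing the mirror outright.
sprite = np.array([[0, 1, 1, 0],
                   [1, 2, 2, 1],
                   [1, 2, 3, 1],   # one asymmetric pixel (the 3)
                   [0, 1, 1, 0]], dtype=np.int16)

mirrored = np.fliplr(sprite)
residual = sprite - mirrored       # what a subtract-style patch would store

nonzero_corrections = int(np.count_nonzero(residual))
print(nonzero_corrections)  # 2 -- the asymmetric pixel and its mirror position
```

Two small corrections versus re-storing the whole mirrored sprite. |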
|
2025-11-23 12:47:23
|
hmm how do you mean? |
|
|
Mike
|
2025-11-23 12:47:47
|
In practice flipping was used more often for sprites anyway, not map tiles. But it was something you could use to save video memory if say you were drawing a border around some text. |
|
2025-11-23 12:52:34
|
If "tile 0" can be described as 01/23 in image data (imagine everything after / is below), and the actual pixels were emitted by interpolating from 0->1, then you just need to reverse them to interpolate backwards: 10/32 |
|
2025-11-23 12:53:02
|
that being a pseudo flip about the X axis |
|
2025-11-23 12:53:42
|
```
01 10
23 32``` |
|
2025-11-23 12:58:24
|
I'm assuming that if I wanted to implement tiling, that pattern `01/23` would emit something like this
```
.. .. XX XX .. ..
.. XX .. .. XX ..
XX XX .. .. XX XX
XX XX XX XX XX XX
XX XX .. .. XX XX
XX XX .. .. XX XX
``` |
|
2025-11-23 12:59:44
|
It'd take up less memory if this was simply `0` and not `01/23`, but if your interpolation resulted in an image like the above, you can flip it by interpolating between the different corners |
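As a generic illustration of that corner-reversal idea (pure index arithmetic, nothing JXL-specific):

```python
import numpy as np

# A 2x2 "UV" tile: corner indices 0..3 laid out as 01 / 23.
uv = np.array([[0, 1],
               [2, 3]])

# Reversing each row (01/23 -> 10/32) mirrors about the vertical axis,
# so sampling a tile through the reversed indices gives a horizontal flip.
flipped_uv = uv[:, ::-1]

tile = np.array([[10, 20],
                 [30, 40]])
corners = tile.reshape(-1)          # corner values, indexed 0..3

assert np.array_equal(corners[uv], tile)                 # original layout
assert np.array_equal(corners[flipped_uv], np.fliplr(tile))  # flipped layout
```

Sampling through the reversed indices produces the mirror without storing a second tile. |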
|
2025-11-23 01:02:02
|
My apologies if that doesn't make sense. I'm admittedly running off a very limited understanding of how the JXL-Art stuff works. That's why I'm here, to find more info |
|
2025-11-23 01:05:29
|
What I'm essentially describing is a texture atlas |
|
2025-11-23 01:06:56
|
In 3D graphics you describe what image goes on a triangle or quad by specifying its coordinates (UV) inside the original texture. |
|
|
|
ignaloidas
|
2025-11-23 01:07:08
|
there's patches - where you just copy a portion of one frame onto another with some kind of blending - replace, add, multiply, with/without alpha - which is probably the best way to compress repeating sprites, and there's MA trees, which control the predictor parameters depending on position, values of neighboring pixels, values of predictors, previous values and a bit more |
|
|
Mike
|
2025-11-23 01:07:13
|
the 01/23 is effectively the UV coordinates |
|
|
|
ignaloidas
|
2025-11-23 01:07:28
|
MA trees are used for the base image |
|
2025-11-23 01:07:35
|
patches are applied on top of it |
|
2025-11-23 01:08:32
|
You can't really do any UV mapping type of shenanigans, because you can't combine them |
|
2025-11-23 01:09:24
|
for what is possible with MA trees this is a good resource https://jxl-art.surma.technology/wtf |
|
|
AccessViolation_
|
2025-11-23 01:09:57
|
this is what patches effectively do |
|
|
Mike
|
2025-11-23 01:11:40
|
yeah I've only looked at the tree/jxl-art stuff thus far. Patches sound like the better solution |
|
2025-11-23 01:12:58
|
I was considering some intense tree abuse, using the predictors to emit my tiles |
|
2025-11-23 01:13:36
|
In practice I would have wanted to RLE compress the tile data, then turn that into tree branchings |
|
2025-11-23 01:14:11
|
to save me from using more if's than needed |
|
|
AccessViolation_
|
2025-11-23 01:15:35
|
that's what the encoder already tries to do 🙂
it'll create the best tree for a given image, and encode the difference between the image as seen by the tree and the original, to always get back to the original
in exceptional circumstances, the encoder can generate a tree that correctly represents basically *all* of the image without any need for encoding corrections (called residuals). this does happen, like for images of certain cascading cellular automata |
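A toy single-predictor version of that predict-and-correct loop (a real MA tree switches predictors per pixel; this hardcodes the West predictor just to show the residual mechanism):

```python
import numpy as np

def encode_west(img):
    """Predict each pixel from its west neighbour (0 at the left edge)
    and keep only the residuals -- a toy predict-and-correct encoder."""
    pred = np.zeros_like(img, dtype=np.int32)
    pred[:, 1:] = img[:, :-1]
    return img.astype(np.int32) - pred   # residuals

def decode_west(res):
    # rebuild left to right, adding each residual to the decoded west pixel
    out = np.zeros_like(res)
    for x in range(res.shape[1]):
        west = out[:, x - 1] if x > 0 else 0
        out[:, x] = west + res[:, x]
    return out

img = np.array([[5, 5, 5, 9],
                [3, 4, 5, 6]], dtype=np.int32)
res = encode_west(img)
assert np.array_equal(decode_west(res), img)   # lossless roundtrip
print(res.tolist())  # [[5, 0, 0, 4], [3, 1, 1, 1]]
```

When the prediction fits, the residuals are mostly small or zero, which is what makes them cheap to entropy code. |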
|
|
|
ignaloidas
|
2025-11-23 01:17:02
|
Because trees are, in fact, trees, it's impossible to make them go into a specific branch more than once if you're looking at location and not other decision types, so I'm not sure how you could turn tile data into tree branching |
|
|
AccessViolation_
|
2025-11-23 01:20:44
|
the tree is executed once per pixel (...per channel) |
|
|
Mike
|
2025-11-23 01:22:25
|
I'm making some big assumptions that this worked similar to writing fragment and vertex shaders. Again why I'm seeking more info. I'd assumed that I could take an input, effectively a map in pixel form (i.e. 0's are water tiles, 1's are grass tiles, etc), scale that map data up which forces interpolation, and the tree decides what pixels to emit when. In other words, the tree acts like the fragment shader. |
|
|
|
ignaloidas
|
2025-11-23 01:23:23
|
not possible within JXL, no |
|
|
Mike
|
2025-11-23 01:24:00
|
hence why I'm looking for more info 😉 |
|
|
AccessViolation_
|
2025-11-23 01:25:08
|
there's a nice paper written by \_wb_ that explains the concepts of JPEG XL really well, might be helpful |
|
2025-11-23 01:25:20
|
https://arxiv.org/pdf/2506.05987 |
|
|
Mike
|
2025-11-23 01:25:36
|
<@384009621519597581> shared some images over on the Mozilla groups that got me thinking JXL was effectively programmable, and that's the puzzle I'm trying to assemble |
|
|
|
ignaloidas
|
2025-11-23 01:25:45
|
I was thinking about how, if you had a tree branch that somehow encoded some patch, you couldn't enter it afterwards if you entered by conditions on location; though now I have a very dumb idea on how you still could |
|
|
AccessViolation_
|
2025-11-23 01:26:27
|
you're on the right track with the fragment shader idea |
|
|
|
ignaloidas
|
2025-11-23 01:26:35
|
it is somewhat programmable, but it's limited to per-pixel programs with very little input (essentially the underlying "real" pixels) |
|
|
AccessViolation_
|
2025-11-23 01:27:17
|
it really is basically a program that's run for every pixel, that decides which entropy coding context it belongs to and which predictor to use for it. but that's about the extent of what it lets you do |
|
|
Mike
|
2025-11-23 01:27:50
|
yeah, which sounds a lot like fragment shaders to me. 😉 |
|
|
AccessViolation_
|
2025-11-23 01:27:55
|
exactly! |
|
|
|
ignaloidas
|
2025-11-23 01:31:06
|
anyways, I think you could achieve essentially patch-like functionality using palette transforms to add extra colors - essentially have palette colors that are "outside" the normal range, which trigger specific nodes in the trees in a chain |
|
2025-11-23 01:31:46
|
since palette can have duplicate colors it should work, no? |
|
|
Mike
|
2025-11-23 01:38:33
|
lol, I hope this desire for more programmability doesn't lead to a need for a yet another file format post JPEG XL. 🤣 |
|
|
AccessViolation_
|
2025-11-23 01:39:21
|
these are examples of the properties you can test for in the yes/no questions in the MA tree
example using `W` would be `W < 5`, or "does the pixel to the west of this pixel have a value less than 5" |
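A toy rendition of such a decision chain (the property names follow that convention; the tree itself is invented for illustration):

```python
# Toy evaluation of an MA-tree-style decision for one pixel: the test
# "W < 5" asks whether the already-decoded pixel to the west is below 5.
# The contexts and thresholds here are made up.
def choose_context(w, n):
    if w < 5:            # "W < 5"
        return "ctx_flat"
    if n > 128:          # "N > 128"
        return "ctx_bright_edge"
    return "ctx_default"

print(choose_context(w=2, n=0))     # ctx_flat
print(choose_context(w=50, n=200))  # ctx_bright_edge
``` |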
|
2025-11-23 01:40:21
|
maybe we should just send people shader code, GPUs are fast enough anyway <:Stonks:806137886726553651> |
|
|
Mike
|
2025-11-23 01:40:32
|
😉 |
|
2025-11-23 01:41:35
|
Hypothetically if images were just shader programs + embedded data, you wouldn't even need to store uncompressed images in memory. 😉 |
|
2025-11-23 01:43:12
|
SVG's are cool, but what if... executable. 😉 |
|
|
AccessViolation_
|
2025-11-23 01:43:44
|
SVGs are executable if you add JS |
|
2025-11-23 01:44:11
|
I've seen people build entire web experiences that were just a single SVG which is something |
|
|
Mike
|
2025-11-23 01:44:45
|
Sure, but I mean you skip the vector rasterization stage and just let the embedded program draw the thing. ;)) |
|
2025-11-23 01:45:41
|
That's kinda what I mistook the JXL-art stuff for at first. |
|
|
AccessViolation_
|
2025-11-23 01:46:41
|
JPEG XL is a great image format and an ok programming language |
|
2025-11-23 01:49:28
|
well I'm glad my propaganda reached you well in the mozilla matrix |
|
2025-11-23 01:49:34
|
lmao |
|
2025-11-23 01:52:13
|
I decided to check that chat out again for no particular reason and saw someone comparing JXL to WebP which triggered an unskippable cutscene that hundreds of innocent people were forced to sit through |
|
|
Mike
|
2025-11-23 01:52:37
|
lol, I know that feeling. 😉 |
|
|
Traneptora
|
2025-11-23 05:13:49
|
JXL is a superset of cellular automata |
|
2025-11-23 05:14:05
|
which themselves can be used to program turing machines |
|
2025-11-23 05:14:28
|
the usual finite hardware issue blah blah but yes |
|
|
Jyrki Alakuijala
|
2025-11-23 09:00:25
|
We wrote it together. 🌞 |
|
|
AccessViolation_
|
2025-11-23 11:06:15
|
oh! and I see luca and zoltan contributed too in equal amounts |
|
2025-11-23 11:06:19
|
my bad |
|
|
Exorcist
|
2025-11-24 12:54:23
|
Yet another Accidentally Turing-Complete?
> JBIG2 Image Compression
> Images can be Turing-complete as well. In this case, it was even exploited in practice in CVE-2021-30860...
https://beza1e1.tuxen.de/articles/accidentally_turing_complete.html |
|
|
Traneptora
|
2025-11-24 12:55:21
|
well it's "turing-complete" in the sense that it doesn't let you do arbitrary things, it lets you create arbitrary designs if you extend the height and width infinitely |
|
2025-11-24 12:55:30
|
it's not a programmable image format |
|
2025-11-24 12:55:39
|
it's just that if you consider pixel patterns to be computation |
|
|
jonnyawsom3
|
2025-11-24 07:16:21
|
It was no accident, and only applies to whatever you fit within 1MP with limited decision nodes |
|
|
_wb_
|
2025-11-24 12:16:19
|
jxl is by design not turing complete, i.e. there are no loop constructs so image decoding always terminates. MA trees are a pretty generic mechanism but it's not a programming language. |
|
|
Traneptora
|
2025-11-24 01:18:40
|
if you extended the canvas infinitely it'd be able to simulate turing machines, but MA trees always terminate with finite canvas |
|
|
_wb_
|
2025-11-24 01:22:06
|
yes, with infinite images (or rather, infinite group sizes) it would be Turing complete |
|
|
spider-mario
|
2025-11-24 01:52:41
|
unlike Python 3 (Zed Shaw reference) |
|
|
AccessViolation_
|
2025-11-24 03:10:49
|
is there no size limit on MA trees as well? |
|
|
jonnyawsom3
|
2025-11-24 03:13:15
|
```
Maximum max_tree_depth (H.4.2) Level 5: 64 Level 10: 2048
Maximum global MA tree nodes (G.1.3, H.4.2) min(1 << 22, 1024 + fwidth * fheight * nb_channels / 16)
Maximum local MA tree nodes (G.2.3, G.4.2, H.4.2) min(1 << 20, 1024 + nb_local_samples)``` |
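For a given frame, the global cap quoted there works out to (sketch; integer division assumed):

```python
# Global MA tree node limit per the quoted spec formula:
# min(1 << 22, 1024 + fwidth * fheight * nb_channels / 16)
def max_global_tree_nodes(fwidth, fheight, nb_channels):
    return min(1 << 22, 1024 + fwidth * fheight * nb_channels // 16)

# e.g. a 256x256 single-channel image:
print(max_global_tree_nodes(256, 256, 1))   # 5120
```

So the tree size limit scales with the number of samples, up to the hard 2^22 cap. |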
|
|
AccessViolation_
|
2025-11-24 03:16:00
|
ah I see |
|
|
_wb_
|
2025-11-24 03:24:50
|
shouldn't really be a limitation, more than one context per sample doesn't really make sense anyway |
|
|
AccessViolation_
|
2025-11-24 03:42:14
|
oh yeah for sure. my question was asked in the context of JXL being Turing complete with infinite group sizes. I didn't know the MA tree size limit was proportional to the group size |
|
2025-11-24 05:08:24
|
theoretical question: can VarDCT be lossless for 8 bpc input if you set all quantization tables to 1 and don't use transforms or filters other than DCT? |
|
|
_wb_
|
2025-11-24 05:09:43
|
quant tables in JXL are not integers but floats, so I suppose this should be possible, yes |
|
2025-11-24 05:09:58
|
in JPEG, quant tables are integers and quantization is w.r.t. 12-bit DCT coeffs |
|
2025-11-24 05:10:22
|
in JPEG, all-1 quant tables are not precise enough for fully mathematically lossless, though the error will be pretty small |
|
|
AccessViolation_
|
2025-11-24 05:11:59
|
that was going to be my second question. I read stackoverflow answers claiming a maximum quality JPEG is not lossless, even if you set all quant tables to 1 and disable chroma subsampling, due to rounding errors, but how are there rounding errors if you're multiplying by one |
|
2025-11-24 05:12:34
|
oh, are coefficients floats and does it do integer multiplication against the quant table, maybe? |
|
2025-11-24 05:16:17
|
could you elaborate? why does the precision matter if it's just a value of 1 |
|
|
_wb_
|
2025-11-24 05:17:40
|
the error is introduced by quantizing DCT coefficients to 12-bit ints before doing the "actual" quantization according to the quant table |
|
2025-11-24 05:18:36
|
in jxl, DCT coefficients are kept as float32 (in libjxl) / real numbers (spec) and there is only one quantization step to convert the coeffs to ints. |
|
2025-11-24 05:20:55
|
in jpeg, it's a two-step quantization: decode side, quantized coeffs are first multiplied by the factor in the quant table, then the resulting integer is interpreted as something like (x - 2048) / 4096 (it's a signed 12-bit number) |
|
2025-11-24 05:22:14
|
12-bit dct coefficients are pretty precise but not quite precise enough to do a lossless DCT-IDCT roundtrip |
|
2025-11-24 05:23:10
|
in jxl, you could in theory specify a quant table that will result in 32-bit int "quantized" coefficients |
|
2025-11-24 05:23:34
|
(though for Level 5 you'll have to stick to 16-bit quantized coefficients) |
|
2025-11-24 05:24:08
|
I haven't tested if 16-bit is enough to make it roundtrip losslessly but I would assume it is |
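That precision argument can be checked with a small orthonormal DCT roundtrip (a generic 2-D DCT-II in NumPy, not libjxl's actual transform):

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II basis matrix (rows are basis vectors)
    k = np.arange(n)
    M = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    M[0] *= np.sqrt(1 / n)
    M[1:] *= np.sqrt(2 / n)
    return M

M = dct_matrix(8)
rng = np.random.default_rng(0)
block = rng.integers(0, 256, size=(8, 8)).astype(np.float64)  # 8 bpc samples

coeffs = M @ block @ M.T      # forward 2-D DCT, float coefficients
restored = M.T @ coeffs @ M   # inverse (M is orthogonal)

# float64 carries far more precision than the 8-bit input,
# so rounding recovers the original exactly
assert np.array_equal(np.rint(restored), block)
```

With unquantized float coefficients the roundtrip error stays far below half a code value, so rounding recovers the 8-bit input exactly; it's JPEG's coarser 12-bit intermediate quantization that breaks this. |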
|
|
jonnyawsom3
|
2025-11-24 05:24:19
|
In that example, the YCbCr conversion is also to blame. We squeezed some extra quality out of jpegli by adaptively switching to RGB JPEG at q100
It has a large filesize and compatibility penalty, but if you're using q100 JPEG you're probably doing something wrong anyway, so it works as a warning |
|
|
_wb_
|
2025-11-24 05:25:25
|
yes, I'm assuming doing no color transforms at all, XYB will be even worse than YCbCr if you want to preserve RGB losslessly, since unlike YCbCr it's not linear |
|
|
lonjil
|
2025-11-24 05:27:46
|
I've seen many RGB JPEGs from artists doing high quality lossy exports from drawing programs. |
|
|
AccessViolation_
|
2025-11-24 05:39:54
|
that's interesting, in JXL it would be reversible in practice though, since 32-bit float is enough precision, I assume? |
|
2025-11-24 05:40:36
|
I assume you mean XYB would be worse for that than YCbCr in JPEG 1 |
|
|
_wb_
|
2025-11-24 05:42:48
|
Yes, YCbCr costs you about 1 bit of precision while XYB would for some colors be more precise but for others cost 2-3 bits of precision, so if you just look at peak error in RGB, it would be worse than YCbCr |
|
2025-11-24 05:43:42
|
Anyway, using VarDCT to do lossless encoding would not be efficient, you'll be much better off using Modular mode for that |
|
|
VcSaJen
|
2025-11-24 05:44:25
|
Aside from `cjxl`, which software supports lossless jpg→jxl conversion? Both CLI and GUI applications. There's at least one, IIRC. |
|
|
AccessViolation_
|
2025-11-24 06:34:51
|
I know there's some file archiver (rar?) that has the option to do this on JPEGs it sees, but presumably you're talking about other image formats |
|
|
VcSaJen
|
2025-11-24 06:35:39
|
I mean lossless conversion from JPEG1 file to JPEG XL file |
|
|
AccessViolation_
|
2025-11-24 06:36:11
|
dyslexia moment |
|
2025-11-24 06:37:52
|
or it would have been before you edited that :)
don't know any in that case |
|
|
HCrikki
|
2025-11-24 06:45:22
|
this one was handy before and was updated recently. it's still cjxl underneath, but you can use whatever mix of stable and nightly you like https://github.com/kampidh/jxl-batch-converter |
|
|
Exorcist
|
2025-11-24 11:33:57
|
how can the cosine function have no rounding error? |
|
|
AccessViolation_
|
2025-11-24 11:37:49
|
the values after the cosine functions get divided by some value and rounded, but if you divide them by 1 they don't change and don't need to be rounded. the cosine function itself is reversible. so then no data is lost during that process |
|
|
Exorcist
|
2025-11-24 11:39:26
|
> cosine function itself is reversible
the cosine function maps real numbers to real numbers; how does a computer store real numbers? |
|
|
AccessViolation_
|
2025-11-24 11:40:21
|
floating point |
|
2025-11-24 11:44:53
|
I think I understand what you're getting at. I said the input was 8 bit per channel |
|
2025-11-24 11:48:56
|
f32 has enough precision to be lossless with respect to the original 8 bits in a DCT -> IDCT operation |
|
|
Exorcist
|
2025-11-24 11:51:33
|
so you want to store f32 in the output file? 😂 |
|
|
AccessViolation_
|
2025-11-25 12:01:46
|
that's what VarDCT JXL files are |
|
|
jonnyawsom3
|
2025-11-25 12:29:09
|
By design and regardless of input |
|
|
_wb_
|
2025-11-25 12:12:08
|
What is stored is ints, but quant tables are floats, and the ints are not limited to 12-bit like in jpeg. Essentially in jpeg, the finest quantization you can get is 1/4096 in the dct domain, while in jxl the finest quantization is effectively not limited. |
|
|
username
|
2025-11-25 06:19:32
|
I randomly came across a bugzilla bug report for Firefox from a year ago where the reporter decided to upload their screenshots as JXL files: https://bugzilla.mozilla.org/show_bug.cgi?id=1905611 |
|
|
Tirr
|
2025-11-25 06:59:20
|
oh we have jxl-rs v0.1.2 now |
|
|
|
veluca
|
2025-11-25 07:03:42
|
yup, was helpful for the Chrome integration |
|
|
monad
|
2025-11-25 07:04:41
|
before libjxl 0.12 <:Stonks:806137886726553651> |
|
|
Orum
|
2025-11-25 07:08:26
|
I'm convinced libjxl 0.12 will never arrive <:FeelsSadMan:808221433243107338> |
|
|
juliobbv
|
2025-11-25 07:10:05
|
we need libjxl-psy that can have community improvements that can later be mainlined <:galaxybrain:821831336372338729> |
|
|
AccessViolation_
|
2025-11-25 07:10:42
|
feel free to use https://github.com/AccessViolation95/libjxl-tuning I suppose |
|
2025-11-25 07:11:05
|
wait actually that's not a bad idea |
|
2025-11-25 07:11:22
|
a fork where people can maintain tuning profiles as config files (this fork was created to allow tuning via config files so you don't have to rebuild after every change) |
|
|
juliobbv
|
2025-11-25 07:11:45
|
but seriously, having a repo that serves as a "staging area" does help in the long term |
|
|
AccessViolation_
|
2025-11-25 07:15:19
|
I wanted to set up a pipeline that tests proposed tuning improvements against a large corpus of images and quality metrics, but was told metrics tend to prefer over-smoothing, which is what the people tuning libjxl are currently trying to solve |
|
|
juliobbv
|
2025-11-25 07:18:03
|
you could add different tuning modes |
|
2025-11-25 07:18:43
|
libaom has tune iq (read: a combination of modern image metrics and subjective checks) and one specifically for ssimu2 |
|
|
AccessViolation_
|
2025-11-25 07:19:45
|
does libaom do some sort of automated testing or generating of parameters by testing different values against a corpus? |
|
2025-11-25 07:20:19
|
or, av1 encoders in general, not specifically libaom |
|
|
monad
|
2025-11-25 07:21:04
|
lossless would be more straightforward to measure for if you have any interest |
|
|
juliobbv
|
2025-11-25 07:21:13
|
svt-av1 just landed a testing framework |
|
2025-11-25 07:21:29
|
and it's modular enough to include new metrics in the future |
|
|
AccessViolation_
|
2025-11-25 07:22:10
|
and the tuning itself, that's all manual? (I'm not talking about the thing where it uses a metric to see where it needs to allocate more quality or similar) |
|
|
juliobbv
|
2025-11-25 07:22:24
|
like, the process of tuning the encoder? |
|
|
juliobbv
|
2025-11-25 07:22:27
|
yeah, it's mostly manual |
|
2025-11-25 07:22:56
|
encoders tend to use heuristics to approximate rate-distortion choices |
|
2025-11-25 07:23:25
|
libaom does technically use butter as an RDO metric for its namesake tune but it's not good |
|
2025-11-25 07:24:02
|
that's why subjective visual checking comes in handy -- to keep metric deficiencies at bay |
|
|
AccessViolation_
|
2025-11-25 07:27:03
|
yeah jonny has been doing a lot of that ^^" |
|
2025-11-25 07:28:15
|
I think someone is already actively testing releases for regressions in effort 11 (yes, 11) lossless |
|
2025-11-25 07:29:24
|
or they were at one point. I don't know if they still do |
|
|
monad
|
2025-11-25 07:30:04
|
that's quite different than searching the parameter space for some optimal state |
|
2025-11-25 07:32:37
|
I have a feeling there are big improvements for e10, but this is unproven at acceptable decode speed |
|
|
AccessViolation_
|
2025-11-25 07:32:38
|
my bad, I thought that was in response to the automated testing of manual tuning |
|
|
monad
|
2025-11-25 07:34:39
|
Oh, so the "proposed tuning" is not automatic. Then I misunderstood you. |
|
|
|
veluca
|
2025-11-25 07:36:09
|
I wonder if using a diffusion-like approach to figure out how much "out of distribution" are compressed images compared to natural ones would work for tuning encoders |
|
|
AccessViolation_
|
2025-11-25 07:37:55
|
right, that would be for testing manual tuning, but I was talking about automated tuning in the message after that
regardless, that would be nice, though i'm not sure how it would work. I feel like you'd almost need some sort of machine learning in the case of lossy because coding tools can't really be evaluated in isolation. for lossless it might be simpler because there's no fidelity you also need to keep track of |
|
2025-11-25 07:39:47
|
to be clear, that's machine learning for figuring out the tuning parameters, not used *in* the encoder every time you encode an image |
|
|
monad
|
2025-11-25 07:40:03
|
yes, well if you are just supplementing manual tuning, then I see no problem with checking against objective metrics to get more coverage |
|
2025-11-25 07:42:18
|
the measurements may provide signals for certain images to look at |
|
|
AccessViolation_
|
2025-11-25 07:45:25
|
I wonder if things would be easier if we had metrics that reported scores for specific types of distortion rather than a single score. so smoothing, ringing, color changes, blocking, are all reported individually |
|
2025-11-25 07:45:46
|
not implying that this would be a walk in the park to create or anything |
|
|
juliobbv
|
2025-11-25 07:47:13
|
yeah, that'd be worth of having a centralized repo that can serve as a reference for the community to test and give feedback to |
|
|
jonnyawsom3
|
2025-11-25 08:33:57
|
There's a pretty hard stance to avoid tunes if we can, preferring heuristics to do the decision making. Just detecting photo vs non-photo would already go a long way, and is partially implemented, but needs a lot of work |
|
|
AccessViolation_
|
2025-11-25 08:39:39
|
there's a difference? are those values I moved to a config for tuning or heuristics |
|
2025-11-25 08:40:44
|
I thought they basically referred to the same thing |
|
|
jonnyawsom3
|
2025-11-25 08:43:56
|
We've just been making PRs and Draft PRs using my fork, since I have auditing perms on the repo and can run the GitHub Actions to get test results and binaries with every update. It's how we made Faster Decoding 80% smaller and 25% faster |
|
|
username
|
2025-11-25 08:44:15
|
heuristics are kinda like sets of behaviors that make decisions while tuning more so refers to changing around or tweaking values. that's my perception of the difference anyways |
|
|
jonnyawsom3
|
2025-11-25 08:45:06
|
I was working on tuning for the DCTs, trading smoothing for more HF detail at the cost of noise. Heuristics are more like deciding which coding tools to use in the first place: if VarDCT or Modular is best, if patches are useful, etc. |
|
|
monad
|
2025-11-25 09:09:48
|
what kind of photo vs non-photo distinction do you need? you mean like content segmenting, or classifying images with particular characteristics? |
|
|
juliobbv
|
2025-11-25 09:17:06
|
I mean, just for the fork |
|
2025-11-25 09:17:29
|
so you can (ab)use different tunes to mean different tradeoffs for easier testing |
|
2025-11-25 09:18:31
|
instead of having to provide two plus distinct binaries |
|
|
|
DZgas Ж
|
2025-11-27 03:24:54
|
Over the past month, I've been working a bit on possible encoding improvements. I tested three possible approaches:
1. Replacing the difference algorithm with Butteraugli, which provides a small quality gain at the cost of enormous computational complexity.
2. Replacing VarDCT with lossy Modular and a dynamic programming algorithm, which creates an effect similar to GIF lossy compression. Overall, there's something to this; it's another approach and it also works.
3. Replacing the inter-frame prediction algorithm (block replacement when colors or pixels change) with a full motion search algorithm using vectors: a full analysis of all pixel motion vectors, with block replacement when motion is detected. Overall, this seems like the best approach due to its independence from the actual colors, etc., but it has its own specific error-accumulation artifacts, since prediction must be performed based on the frame that actually exists at the given moment, taking into account all overlaps after it, the master frame, and the next frame that should exist.
Conclusion: not a single algorithm gave a multi-fold improvement; for JPEG XL, the maximum saving for all similar algorithms is about 10-30% of the original size, so no 4K video at all. |
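The motion-vector approach in point 3 can be sketched as a classic full-search block matcher. This is a toy Python/NumPy sketch, not DZgas's actual implementation; the function and parameter names are made up for illustration:

```python
import numpy as np

def motion_search(prev, cur, block=8, radius=4):
    """Full-search block matching: for each block in `cur`, find the
    displacement within +/-`radius` that minimizes the sum of absolute
    differences (SAD) against `prev`. Returns (dy, dx) per block."""
    H, W = cur.shape
    vectors = np.zeros((H // block, W // block, 2), dtype=int)
    for by in range(H // block):
        for bx in range(W // block):
            y, x = by * block, bx * block
            target = cur[y:y+block, x:x+block].astype(int)
            best, best_v = None, (0, 0)
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    sy, sx = y + dy, x + dx
                    if sy < 0 or sx < 0 or sy + block > H or sx + block > W:
                        continue  # candidate block falls outside the frame
                    cand = prev[sy:sy+block, sx:sx+block].astype(int)
                    sad = np.abs(target - cand).sum()
                    if best is None or sad < best:
                        best, best_v = sad, (dy, dx)
            vectors[by, bx] = best_v
    return vectors
```

A real encoder would use a faster search (diamond, hierarchical) and a rate-aware cost instead of plain SAD, but the per-block structure is the same.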
|
2025-11-27 03:30:08
|
Because of all this, my original 1.0 algorithm remains the best. Unfortunately, I wrote it too well and too fast. It's best suited for compressing JPEG XL animations by 10-20%. In theory, if the entire frame is completely static, a gain of around 90%+ could be achieved under specific conditions. However, I've experimented with extremely high compression, and it all fails. |
|
|
AccessViolation_
|
2025-11-27 09:44:43
|
I got my hands on a really crusty and blocky JPEG with really obvious blocking in the sky gradient, which made me think: could we do some sort of hybrid approach to JPEG reencoding that keeps most properties intact to avoid reencoding pixels directly, just like lossless JPEG recompression, but additionally applies EPF and LF smoothing? I don't think Gaborish would work, since that'd require sharpening the image before encoding; maybe EPF has a similar requirement, but adaptive LF smoothing in cases like these could be promising if it's possible |
|
2025-11-27 09:54:35
|
I got this idea when I reencoded the original JPEG from pixel data at distance 10 and it looked *better* thanks to LF smoothing the sky |
|
2025-11-27 09:57:22
|
I'm going to see if I can hack this into libjxl |
|
|
|
veluca
|
2025-11-27 10:30:24
|
could just do jpeg transcoding and force enable lf smoothing |
|
|
AccessViolation_
|
2025-11-27 10:43:17
|
ye, if we don't mess with it aside from signaling LF smoothing and it's still possible to losslessly decode to the original, it might be an idea to add an `--improve_fidelity` (or maybe the term "appeal" is more accurate here) parameter during encode that just makes it look nicer as a JXL if it's heavily compressed |
|
|
|
veluca
|
2025-11-27 11:03:56
|
would also be interesting to see if one could train a diffusion(-like) model to restore JPEGs within the constraints of their quantized coefficients |
|
|
RaveSteel
|
2025-11-27 11:09:21
|
Distance 1 is the default value for most input files with cjxl
Simply specify -d 0 for lossless encoding |
|
|
|
ignaloidas
|
2025-11-27 11:10:09
|
it defaults to lossless on JPEG/GIF inputs and lossy on everything else |
|
|
AccessViolation_
|
2025-11-27 12:14:59
|
what's LF smoothing called in the source code? I can't find a reference to it anywhere
edit: found it |
|
|
jonnyawsom3
|
2025-11-27 12:21:08
|
You could just make it `--Adaptive_LF` to match others like EPF and CFL. Default is to follow the spec, but you can force it to 1 for JPEGs or disable it for JXLs |
|
2025-11-27 12:21:26
|
Though, I guess this is decode side only |
|
|
AccessViolation_
|
2025-11-27 01:15:08
|
will this be an issue |
|
2025-11-27 01:20:14
|
I haven't figured out how to enable LF smoothing for JPEG recompression yet, but if it's by definition not compatible with chroma subsampled JPEGs then I can stop trying, as that's basically all JPEGs |
|
2025-11-27 01:20:42
|
unless it's a libjxl limitation rather than a spec limitation, in which case I would want to try to see what happens and potentially try to resolve it |
|
|
|
veluca
|
2025-11-27 01:25:18
|
ah yes |
|
2025-11-27 01:25:23
|
it's a spec limitation |
|
|
AccessViolation_
|
2025-11-27 01:26:39
|
ah, bummer |
|
2025-11-27 01:43:24
|
I think I'm going to explore it further though. it won't work with most JPEGs when doing lossless recompression, but you could still encode the Y channel as-is and upsample the Cr and Cb channels while implementing native subsampling (zeroing and reordering certain coefficients) |
|
2025-11-27 01:49:16
|
but that's gonna require a lot more familiarity with the codebase so maybe I'll save that idea for later |
|
|
Traneptora
|
2025-11-27 01:53:37
|
<@184373105588699137> got some time this weekend. any ideas for libjxl options you wish to expose to ffmpeg? |
|
2025-11-27 01:58:45
|
native upsampling happens after the color transforms, unfortunately |
|
|
AccessViolation_
|
2025-11-27 01:59:22
|
I meant upsampling it before you even give it to those steps of the encoder |
|
|
Traneptora
|
2025-11-27 02:00:07
|
oh like, upsampling chroma into a 444 jpeg then lossless converting |
|
|
AccessViolation_
|
2025-11-27 02:00:08
|
so that what the encoder sees is a 4:4:4 JPEG |
|
2025-11-27 02:00:48
|
though possibly lossily on the color channels since the fact that they get upsampled like this will probably inflate it |
|
|
Traneptora
|
2025-11-27 02:01:20
|
The issue here is if you upsample by adding trailing 0 dct coeffs you are doing nearest neighbor |
|
2025-11-27 02:01:24
|
iirc |
|
|
AccessViolation_
|
2025-11-27 02:02:31
|
isn't 4:2:0 also effectively nearest neighbor too though? |
|
|
Traneptora
|
2025-11-27 02:02:32
|
wait no |
|
2025-11-27 02:02:38
|
420 is linear |
|
|
AccessViolation_
|
|
Traneptora
|
2025-11-27 02:11:35
|
|
|
2025-11-27 02:11:39
|
here's an example with mathematica |
|
2025-11-27 02:13:01
|
there may be some kind of sqrt(2) factor here and there missing |
|
2025-11-27 02:13:18
|
but that's the gist of it |
|
2025-11-27 02:14:03
|
point is that it's not a no-op |
|
2025-11-27 02:14:09
|
unless the image is constant |
|
2025-11-27 02:14:20
|
even if the image *could* be represented with fewer DCT coeffs |
|
2025-11-27 02:14:38
|
as in this case of a linear increase |
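For anyone without Mathematica, the same gist can be checked numerically in Python/SciPy. This is a sketch assuming orthonormal DCT-II conventions; the sqrt(2) rescale is the factor mentioned above. It shows that zero-padding the DCT coefficients is exact for a constant signal but is not a no-op on a linear ramp:

```python
import numpy as np
from scipy.fft import dct, idct

def dct_upsample_2x(x):
    """Upsample a 1-D signal 2x by zero-padding its DCT-II coefficients.
    With norm='ortho', the longer transform needs a sqrt(2) rescale."""
    n = len(x)
    X = dct(x, norm='ortho')
    X2 = np.zeros(2 * n)
    X2[:n] = X * np.sqrt(2)   # copy coefficients, leave the new high ones at 0
    return idct(X2, norm='ortho')

const = dct_upsample_2x(np.full(4, 5.0))   # constant input
ramp = dct_upsample_2x(np.arange(4.0))     # linear ramp input
```

The constant comes back as the same constant, while the upsampled ramp has nonzero second differences, i.e. it is no longer linear: the zero-padded basis functions introduce ripple.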
|
|
Exorcist
|
2025-11-27 02:17:39
|
Any benefit? |
|
|
_wb_
|
2025-11-27 03:39:21
|
Applying EPF when recompressing low-quality JPEGs sounds like a reasonable thing to do, maybe we should do it automatically depending on the quant tables (where the amount of EPF to do is proportional to the overall amount of quantization). |
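A minimal sketch of what that automatic decision might look like, assuming we map the mean luma quantization step to a small EPF strength scale. The thresholds, the 0-3 scale, and the function name are all made-up illustrations, not libjxl behavior:

```python
import numpy as np

def epf_strength_from_quant(luma_qtable):
    """Hypothetical mapping from a JPEG luma quant table (64 values)
    to an EPF strength in 0..3: the coarser the quantization, the
    stronger the smoothing. All constants here are illustrative."""
    q = np.asarray(luma_qtable, dtype=float).reshape(8, 8)
    coarseness = q.mean()  # crude "overall amount of quantization"
    # Fine quantization (high quality) -> 0, very coarse -> 3.
    return int(np.clip((coarseness - 4) / 16, 0, 3))
```

A real heuristic would likely weight the low-frequency entries more heavily and account for the chroma tables too, but the shape of the idea is the same: EPF amount proportional to quantization amount.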
|
2025-11-27 03:40:37
|
LF smoothing could still be beneficial, though most 4:4:4 JPEGs are high quality and don't need it that much... |
|
|
jonnyawsom3
|
2025-11-27 04:12:30
|
I think djxl could definitely do with some more options like cjxl has, allowing you to force enable/disable different decoding options and override what the header says |
|
|
AccessViolation_
|
2025-11-27 04:52:54
|
that sounds very promising, I like that. I actually wasn't sure whether it would be feasible after I suggested it, because I thought it might require data about what the original pixels were in addition to the coefficients |
|
2025-11-27 04:55:25
|
I think doing it by default could be good, though lossless JPEG recompression has always implied that the visuals don't really change either, at least not to an extent like this |
|
2025-11-27 04:56:07
|
and, for example, a web server recompressing JPEGs in real time obviously wouldn't want to do that |
|
|
_wb_
|
2025-11-27 04:56:09
|
EPF is just a decoder-side filter that does some selective smoothing, it's exactly made for cleaning up the artifacts of too aggressive DCT quantization so it's just as good a fit for low-quality JPEGs as it is for low-quality JXLs
The only thing is in JXL encoding from pixels, we can also modulate the EPF based on the local amount of quantization that is done and whatever other heuristics; in case of JPEG there is no variable quantization and we don't know what the original pixels were, so modulating the EPF strength locally makes less sense. |
|
2025-11-27 04:57:06
|
Encode-side, you'd just signal that EPF needs to be done by the decoder at some constant strength. It doesn't change the encode complexity at all. |
|
2025-11-27 04:57:47
|
Decode-side, it becomes a bit slower since EPF has to be applied, but that's not a huge deal |
|
2025-11-27 04:58:56
|
(and when reconstructing a JPEG, it doesn't make a difference since then the decoding is stopped already before IDCT happens, which is before EPF can get applied) |
|
|
AccessViolation_
|
|
_wb_
|
2025-11-27 05:07:45
|
In this figure from a long time ago, this path with yellow-fading-to-green boxes is basically about doing JPEG recompression while also doing some enhancements like signaling EPF, adaptive LF, maybe adding some noise synthesis or even using splines or additional layers to retouch the original jpeg a bit. We never really did anything like that but this was something we've had in mind while designing JXL — having these options was one of the reasons why we got rid of Brunsli as a separate mode and integrated JPEG recompression into VarDCT instead. |
|
|
AccessViolation_
|
2025-11-27 05:16:12
|
oh neat |
|
2025-11-27 05:18:19
|
is there a reason LF smoothing can't be done on chroma subsampled frames by the way? |
|
2025-11-27 05:19:49
|
I assumed it's to simplify the decoding process |
|
2025-11-27 05:25:32
|
trying to find out myself currently but my very legally acquired copy of the spec doesn't have searchable text <:KekDog:805390049033191445> |
|
|
Quackdoc
|
2025-11-27 05:27:32
|
progressive DC and fast decode are the big ones off the top of my head if they aren't in it yet. |
|
|
Traneptora
|
|
_wb_
|
2025-11-27 05:31:38
|
IIRC: LF smoothing is defined by a criterion that involves all 3 components, but when you do chroma subsampling, those components don't have the same dimensions, also in the LF. It would get tricky to define it in a way that doesn't just assume that you have an LF image where every component has the same dimensions. Same with Chroma from Luma. These things were defined back when VarDCT didn't even have chroma subsampling as an option, and we never bothered to try to make these things work in the not-444 case since that would complicate things too much and it would not bring any benefit to normal jxl compression (which is always 444). |
|
|
AccessViolation_
|
2025-11-27 05:33:12
|
I see I see |
|
|
Traneptora
|
2025-11-27 05:33:36
|
It may have been a mistake to define B as it was and not as B-Y like it is in modular xyb |
|
2025-11-27 05:33:55
|
because then the default cfl is built in |
|
|
AccessViolation_
|
2025-11-27 05:54:03
|
it's a shame this code causes a failure instead of ignoring the instruction to do adaptive LF smoothing, because otherwise retroactively changing the spec to allow it would have no real negative consequences to outdated decoders. they'd just show the JPEG as it is :p |
|
2025-11-27 05:57:14
|
conversely it's good it fails because that incentivizes other encoders to respect the spec |
|
|
jonnyawsom3
|
2025-11-27 06:06:03
|
Adding to that, photon noise could be nice. Can't think of anything else useful in a common FFMPEG context |
|
|
Traneptora
|
2025-11-27 06:06:57
|
we have a draft in <#1021189485960114198> that can be searched but it's just a draft |
|
2025-11-27 06:09:25
|
As it stands decoding a recompressed Jpeg produces pixels that are within the jpeg spec tolerance |
|
2025-11-27 06:09:45
|
would running epf change this? |
|
|
_wb_
|
2025-11-27 06:11:49
|
yes, probably, if EPF is sufficiently strong |
|
|
AccessViolation_
|
2025-11-27 06:12:29
|
> It would get tricky to define it in a way that doesn't just assume that you have an LF image where every component has the same dimensions
couldn't you define it as taking the components as they are "in the end"? for example 4:2:0 subsampling is interpreted linearly (apparently), so have it do its own thing to expand to the whole image dimensions, and then do LF smoothing? |
|
|
_wb_
|
2025-11-27 06:13:17
|
that doesn't really make sense, LF smoothing is done before IDCT and chroma upsampling is done after |
|
|
AccessViolation_
|
2025-11-27 06:13:37
|
oh okay |
|
|
_wb_
|
2025-11-27 06:14:27
|
it's not impossible to define something that would kind of work, but it would get kind of nasty from a spec/decoder complexity point of view |
|
|
AccessViolation_
|
2025-11-27 06:17:25
|
it's worth exploring for non-lossless-jpeg use cases regardless. I feel like the encoder being aware of what the JPEG was composed of, instead of decoding it to pixels and going from there, has the potential to produce much better results |
|
2025-11-27 06:20:13
|
oooh you could probably run the block merging logic on those 8x8's for example |
|
2025-11-27 06:21:24
|
(after undoing chroma subsampling if necessary) |
|
2025-11-27 06:26:01
|
a sort of "jpeg aware" mode if you will |
|
|
jonnyawsom3
|
2025-11-27 06:37:10
|
Lossy JPEG transcoding has been an idea for quite a while. Quantizing the existing coefficients, etc. |
|
|
Demiurge
|
2025-11-28 12:19:41
|
I wonder if I can popularize .jjxl and .jxll as file extensions for lossless transcodes. I use them on my linux pc as a convention to help me keep track of which files were lossless transcodes. It's kind of nice and convenient being able to tell at a glance which files can be easily, reversibly transformed into JPEG or PNG. |
|
2025-11-28 12:20:02
|
And it's better than .jpg.jxl for example |
|
2025-11-28 12:26:51
|
The only downside is that some software isn't smart enough to recognize files regardless of what name they have. Windows is probably the worst at it. |
|
|
RaveSteel
|
2025-11-28 12:44:19
|
Do you use this system simply to know, or is there a use case for it? |
|
|
HCrikki
|
2025-11-28 12:44:55
|
doesn't this info already get embedded in the files themselves? if I'm not mistaken, exif shows bitstream reconstruction data, and the image cataloging apps you use should be able to use that information for, say, filtering, search, or appending conversion-related comments of your choosing (i.e. "created by gimp"). your workflow would need to keep updating that info across future edits and conversions to remain truthful though (a reversibly lossless jxl that is not read-only could still be edited and tagged, intentionally or not) |
|
|
Demiurge
|
2025-11-28 12:58:53
|
Yeah. If I want to send someone an image but I can't share a jxl, it's useful to know it's a lossless file ready to be easily converted back to JPEG or PNG |
|
|
RaveSteel
|
2025-11-28 12:59:47
|
If you use KDE simply create a kio service menu to convert JXLs to JPEG via jpegli on the fly |
|
|
Demiurge
|
2025-11-28 12:59:52
|
For me personally, I like that. But most average PC users don't even know what lossless means |
|
|
RaveSteel
|
2025-11-28 12:59:53
|
If you want I can show you how |
|
|
Demiurge
|
2025-11-28 01:01:40
|
I think it's nice being able to see at a glance "this isn't lossy, so I can convert this without losing anything" |
|
2025-11-28 01:03:25
|
So a .jjxl can be perfectly restored to a JPEG, and a .jxll can be converted to any format without worrying about decompressing a lossy file. |
|
|
Exorcist
|
2025-11-28 01:21:41
|
As discussed before: how do you name multi-frame/page/layer files, and how do you name a file where some frames are lossy and some lossless... |
|
2025-11-28 01:27:56
|
This is the only answer☝️ |
|
|
AccessViolation_
|
2025-11-28 01:28:49
|
I prefer .jpg.jxl, I don't really like JXL files with something other than .jxl at the end |
|
|
Demiurge
|
2025-11-28 01:29:13
|
I don't know why multi-frame is something that needs to be in the filename itself. It seems like less useful info that will be inferred another way. Maybe .jxls if you really want one. And if it's not easy to convert it back into a different format, then there's no reason to name it .jjxl or .jxll |
|
|
Exorcist
|
|
Demiurge
|
2025-11-28 01:30:43
|
.jxll then |
|
2025-11-28 01:31:07
|
If it was losslessly converted from a gif |
|
2025-11-28 01:31:25
|
Same logic as png->jxl |
|
2025-11-28 01:32:25
|
That seems more useful than .jxls, which is why I don't think there needs to be a separate extension for sequences. gif doesn't have a separate extension for stills |
|
|
Exorcist
|
2025-11-28 01:33:07
|
Nice logic, but how do you know if a `.jxll` was losslessly converted from a gif or a png? |
|
|
Demiurge
|
2025-11-28 01:33:49
|
Why does that matter? png does everything gif does |
|
2025-11-28 01:34:32
|
I would just convert it back to png if I named it jxll |
|
2025-11-28 01:34:54
|
And I had to send it somewhere that doesn't take jxl |
|
|
Exorcist
|
2025-11-28 01:35:38
|
What if someone doesn't accept APNG? |
|
|
HCrikki
|
2025-11-28 01:37:31
|
sharing the name, a link, or a binary of a working jxl decoding app sounds like the simplest solution. a recipient not planning to edit the sent image only needs it to be openable |
|
|
Demiurge
|
2025-11-28 01:39:00
|
What matters to me personally is knowing that I'm not decompressing a lossy file and permanently losing the advantages of the compression |
|
2025-11-28 01:40:31
|
Knowing that conversion is reversible is a convenient thing for me personally to know. But I know that most people don't even know what "lossy" is |
|
2025-11-28 01:46:31
|
Still it would be nice if jxl files with different extensions would still show up in typical thumbnailer software. |
|
2025-11-28 01:47:45
|
There are a few rare programs that only look for specific filenames ending in .jxl |
|
2025-11-28 01:48:03
|
I think Windows Explorer is one of them |
|
2025-11-28 01:48:28
|
Idk cuz I don't use windows very often |
|
|
Exorcist
|
2025-11-28 01:56:43
|
> a convenient thing for me personally to know
So name them that way for yourself; why the need for a special file extension? |
|
|
jonnyawsom3
|
2025-11-28 02:07:39
|
It's not. jxlinfo checks for XYB to tell if it's lossy, and reconstruction data to detect a JPEG |
|
|
AccessViolation_
|
2025-11-28 05:10:29
|
simply use my cursed patent pending idea where images are wasm modules that contain both the encoded image data and code to decode it <:Stonks:806137886726553651> |
|
|
HCrikki
|
2025-11-28 05:13:02
|
much better: encode jxls in some 'maximum backward compatibility' mode that is really a jpeg generated using jpegli then immediately transcoded to jxl - one a decoder could choose to decode as either a jpeg or a jxl depending on whether an app supports jxl or is just jxl-aware and lacking jxl decode capability |
|
2025-11-28 05:13:39
|
current transcoding processes seem to require the creation of an intermediary file |
|
|
VcSaJen
|
2025-11-28 05:19:19
|
Well, there are double extensions like .tar.gz . |
|
|
AccessViolation_
|
2025-11-28 05:19:38
|
JXL does allow you to use whatever file extension you want as it isn't specified, so everyone is free to use conventions they like.
here's my opinion on a special extension for losslessly recompressed JPEGs: lossless JPEG recompression is, in my eyes, for most use cases, not a feature to store JPEGs as a smaller intermediary format, only to decode them back to JPEG eventually. rather, it's to encode JPEGs we already have into JXL, and keep using them along with our other JXLs. in the future when JXL has replaced JPEG in most places, it's not going to matter that they can be decoded back to the original JPEGs. what matters is that they're now *no longer* JPEGs, but JXLs that you can *also* use everywhere
|
|
2025-11-28 05:22:47
|
I don't see it as "some subset of JXL files are backwards compatible with JPEG after a lossless transformation [and we need an extension to tell those apart]"
instead, I see it as "most JPEGs are forwards compatible with JXL after a lossless transformation", which, hopefully, will be a much more important property in the not too distant future, and as such we won't need an extension to tell them apart |
|
|
Demiurge
|
2025-11-29 01:51:17
|
It would just be nice if windows recognized more than just ".jxl" |
|
|
whatsurname
|
2025-11-29 02:34:20
|
Adding new MIME types would be a mess, and it takes years to propagate through the ecosystem
It took 4 years for Android to support `image/jxl` https://issuetracker.google.com/182703810
See also the discussion about removing the `.avifs` file extension in AVIF https://github.com/AOMediaCodec/av1-avif/issues/59 |
|
|
Meow
|
2025-11-29 03:16:36
|
like basically nobody recognises .apng |
|
|
A homosapien
|
2025-11-29 11:12:58
|
libjxl_anim muxing support? Since `ffmpeg -i input%04d.png/input.mp4 -c:v jpegxl_anim out.jxl` doesn't work. |
|
|
lonjil
|
2025-11-29 05:31:42
|
does a tool to dump the tree from a jxl like, still exist? |
|
|
AccessViolation_
|
2025-11-29 05:33:45
|
in a human readable graph, yeah |
|
2025-11-29 05:35:58
|
it doesn't really work if your image is composed of more than one group though |
|
2025-11-29 05:36:36
|
I can get you the code or a linux binary if you need it |
|
2025-11-29 05:37:15
|
I'll probably create a branch that has it in my `libjxl-tuning` fork, with jon's permission since that code is his |
|
|
lonjil
|
2025-11-29 06:03:50
|
gimme da code |
|
|
jonnyawsom3
|
2025-11-29 06:53:05
|
https://discord.com/channels/794206087879852103/822105409312653333/1346452090465030204 |
|
|
spider-mario
|
2025-11-29 08:00:41
|
(the header at the start of the file gives you permission) |
|
|
_wb_
|
2025-11-29 08:31:27
|
yes no need to ask permission, every contributor to libjxl already gave permission to do basically whatever you want with our code |
|
|
spider-mario
|
2025-11-29 08:35:08
|
if you really want to ensure proper attribution, when you put that file in your fork, you can `git commit --author='Jon Sneyers <jon@cloudinary.com>'` |
|
2025-11-29 08:35:38
|
then Jon will be the “author” of the commit (the one who authored its contents) and you will be the “committer” (the one who committed them to the repo) |
|
2025-11-29 08:36:10
|
(`git log` only displays the author by default but they’re both there) |
|
2025-11-29 08:36:14
|
(GitHub and `tig` show both) |
|
|
AccessViolation_
|
2025-11-29 08:37:25
|
alright. I need to find out which of the cloned repos has it, gimme a minute |
|
2025-11-29 08:43:36
|
<@167023260574154752> |
|
|
lonjil
|
2025-11-29 08:44:19
|
thanks |
|
|
AccessViolation_
|
2025-11-29 08:47:18
|
also effort 11 with this is severely broken |
|
|
lonjil
|
|
AccessViolation_
|
2025-11-29 08:51:37
|
oh actually it might not be. I remember it glitching but that was because images were larger than a group, because it'd create those files for multiple groups simultaneously. effort 11 may or may not be broken, I seem to remember there was some oddity with it |
|
|
jonnyawsom3
|
2025-11-29 08:57:22
|
You're probably thinking of the debug image outputs, the MA trees write independently |
|
|
AccessViolation_
|
2025-11-29 08:59:50
|
ah. yeah, I was mostly interested in the predictor and context maps then |
|
|
|
cioute
|
2025-11-30 11:12:52
|
What is jpeg ai? |
|
|
Traneptora
|
2025-11-30 01:19:22
|
<@184373105588699137> <@238552565619359744> https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/21051 |
|
|
_wb_
|
2025-11-30 01:46:24
|
https://jpeg.org/jpegai/documentation.html |
|
|
|
cioute
|
2025-11-30 05:32:26
|
Hmm... does a neural lossless codec make sense? |
|
|
jonnyawsom3
|
2025-11-30 06:26:49
|
It's already been discussed for MA trees in lossless JXL, as it would only impact density, not accuracy |
|
|
|
ignaloidas
|
2025-11-30 07:01:55
|
for MA trees as in a neural net acting in place of a MA tree or as just another predictor? |
|
|
jonnyawsom3
|
2025-11-30 07:05:16
|
Neither, those wouldn't be valid bitstreams. Using a neural network to create optimal MA trees for a given image |
|
|
|
ignaloidas
|
2025-11-30 07:31:03
|
I meant more as a possible extension, but yeah, it could make sense for just quickly "guessing" a good MA tree for a given image |
|
|
RaveSteel
|
2025-12-01 12:48:24
|
<@&807636211489177661> |
|
|
Traneptora
|
2025-12-01 11:47:21
|
Does the 18181-2 spec require `jxlp` boxes to be consecutive? |
|
2025-12-01 11:55:26
|
Because `cjxl` is taking a png and inserting a `brob` box (containing `Exif`) between two jxlp boxes |
|
|
|
ignaloidas
|
2025-12-01 12:47:55
|
It explicitly suggests that other boxes may be placed in between jxlp boxes |
|
|
Jyrki Alakuijala
|
2025-12-01 01:37:54
|
perhaps energywise good for interplanetary communications |
|
2025-12-01 01:40:38
|
JPEG AI is a neural codec project that measures success in very low BPP images. I tried to guide them to focus on the same quality that people use nowadays with JPEG and higher. All the other 200 JPEG experts in that committee wanted to compress much more and make images look much much worse than what people are using today. I have little hope that they will come up with anything that has real value. I decided not to follow that project after my failure to correct the initial scoping and benchmarking of success. |
|
2025-12-01 01:42:21
|
it is a bit of the same story as with AVIF and JPEG XL -- AVIF measured success during development with low BPP coding, so they ended up with tools that help at low BPP. I measured with mid to high BPP, and a tool that didn't improve mid and/or high BPP error didn't make it into JPEG XL. |
|
2025-12-01 01:43:16
|
I think it is likely good to think of JPEG AI as AVIF, but with more hallucinations, much slower / more energy-expensive decoding, and 20% more compression (guessing) |
|
|
AccessViolation_
|
2025-12-01 01:46:22
|
fascinating. we still have coding tools that do pretty well at low to medium BPP as well. EPF, Gaborish and adaptive LF smoothing help a lot |
|
2025-12-01 01:47:26
|
I don't know if you saw, recently I (and veluca, who did a much better job) were able to beat the "very low quality preview" coding tool in WebP2, using combinations of JXL features that weren't even designed for it |
|
2025-12-01 01:48:30
|
it's even more impressive knowing that these were all designed for med-high BPP |
|
|
Jyrki Alakuijala
|
2025-12-01 01:49:00
|
Elena Alshina works for Huawei and leads this effort -- I would be surprised if Huawei didn't try to implement it, but I would also be surprised if it didn't turn out to be a failure. Too slow, too much heat/energy, too little improvement and too wild hallucinations. |
|
2025-12-01 01:49:38
|
They also decided against SSIMULACRA2, Butteraugli and SSIM as metrics to follow progress in selection of tools. |
|
2025-12-01 01:50:41
|
They used 5 other metrics which are more or less failures, but they were more pleasing to the ~200 scientists starting the JPEG AI effort |
|
2025-12-01 01:52:47
|
https://arxiv.org/html/2503.16288v1 for example figure 15 (there are ~5 similar figures with different metrics) shows interesting things:
they are tracing performance from 0.02 BPP upwards
the performance against the selected metrics is essentially the same as with VVC |
|
2025-12-01 01:58:10
|
If I had used a 5x larger multiplication budget for JPEG XL, I could have made it work much, much more elegantly for lower quality, possibly a 50% improvement there -- but I considered it not worth it
The neural codecs use a 1000x larger multiplication budget and get a 55% improvement, but with hallucinations -- it doesn't feel like the right thing to do to me |
|
|
RaveSteel
|
2025-12-01 02:01:57
|
Sounds like using JPEG AI could be pretty fun depending on settings |
|
2025-12-01 02:02:05
|
Just to see what it hallucinates |
|
|
AccessViolation_
|
2025-12-01 02:03:25
|
> If I had used 5x more multiplication budget for JPEG XL, I could have made it work much much more elegantly for lower quality, possibly a 50 % improvement there -- but I considered it is not worth it
I'm curious what you would have done. any specific coding tool? something novel? |
|
|
Jyrki Alakuijala
|
2025-12-01 02:08:02
|
I thought about it by myself, but it had been published before under the name 'quantization constraint' (I cannot easily find the article now). Basically it is a resonance of DCT-IDCT where smoothing and dequantization are applied; it relaxes towards a valid dequantization without block boundaries. Also, rotated DCTs would have been nice for lower quality. |
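The quantization-constraint idea (projection onto convex sets, roughly) can be sketched like this. This is a toy Python version under assumed conventions (orthonormal 8x8 DCT, a plain Gaussian as the smoother, uniform quantization intervals), not the implementation that was removed from JPEG XL:

```python
import numpy as np
from scipy.fft import dctn, idctn
from scipy.ndimage import gaussian_filter

def quantization_constraint_deblock(coeffs, qtable, iters=10):
    """Alternate pixel-domain smoothing with projecting each 8x8 block's
    DCT coefficients back into the interval its quantized value allows,
    so the result relaxes towards a valid dequantization without
    block boundaries. `coeffs` holds quantized coefficient values."""
    H, W = coeffs.shape  # multiples of 8 assumed
    qt = np.tile(np.asarray(qtable, dtype=float), (H // 8, W // 8))
    lo = (coeffs - 0.5) * qt  # per-coefficient allowed interval
    hi = (coeffs + 0.5) * qt
    # Start from the standard dequantization.
    img = np.empty((H, W))
    for y in range(0, H, 8):
        for x in range(0, W, 8):
            img[y:y+8, x:x+8] = idctn(coeffs[y:y+8, x:x+8] * qt[y:y+8, x:x+8],
                                      norm='ortho')
    for _ in range(iters):
        img = gaussian_filter(img, sigma=0.8)  # smooth across block edges
        for y in range(0, H, 8):               # project back into constraints
            for x in range(0, W, 8):
                c = dctn(img[y:y+8, x:x+8], norm='ortho')
                c = np.clip(c, lo[y:y+8, x:x+8], hi[y:y+8, x:x+8])
                img[y:y+8, x:x+8] = idctn(c, norm='ortho')
    return img
```

Each iteration ends on the projection step, so the output is always a valid dequantization of the original coefficients, just a smoother one.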
|
2025-12-01 02:08:53
|
we implemented it during the JPEG XL development, but in the end I decided to remove it for having more predictable decoding speed |
|
2025-12-01 02:10:10
|
the rotation part I implemented in a wrong way, I understood my mistake only long after JPEG XL was implemented -- but in any case it would have been a low BPP tool only |
|
|
AccessViolation_
|
2025-12-01 02:11:30
|
heh rotated DCTs are something I thought about as well. inspired by reading about the directional gradient coding tool in AVIF which it uses at very low qualities |
|
|
Jyrki Alakuijala
|
2025-12-01 02:15:36
|
I was planning to sample the pixels with a rotation, but I let one of our geniuses convince me that I could do the rotation on the dct coefficients alone -- which was incorrect, but I got confused |
|
|
Exorcist
|
2025-12-01 02:16:20
|
a model is a predictor, but photon noise is fundamentally not predictable, so that's nonsense |
|
|
Jyrki Alakuijala
|
2025-12-01 02:16:32
|
photon noise modeling in JPEG XL I am proud of |
|
2025-12-01 02:16:41
|
wouldn't change a thing with it 🙂 |
|
|
Exorcist
|
2025-12-01 02:17:22
|
I mean the noise in source, not the synth |
|
|
Jyrki Alakuijala
|
2025-12-01 02:17:24
|
yes, the physical photon noise spoils neural dream of predicting photographs |
|
2025-12-01 02:17:31
|
yes, I understood 777 ms too late |
|
|
Exorcist
|
2025-12-01 02:23:31
|
If you mean "visual lossless", try AOM-AV1 `tune=iq` first |
|
|
AccessViolation_
|
2025-12-01 02:25:57
|
when I thought about it, my idea was to sample the orange area, reversibly shear-rotate (just displacing, not changing the pixel values themselves) the pixels to align them optimally, then encode the whole of it with DCT as normal. during decoding, the pixels would be shear-rotated back into place after IDCT
really only the pixels in the blue circle are important, but I figured rectangles are probably easier |
|
2025-12-01 02:26:59
|
(the green square is 8x8, not the whole thing) |
|
|
Exorcist
|
2025-12-01 02:27:56
|
Today you can recognize AVIF is JPEG XL "in another world", because the influence timeline:
JPEG XL → Butteraugli & SSIMULACRA2 → SVT-AV1-PSY → AOM-AV1 `tune=iq` |
|
|
AccessViolation_
|
2025-12-01 02:30:52
|
you could maybe get around having to encode invisible pixels but I have no idea if DCT is easily applicable to odd shapes, or if you could trivially encode pixels that wouldn't be visible as "don't-cares" with DCT |
|
|
Jyrki Alakuijala
|
2025-12-01 02:31:04
|
everything is linked: tune=iq uses XYB that I found during butteraugli/guetzli/jpeg xl -- a dirty secret of XYB is that it is only based on my own eyes |
|
2025-12-01 02:32:41
|
yes, that's basically it -- it's just that this kind of resampling is not computationally cheap to do properly, and a cheap resampling would be visible in the results |
|
2025-12-01 04:26:12
|
[((((Butteraugli -> XYB), Guetzli, Brunsli) -> (Pik)), (WebP lossless) -> (Brotli), (FLIF) -> (FUIF) -> (WebP lossless/Brotli-like fixed entropy FUIF))]->JPEG XL |
|
|
jonnyawsom3
|
2025-12-01 04:34:30
|
Speaking of XYB, do you think you could try making a new parameterized quant table for libjxl? The HF of the B channel needs a more gentle falloff. Currently it's over 4x less precision than X, requiring extremely high quality settings to preserve certain colors in high contrast areas (The desaturation we discussed before) |
|
|
Jyrki Alakuijala
|
2025-12-01 05:12:28
|
yes, that looks wrong -- perhaps you are better equipped to fix it? |
|
|
_wb_
|
2025-12-01 06:21:41
|
in my view, JPEG AI is interesting for academics and for chip manufacturers who need new reasons to sell chips, but has little practical value for most use cases. |
|
2025-12-01 06:28:57
|
The focus on very low bitrates is unfortunately still strong, not just in the JPEG committee but in many people (from enthusiasts to experts) looking at lossy compression. I think the idea is "let's see how far we can push this thing until it breaks, and then when it breaks, by studying the resulting scattered bits of image we will understand it", like how particle physicists are studying elementary particles. But that's of course a rather one-sided and largely irrelevant way of studying lossy compression. It's like comparing racing bikes by putting them in a hydraulic press and declaring the one that takes longest to explode into tiny bits the best racing bike. |
|
|
AccessViolation_
|
2025-12-01 06:31:26
|
I personally feel that way about low bitrate compression too. it's fascinating to see how small you can get an image that bears a resemblance to the original, but I would never actually use images like that myself or serve them to others. at least for me it's a curiosity more than anything |
|
|
jonnyawsom3
|
2025-12-01 06:33:45
|
I can't wait for the AI bubble to burst, and all the compute spent on making animals tell fart jokes can be spent on meaningful and novel applications instead |
|
|
AccessViolation_
|
2025-12-01 06:33:52
|
I wonder to what extent Huawei's interest is because JPEG AI sounds particularly marketable these days |
|
2025-12-01 06:36:05
|
"The first smartphone that natively supports the next-gen JPEG AI format"
and it's a toggle in the camera app or something |
|
|
_wb_
|
2025-12-01 06:53:46
|
If you want people to have a reason to upgrade a phone, formats that are fast enough on a CPU are not so nice, and new stuff that requires at least a good GPU, and preferably specialized neural hardware, is nice. That's why hardware companies tend to hype anything AI. |
|
|
AccessViolation_
|
2025-12-01 06:57:00
|
that checks out. one of my old Huawei phones has a CPU made by them that contains some AI cores. nothing used them, except one or two apps that were made in partnership with Huawei (Microsoft Translator was one), and presumably some of their own stock apps |
|
2025-12-01 06:58:28
|
because why would apps that use machine learning spend time and effort specifically optimizing for the machine learning interface in Huawei CPUs? the machine learning in the app would already be tuned to run on any phone |
|
|
jonnyawsom3
|
2025-12-01 07:09:51
|
That's actually my current phone. The first phone with Tensor cores, but only the Microsoft translator app and the camera object recognition used it |
|
|
_wb_
|
2025-12-01 07:10:02
|
Academically speaking, I do think JPEG AI is interesting, e.g. anything that can help to better compress images is probably also going to be useful for other tasks where conventional (non-AI) methods are much harder to do, like computer vision, generative AI, etc. And also, having something that produces completely different artifacts from the usual block-transform codecs is interesting if you want to better understand human perception of image quality.
But as an actual image format, I think the practical utility of JPEG AI is limited. |
|
|
spider-mario
|
2025-12-01 07:17:49
|
to me, it’s the same kind of, let’s say, “intellectual satisfaction” as “look everyone, I managed to port this game to an original Game Boy / compile C# code for Windows 3.11” |
|
2025-12-01 07:18:05
|
“I managed to make a blurry version of this photo but with the text still legible fit into a 30kB file” is in the same realm |
|
|
AccessViolation_
|
2025-12-01 07:19:22
|
I agree, that puts it pretty well |
|
|
spider-mario
|
2025-12-01 07:20:37
|
(https://x.com/MStrehovsky/status/1215331352352034818) |
|
|
AccessViolation_
|
2025-12-01 07:20:38
|
if you think about it, code golfing, speedrunning, compressing images like that, running doom on a washing machine, etc, all sort of fit into this category |
|
2025-12-01 07:21:44
|
going to the moon? :p |
|
|
Magnap
|
2025-12-01 07:25:06
|
"achievements you'd think would be impossible under such constraints" 🤔 |
|
|
jonnyawsom3
|
2025-12-01 07:25:33
|
"Things that shouldn't be done, but you could" |
|
|
AccessViolation_
|
2025-12-01 07:26:22
|
"things we do not because they are easy but because they are hard" |
|
2025-12-01 07:28:11
|
I actually really don't like that speech, or at least that part of it |
|
2025-12-01 07:28:24
|
this may be controversial |
|
2025-12-01 07:28:49
|
it sounds like he made it up on the spot |
|
|
Magnap
|
2025-12-01 07:52:21
|
are JXL SMPTE timecodes BCD or just normal u8s? |
|
|
_wb_
|
2025-12-01 08:03:26
|
> `timecode` indicates the SMPTE timecode of the current frame, or 0. The decoder interprets groups of 8 bits from most-significant to least-significant as hour, minute, second, and frame. If `timecode` is nonzero, it is strictly larger than that of a previous frame with nonzero duration. |
|
|
Magnap
|
2025-12-01 08:08:18
|
I read that, but it doesn't technically specify how to interpret each group of 8 bits, and wikipedia told me that SMPTE timecodes are
> typically represented in 32 bits using binary-coded decimal |
|
|
_wb_
|
2025-12-01 08:14:02
|
it's just u8s afaik |
|
2025-12-01 08:15:58
|
then again if the SMPTE convention is to use 4 bits per decimal digit within those u8s, then probably we follow that |
|
|
Magnap
|
2025-12-01 08:16:44
|
makes more sense, but I figured it was worth asking, it seemed like the sort of thing where benefits like "you can represent 255 hours rather than only 99" would be outweighed by "every tool and device in existence assumes it's BCD" |
|
|
_wb_
|
2025-12-01 08:17:04
|
as far as the spec is concerned it's just a blob of 4 bytes that is supposed to get larger values when interpreted as big-endian u32 |
|
|
Magnap
|
2025-12-01 08:18:13
|
fair enough, I'll represent it as such in my API then 😅 push it to the user (who will also be me 😅) to figure out what to put there |
|
|
_wb_
|
2025-12-01 08:21:30
|
it's probably something hairy and old enough that all ways of doing it are actually used by some application, so whatever you do, you'll probably be right according to something and wrong according to something else |
|
|
AccessViolation_
|
2025-12-01 08:24:43
|
is B supposed to look more or less like X and Y here |
|
2025-12-01 08:26:19
|
I assume not, otherwise I imagine JXLs wouldn't look right at all |
|
2025-12-01 08:26:34
|
but it's curious B is so wildly different |
|
|
_wb_
|
2025-12-01 08:27:54
|
hm looking at the SMPTE spec itself (SMPTE ST 12-1:2014, https://pub.smpte.org/pub/st12-1/st0012-1-2014.pdf), it looks like it's really BCD |
|
2025-12-01 08:28:08
|
so probably in jxl it is also BCD |
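For illustration, a minimal sketch of the two possible readings of the 4-byte timecode (`decode_timecode` is a hypothetical helper, not libjxl API). Groups of 8 bits from most-significant to least-significant are hour, minute, second, frame; under the SMPTE ST 12-1 convention each byte packs two decimal digits:

```python
def decode_timecode(tc: int, bcd: bool = True):
    """Split a 32-bit timecode into (hour, minute, second, frame).

    Groups of 8 bits, most-significant first. If `bcd`, each byte is read
    as two packed decimal digits (the SMPTE ST 12-1 convention); otherwise
    each byte is a plain u8.
    """
    fields = [(tc >> shift) & 0xFF for shift in (24, 16, 8, 0)]
    if bcd:
        return tuple((b >> 4) * 10 + (b & 0x0F) for b in fields)
    return tuple(fields)

# The two interpretations diverge as soon as a nibble exceeds 9:
print(decode_timecode(0x10203045))             # BCD -> (10, 20, 30, 45)
print(decode_timecode(0x10203045, bcd=False))  # u8  -> (16, 32, 48, 69)
```

Note the BCD reading also caps hours at 99 rather than 255, which is the trade-off discussed below.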
|
|
jonnyawsom3
|
2025-12-01 08:29:14
|
Y should be lower/greener since it's effectively luma, and B should be higher than X because we're less sensitive to blue, but over 4x higher is very harsh |
|
|
AccessViolation_
|
2025-12-01 08:32:09
|
I see |
|
|
username
|
2025-12-01 08:38:44
|
does libjxl even currently have support in the code for writing out a custom quant table? |
|
|
AccessViolation_
|
2025-12-01 08:39:01
|
I was actually just reading up on that |
|
|
username
|
2025-12-01 08:39:21
|
I feel like the encoder doesn't have that internally but I could be wrong |
|
|
|
jonnyawsom3
|
2025-12-01 08:40:13
|
Nope, hence asking Jyrki for help |
|
|
_wb_
|
2025-12-01 08:40:32
|
There should be code for that, but afaik the current encoder only uses RAW (when recompressing JPEGs) and Default (when doing lossy from pixels). |
|
|
AccessViolation_
|
2025-12-01 08:41:30
|
I'm glad these tunings can be done by tweaking some parameter values, I was worried we'd have to signal entire quant tables ^^ |
|
|
jonnyawsom3
|
2025-12-01 08:46:08
|
Even then it'd only be a few hundred bytes overhead |
|
|
AccessViolation_
|
2025-12-01 08:48:01
|
what a beautifully elegant format |
|
2025-12-01 08:48:12
|
I know I'm preaching to the choir but it deserves to be said occasionally |
|
2025-12-01 09:07:55
|
> JPEG XL also uses a (generalized) zig-zag ordering by default, but it allows signaling an arbitrary coefficient order.
is this a 'circular' zigzag pattern like in the image below, or one that goes in straight diagonal lines? |
|
2025-12-01 09:14:36
|
if it's a 'straight line' zigzag pattern it looks like it's relatively cheap to transform it into the round one |
|
|
|
veluca
|
2025-12-01 09:16:45
|
straight diagonals by default |
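As a sketch, here is the classic straight-diagonal zigzag for a square coefficient block (JPEG XL generalizes this to its variable block sizes, so this square version is only illustrative):

```python
def zigzag_order(n: int = 8):
    """Straight-diagonal zigzag scan order for an n x n coefficient block.

    Walks anti-diagonals (constant row+col), alternating direction, like
    classic JPEG. Returns (row, col) pairs in scan order.
    """
    order = []
    for s in range(2 * n - 1):  # s = row + col, one anti-diagonal per s
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        if s % 2 == 0:
            diag.reverse()  # even diagonals run bottom-left to top-right
        order.extend(diag)
    return order

zz = zigzag_order(8)
print(zz[:6])  # [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
```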
|
|
jonnyawsom3
|
2025-12-01 09:22:27
|
Could be worth doing along with disabling progressive_ac in favour of progressive_dc only by default |
|
|
|
veluca
|
2025-12-01 09:40:19
|
I mean, the default uses a heuristic to pick an order that optimizes compression |
|
|
jonnyawsom3
|
2025-12-01 10:41:31
|
Interestingly, lossy modular goes harder on X but doesn't scale B as hard. Probably evens out with the other weights though |
|
|
A homosapien
|
2025-12-02 02:10:22
|
I've noticed the desaturation is much more noticeable on lossy modular compared to VarDCT |
|
2025-12-02 02:10:38
|
I need to do more testing to be sure |
|
|
Meow
|
2025-12-02 02:11:31
|
A lot of people don't care about images being low-quality. Instead, the artifacts are a badge of popularity, since they accumulate when an image is reposted and reprocessed dozens of times |
|
|
Magnap
|
2025-12-02 01:21:03
|
btw, about the 4096x downscaling the paper mentions as possible with 4 LF frames: wouldn't the smallest LF frame itself have LF groups, thus giving an additional 8x? |
|
|
_wb_
|
2025-12-02 01:28:57
|
Yes, or it could use modular with squeeze and go even further |
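For reference, the arithmetic behind those factors (assuming each VarDCT LF image is an 8x downscale of the frame it belongs to):

```python
factor = 8      # each LF image is an 8x-downsampled version of its frame
lf_frames = 4

print(factor ** lf_frames)        # 4096: a chain of 4 LF frames
print(factor ** (lf_frames + 1))  # 32768: if the smallest LF frame's own
                                  # LF adds one more 8x step
```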
|
|
Jyrki Alakuijala
|
2025-12-03 01:00:53
|
The "Humanity Needs Worse-Looking Images" people feel incurable to me. No matter how much I tried, I was never able to convince one of them with logic to readjust their position. Nowadays, I just let them fail on their own without me interfering with their process. |
|
2025-12-03 01:03:11
|
I did those in 2019 or so, in a 3-6 month project. It would probably take me more effort to get started again than I can currently invest in it. I would welcome someone else taking a go at it; it could even make it into the default libjxl encoder. I feel we have learned a lot about the deficiencies that exist, and I would be ready to accept changes there. |
|
2025-12-03 01:08:08
|
this looks good on a product manager's slide deck when the SVP or VP asked for more AI, but it doesn't feel good in the customer's hand if the image loads slower and the phone gets hot while browsing through images |
|
2025-12-03 01:12:21
|
"A lot of people don't care about images being low-quality."
Correct. I feel responsible for making it right for them, too. Even when they cannot verbalize it, the poor quality can still affect their subconscious: it can cause them to make short-sighted e-shopping decisions when they can't 'feel' the quality of fabric through a poor-quality image, it can change their impression of how people should look when all skin defects are blurred away, etc. etc.
It is like drinkable water from the faucet -- most uses of water (washing machine, flushing the toilet, showering, washing hands, ...) don't care whether the water is drinkable, but it is so practical when you have it and get used to it. |
|
|
HCrikki
|
2025-12-03 01:19:07
|
people need to realize that a single youtube video they watch and forget consumes bandwidth equivalent to like 700 high-quality images |
|
2025-12-03 01:19:49
|
there's no need to starve your images of bitrate to the point that they look as bad today as quality-50 JPEGs from 2002 |
|
2025-12-03 01:21:35
|
just since 3G, internet speeds have increased by like 100x - note that videos matched this growth but images didn't. storage and especially bandwidth have gotten ludicrously cheaper since too |
|
|
Jyrki Alakuijala
|
2025-12-03 02:35:47
|
Every 10 years consumer Internet bandwidth grows by 10x. |
|
|
Meow
|
2025-12-03 03:31:41
|
Unfortunately, services would just compress even further when a format can provide better quality |
|
2025-12-03 03:32:15
|
They know the minimum quality that customers will accept |
|
|
Quackdoc
|
2025-12-03 04:05:22
|
meanwhile here in rural canada internet took a massive step backwards |
|
2025-12-03 04:05:44
|
well, unless you count starlink, but that can be fairly specific |
|
|
Fahim
|
2025-12-03 07:13:26
|
There are also the extortion [for lack of a better term] schemes from certain large ISPs over peering (Deutsche Telekom and now Vodafone come to mind, for example - same for certain SEA ISPs, South Korea's ISPs at large, etc. from what I recall). I'm not German, but I have German friends whose internet experience (sans VPN) gets sent back 30 years on many sites because Cloudflare doesn't want to participate in that racket |
|
2025-12-03 07:13:51
|
Vodafone's change is super recent too, just last month https://www.heise.de/en/news/Vodafone-leaves-public-internet-exchange-points-11068836.html |
|
|
tokyovigilante
|
2025-12-03 08:00:39
|
is it possible to construct a bitstream which does a progressive load using VarDCT LF -> HF -> then a further transform to (mathematically) lossless? |
|
2025-12-03 08:01:07
|
ie a lossless format for archival but also good for web delivery? |
|
|
|
ignaloidas
|
2025-12-03 08:07:08
|
you can have a preview frame, so potentially a VarDCT progressive frame followed by a lossless frame could work |
|
2025-12-03 08:08:29
|
Alternatively, you could set all quantization coefficients of VarDCT to 1, which reduces the lossiness of it to precision errors while doing (I)DCT |
|
|
lonjil
|
2025-12-03 08:11:27
|
Easiest would be a lossless kAdd frame that just adds the difference between the original image and whatever VarDCT created. |
|
|
|
ignaloidas
|
2025-12-03 08:12:40
|
wouldn't end up (reliably) mathematically lossless because VarDCT decoding has some leeway on precision |
|
|
lonjil
|
2025-12-03 08:13:16
|
Hm |
|
2025-12-03 08:13:54
|
If the leeway is small enough it'd work for inputs below a certain bitdepth. |
|
|
tokyovigilante
|
2025-12-03 08:38:05
|
Thanks. Images are typically 10-14 bits and usually pixel data is packed (unpacked) in 16 bits. I do need them to be mathematically lossless for storage but storage size isn't a huge concern so could use the preview option. |
|
|
Magnap
|
2025-12-03 09:09:26
|
how about lossy modular? |
|
|
|
ignaloidas
|
2025-12-03 09:10:48
|
lossy modular and then a lossless kAdd frame with corrections would work, yes |
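A rough NumPy sketch of the correction idea (`kadd_residual` is a hypothetical helper, not libjxl API): the correction frame carries the residual so that the lossy reconstruction plus the residual reproduces the original, which only works bit-exactly if the lossy pass decodes deterministically (the VarDCT precision caveat raised earlier):

```python
import numpy as np

def kadd_residual(original: np.ndarray, lossy_decoded: np.ndarray) -> np.ndarray:
    """Residual a lossless kAdd-style correction frame would need to carry
    so that lossy_decoded + residual == original (illustrative sketch)."""
    # Signed and wide enough to hold differences of 16-bit samples.
    return original.astype(np.int32) - lossy_decoded.astype(np.int32)

rng = np.random.default_rng(0)
orig = rng.integers(0, 2**14, size=(4, 4)).astype(np.int32)  # e.g. 14-bit data
lossy = orig + rng.integers(-3, 4, size=orig.shape)          # simulated lossy error
res = kadd_residual(orig, lossy)
assert np.array_equal(lossy + res, orig)  # correction restores the original
```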
|
|
Magnap
|
2025-12-03 09:12:27
|
I've thought about that for lossy: maybe you could do a really shitty VarDCT pass and then lossy-Modular encode a frame with the negated artifacts and residuals. But there's probably something preventing it, along the lines of "the residuals would have almost as much entropy as the image itself, so the shitty VarDCT frame would be a waste of space" |
|
|
monad
|
2025-12-03 09:15:05
|
would lossless+squeeze be good enough? |
|