JPEG XL

jxl

Anything JPEG XL related

dogelition
dogelition chromium's tone mapping also looks nice, see here https://discord.com/channels/794206087879852103/824000991891554375/1283499159332196403 (assuming you're using the discord website/desktop app and hdr is disabled)
2024-09-12 09:40:07
it doesn't look nice when you open the png in firefox though, which afaict uses the 3d lut built into the icc profile instead of doing some tone mapping by itself
Demiurge
_wb_ Normally I would add some dark beer to the batter but the only bottles I had available were all too good to be used for that. Used sparkling water instead, that also works.
2024-09-12 10:16:01
literally no such thing as good beer
2024-09-12 10:17:36
Not even the dark stuff. Although the dark stuff is slightly more drinkable than the watery stuff.
yoochan
Demiurge literally no such thing as good beer
2024-09-12 10:28:22
if you like beer and are from Belgium, there is
Demiurge
2024-09-12 10:35:58
that's a big IF
TheBigBadBoy - 𝙸𝚛
2024-09-12 10:38:52
then "if you come to Belgium, there is" <:CatSmile:805382488293244929>
_wb_
2024-09-12 10:43:08
Good beer most definitely exists. But obviously, de gustibus non disputandum est.
Traneptora
2024-09-12 07:23:16
how does libjxl convert XYB-grayscale back to single-channel? does it invert the XYB and grab the green channel?
2024-09-12 07:23:44
or does it average the three channels?
_wb_
2024-09-12 07:27:17
XYB grayscale just has X and B all-zeroes. Converting that to RGB will produce R=G=B. So it doesn't really matter how you convert that to grayscale 🙂
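As a quick illustration of the point above (a sketch, not libjxl code; numpy is used for convenience and the array is a stand-in for decoded pixels):

```python
import numpy as np

# XYB grayscale decodes to R = G = B, so grabbing the green channel and
# averaging the three channels produce the same single-channel image.
rgb = np.full((4, 4, 3), 0.5)      # stand-in for decoded R=G=B pixels
gray_green = rgb[..., 1]           # grab the green channel
gray_mean = rgb.mean(axis=-1)      # or average the three channels
assert np.allclose(gray_green, gray_mean)
```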
Traneptora
2024-09-12 07:28:07
ah. atm I'm just grabbing the green channel. but I'm failing conformance on grayscale_public_university so I need to figure out why
_wb_
2024-09-12 07:38:35
That's not the reason then.
Traneptora
2024-09-12 07:39:49
I wonder if it's EPF again. EPF is performed in XYB space so maybe it's that?
Fraetor
monad That's not me, it's <@176735987907559425>. I thought you were looking for sample code, and that is a very readable one.
2024-09-12 08:42:27
I should really get back to writing that, it was really fun to make, and I'm a better developer now too.
2024-09-12 08:53:12
It's quite cathartic actually, as unlike all my work projects, the requirements for a decoder are perfectly clear, and even have a nice specification.
CrushedAsian255
2024-09-13 02:15:42
`-d 1 -e 7` is default right?
jonnyawsom3
2024-09-13 02:37:30
Unless it's a JPEG or GIF input, yes
Oleksii Matiash
2024-09-13 06:18:53
Just curious: why does lossless compression of an 8-bit file produce a containerless file, while the same file in 16-bit forces container usage? There is no metadata, so that can't be the reason
CrushedAsian255
2024-09-13 06:19:56
Can you send the files?
_wb_
Oleksii Matiash Just curious, why lossless compression of 8-bit file produces containerless file, and the same file in 16 bit forces container usage? There is no metadata, so it is not the reason
2024-09-13 06:23:25
The reason is that containerless is implicitly Main Profile, Level 5. For any other profile/level, you have to use a container and indicate what it is with a `jxll` box.
2024-09-13 06:24:37
For 16-bit lossless, after RCTs etc (even before, actually), things don't fit in int16 buffers, which is a requirement for level 5.
2024-09-13 06:25:44
You can only go up to 14-bit or so while staying within Level 5. For lossless that is — for lossy, bitdepth is just a suggestion and doesn't matter.
Oleksii Matiash
2024-09-13 06:26:39
Thank you
_wb_
2024-09-13 06:29:54
You _can_ strip the container in this case and it will work in all current decoders. But technically it's invalid, because the codestream is doing things it is not allowed to do within Main Profile, Level 5. In theory, decoders could refuse to decode such invalid bitstreams. Libjxl does not want to create invalid bitstreams so it automatically does the right thing. Files produced by jxl_from_tree are always containerless and can be technically invalid files though.
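For reference, a minimal sketch of telling the two flavors apart by signature (the bare codestream starts with the bytes `FF 0A`, the container with a 12-byte `JXL ` signature box; the function name is illustrative):

```python
CODESTREAM_SIG = b"\xff\x0a"
CONTAINER_SIG = bytes.fromhex("0000000C4A584C200D0A870A")

def jxl_kind(path):
    with open(path, "rb") as f:
        head = f.read(12)
    if head.startswith(CONTAINER_SIG):
        return "container (profile/level can be signalled in a jxll box)"
    if head.startswith(CODESTREAM_SIG):
        return "bare codestream (implicitly Main profile, Level 5)"
    return "not a JPEG XL file"
```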
Oleksii Matiash
2024-09-13 06:31:27
Thank you again 🙂
CrushedAsian255
_wb_ You _can_ strip the container in this case and it will work in all current decoders. But technically it's invalid, because the codestream is doing things it is not allowed to do within Main Profile, Level 5. In theory, decoders could refuse to decode such invalid bitstreams. Libjxl does not want to create invalid bitstreams so it automatically does the right thing. Files produced by jxl_from_tree are always containerless and can be technically invalid files though.
2024-09-13 08:24:33
I'm guessing Level 5 is more web-targeted?
_wb_
2024-09-13 08:26:04
Exactly. From the spec:
> To promote interoperability, a single profile, named “Main” profile, is defined. This profile is intended for use (among others) in mobile phones, web browsers, and image editors. It includes all of the coding tools in this document.
>
> The Main profile has two levels. Level 5 is suitable for end-user image delivery, including web browsers and mobile apps. Level 10 corresponds to a broad range of use cases such as image authoring workflows, print, scientific applications, satellite imagery, etc.
>
> Levels are defined in such a way that if a decoder supports level N, it also supports lower levels.
>
> Unless signalled otherwise, a JPEG XL codestream is assumed to be conforming to the Main profile, level 5.
CrushedAsian255
2024-09-13 08:26:48
so using Level 10 images may cause incompatibilities with that Rust JXL decoder that Firefox might add
lonjil
2024-09-13 08:29:01
seems unlikely
CrushedAsian255
2024-09-13 08:29:56
they probably don't need any features in level 10 and it would lower the chance of malicious image files crashing / hanging the system / process
_wb_
2024-09-13 08:30:43
I don't think so, probably it will just decode anything. But e.g. CMYK is not in level 5 and browsers generally don't want to bother with implementing proper color management for it, so that would be one thing in Level 10 that probably will not be properly supported on the web. Also for some of the limits regarding spline sizes and number of channels etc, a browser decoder might want to refuse decoding things outside Level 5.
CrushedAsian255
_wb_ I don't think so, probably it will just decode anything. But e.g. CMYK is not in level 5 and browsers generally don't want to bother with implementing proper color management for it, so that would be one thing in Level 10 that probably will not be properly supported on the web. Also for some of the limits regarding spline sizes and number of channels etc, a browser decoder might want to refuse decoding things outside Level 5.
2024-09-13 08:31:45
so they will end up supporting level 7.5 or something?
2024-09-13 08:33:44
(made up name)
_wb_
2024-09-13 08:37:49
We haven't defined intermediate levels but yeah, in practice a browser implementation would only guarantee to conform to Level 5 (and even that only in a "best effort" way in the sense that if you want to see a huge image, you'll need enough memory for it), but it will also decode many Level 10 files. If you want to be sure it will not be rejected, you should stick to Level 5, but most of the limits of Level 5 will likely not be hard-enforced.
jonnyawsom3
2024-09-13 08:38:18
Level 5 supported, higher not guaranteed
CrushedAsian255
2024-09-13 08:39:05
so it's basically "We can do Level 5 (as long as you're not being stupid with resolutions) and can probably load some/most level 10 images, but don't count on it"?
Oleksii Matiash
2024-09-13 08:41:53
Do I understand correctly that 16 bpp lossless files are not guaranteed to be decoded by browsers? I'm just curious; 16 bpp lossless is obviously not the most used format on the internet
Tirr
2024-09-13 08:43:19
browsers would fully implement Level 5, while Level 10 is not strictly required, but they may implement some of it. The implementation might not fully conform to Level 10 though
CrushedAsian255
2024-09-13 08:43:48
is there a way to tell `djxl` to reject certain images?
_wb_
CrushedAsian255 so it's basically "We can do Level 5 (as long as you're not being stupid with resolutions) and can probably load some/most level 10 images, but don't count on it"?
2024-09-13 08:43:54
yes, that's the idea. In particular, if you're going to have very big splines that require a lot of effort to render, probably it's going to be rejected right away if it's outside Level 5, and if you use CMYK, you'll probably not get correct color management. Also animations faster than 120 fps will be slowed down to 120 fps, and maybe some other things.
CrushedAsian255
2024-09-13 08:44:38
the idea is that things in level 10 are probably not going to be particularly useful on the web, right? who needs 120+fps GIFs
Tirr
2024-09-13 08:44:44
especially since Level 10 requires lossless f32 to be bit-exact
_wb_
CrushedAsian255 is there a way to tell `djxl` to reject certain images?
2024-09-13 08:45:27
currently I don't think so, probably it would be useful to have some options to let it check the Levels limits and to put limits on image dimensions / nb of channels / nb of frames/layers
CrushedAsian255
2024-09-13 08:45:41
also with `modular_16_bit_buffers` can I still use 32 bit buffers?
_wb_
Oleksii Matiash Do I understand correctly that 16 bpp lossless files are not guaranteed to be decoded by browser? I'm just curious, 16 bpp ll obviously not the most used format over the internet
2024-09-13 08:47:15
That's right, but this is probably one of the things where browsers will allow a bit more than what Level 5 says, e.g. still allow up to 16-bit (but maybe not all the way to 32-bit).
Oleksii Matiash
2024-09-13 08:47:32
Thank you
_wb_
CrushedAsian255 also with `modular_16_bit_buffers` can I still use 32 bit buffers?
2024-09-13 08:54:09
If this header field is true (like it has to be in Level 5), then you can in principle use int16 buffers in the implementation of all modular decoding, while otherwise you have to use twice the memory and use int32 buffers. Currently libjxl ignores the field and always uses int32 buffers anyway (we haven't bothered with writing specialized/optimized code paths yet). For lossy, even at pretty high quality, int16 buffers do suffice since the modular data only contains quantized coefficients — after dequantization you'll need float32 to get enough precision. All of this is unrelated to what kind of buffers the application uses at the libjxl API level to pass images to the encoder or to get images from the decoder.
2024-09-13 09:01:44
In practice, there is not a lot of lossless data that actually has a full 16 bits of precision. A lot of things end up just being scaled to 16-bit because PNG, TIFF, PPM etc are byte-based so it's either 8-bit or 16-bit. But the actual precision of cameras is more like 14-bit, and for Rec2100 PQ/HLG, 12-bit suffices (and most displays only support 10-bit precision). This is why we considered this an acceptable limitation for Level 5.
CrushedAsian255
2024-09-13 09:03:09
Can Level 5 do 12 bpp?
_wb_
2024-09-13 09:05:34
Up to 15-bit is possible (one bit is lost because int16 is signed and image data is usually unsigned) but not very effective since then you can't do RCTs. Up to 14-bit is possible with effective compression.
2024-09-13 09:06:57
You are also allowed to make a lossy file that is marked as being 16-bit (though not higher than that, in Level 5).
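A sketch of why 16-bit lossless overflows int16 once an RCT is applied (this uses a YCoCg-R-style reversible transform purely for illustration; JPEG XL defines its own family of RCTs):

```python
def ycocg_r(r, g, b):
    co = r - b             # residual needs n+1 signed bits for n-bit input
    tmp = b + (co >> 1)
    cg = g - tmp           # likewise n+1 signed bits
    y = tmp + (cg >> 1)
    return y, co, cg

# 16-bit input: max sample value is 65535
y, co, cg = ycocg_r(65535, 0, 0)
print(co)   # 65535 -> needs 17 signed bits, so it no longer fits in int16
```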
CrushedAsian255
2024-09-13 09:07:03
Is JXL’s max int32/uint31?
Tirr
2024-09-13 09:08:36
int24 and float32
CrushedAsian255
2024-09-13 09:09:48
How does float32 fit in an int32?
_wb_
2024-09-13 09:09:58
just bit casting
CrushedAsian255
2024-09-13 09:10:11
Then wouldn’t negative numbers go all wonky?
2024-09-13 09:10:20
Or are negative floats inverted?
_wb_
2024-09-13 09:10:27
negative floats just become negative ints that way
CrushedAsian255
2024-09-13 09:10:43
But they go the other way compared to 2's complement?
_wb_
2024-09-13 09:11:10
no it's the same
2024-09-13 09:11:20
sign bit 0 means positive in both cases
CrushedAsian255
2024-09-13 09:12:02
No, as in 0xffffffff is the negative 2's complement number closest to 0
2024-09-13 09:12:22
But the closest negative float to 0 is 0x80000001
_wb_
2024-09-13 09:16:30
right, in that sense they go the other way
CrushedAsian255
2024-09-13 09:17:10
Sorry if my wording was confusing
_wb_
2024-09-13 09:18:25
typical float data is in the 0..1 range though, so the first two bits are zeroes and it makes sense when cast to int32 in the sense that order is preserved (if a < b when interpreted as float, also a < b when bitcast to int32)
2024-09-13 09:20:01
in any case, predictors get a bit wonky when they get float-bitcast-to-int data, but still make some amount of sense and help with compression
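A small sketch of the bitcast behaviour being discussed (pure Python, using `struct` to reinterpret bits):

```python
import struct

def bitcast_i32(f):
    # reinterpret the bits of a float32 as a signed int32
    return struct.unpack('<i', struct.pack('<f', f))[0]

# Non-negative floats keep their order when bitcast:
assert bitcast_i32(0.25) < bitcast_i32(0.5) < bitcast_i32(1.0)

# Negative floats become negative ints, but their order is reversed:
# the negative float closest to zero maps to 0x80000001 (near INT32_MIN),
# unlike two's complement, where the int closest to zero is 0xFFFFFFFF (-1).
assert bitcast_i32(-1.4e-45) < bitcast_i32(-3.4e38)
```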
CrushedAsian255
2024-09-13 09:20:39
Float bitcast to int is kinda a piecewise-linear exponential function?
_wb_
2024-09-13 09:20:53
yeah that's basically what it does
2024-09-13 09:22:06
which is kind of OKish if the data itself is linear light, so the bitcasting is basically similar to applying a log transfer curve
2024-09-13 09:23:54
but lossless compression of float32 is somewhat limited, e.g. you cannot do squeeze or RCTs (if the data can be arbitrary floats) since that would require more than int32 for the residuals.
2024-09-13 09:24:35
our assumption was that if you want to do lossless float32, you don't really care too much about compression 🙂
CrushedAsian255
2024-09-13 09:25:49
when i was saying "the other way" i meant like this, where NaNs are in red
2024-09-13 09:26:02
where if the left side was bit flipped it would look more like this
2024-09-13 09:26:16
2024-09-13 09:26:44
which matches more like how int works
2024-09-13 09:26:51
_wb_
2024-09-13 09:28:12
hmyeah probably we should have done that, I hadn't given it much thought since the float image data I could find didn't go negative
2024-09-13 09:28:20
too late for that now though
CrushedAsian255
2024-09-13 09:29:13
i guess float data is usually not going negative
2024-09-13 09:29:25
and when it is, it's probably an anomaly so doesn't matter about compression
2024-09-13 09:29:53
and also what you said earlier
2024-09-13 09:29:59
> our assumption was that if you want to do lossless float32, you don't really care too much about compression 🙂
_wb_ too late for that now though
2024-09-13 09:32:18
extensions? idk lol
2024-09-13 09:32:27
don't quite understand what they do
username
2024-09-13 09:35:33
what is the resolution limit set as in level 5?
CrushedAsian255
2024-09-13 09:36:16
Max width/height : 256k
2024-09-13 09:36:23
Can’t be more than 256 MPx
2024-09-13 09:36:40
So maximum of 256k by 1024
2024-09-13 09:36:44
If I’m reading the spec right
2024-09-13 09:36:49
Or other way round
username
2024-09-13 09:37:46
just to be clear you mean something like 256,XXX x 256,XXX?
CrushedAsian255
2024-09-13 09:39:24
Maximum is
username
2024-09-13 09:40:14
I have a headache at the moment and just want it in basic terms lol
2024-09-13 09:41:15
I might just go calculate it when my headache goes away
CrushedAsian255
2024-09-13 09:41:28
Maximum width or height by itself is 262144
2024-09-13 09:41:47
Multiplied together the total amount of pixels can’t be more than 256 million
_wb_
2024-09-13 09:42:26
So basically the limit is 256 megapixels, with an aspect ratio that is not too extreme
2024-09-13 09:43:38
(between 256:1 and 1:256)
2024-09-13 09:44:07
That should be enough for the web
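A sketch of those limits as discussed here (assuming max dimension 2^18 and a 2^28 pixel budget, the values quoted in this thread; check the spec for the normative numbers):

```python
MAX_DIM = 1 << 18       # 262144
MAX_PIXELS = 1 << 28    # ~268 million pixels

def fits_level5(width, height):
    return (width <= MAX_DIM and height <= MAX_DIM
            and width * height <= MAX_PIXELS)

print(fits_level5(262144, 1024))   # True: 2^18 * 2^10 = 2^28
print(fits_level5(262144, 2048))   # False: exceeds the pixel budget
```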
username
2024-09-13 09:47:23
around 4 times the max dimension of JPEG 1, so yeah, it should be plenty and also allows some breathing room for the future
_wb_
2024-09-13 09:48:21
On an 8k screen, that is 16 screenfuls. If you want bigger than that, it's probably a good idea to split it up into several images.
CrushedAsian255
2024-09-13 09:48:58
Or maybe don’t do that on the web
_wb_
2024-09-13 09:50:04
We can always define a Level 6 if it would become a real limitation at some point, but I don't expect that to happen anytime soon or even ever.
CrushedAsian255
2024-09-13 09:50:41
So that’s why there’s a gap?
2024-09-13 09:50:45
Like BASIC lines?
2024-09-13 09:51:08
There’s no plan for a level 4 for low-spec use?
username
CrushedAsian255 They’re no plan of level 4 for low specification ?
2024-09-13 09:57:22
Should a need arise, there might be a level 4 or whatever, but there's really no reason to define one at the moment. Keep in mind that there's a real danger in artificially slapping on limitations. For example, we could have had native 12-bit JPEGs, but no one ever implemented support because it was seen as unneeded at the time, and now because of that we are stuck with 8-bit JPEGs, since using 12-bit ones would immediately break compatibility with 90% of software and hardware.
CrushedAsian255
2024-09-13 09:59:03
Same with arithmetic coding?
username
username Should a need arise then there might be a level 4 or whatever however there's really no reason to define one at the moment and keep in mind that there's a real danger in artificially slapping on limitations. For example we could have had native 12-bit JPEGs but no one ever implemented support because it was seen as un-needed at the time and now because of that we are stuck with 8-bit JPEGs because using 12-bit ones would immediately break compatibility with 90% of software and hardware.
2024-09-13 09:59:22
of course, spec-defined levels/profiles are a bit different than what happened with JPEG 1, but eh, better safe than sorry
CrushedAsian255
2024-09-13 09:59:25
Actually, that was because of patterns wasn’t it?
2024-09-13 09:59:29
Patents*
username
2024-09-13 09:59:37
yeah
CrushedAsian255
2024-09-13 10:00:19
Why don’t people use JNG for transparent photographic images?
username
CrushedAsian255 Why don’t people use JNG for transparent photographic ?
2024-09-13 10:01:04
what is that again? is it that format that uses JPEG for the base image and PNG for alpha?
2024-09-13 10:02:36
because if so then well iOS apps (don't know about modern ones) and Adobe Flash both used it so that's something
CrushedAsian255
2024-09-13 10:02:42
http://www.libpng.org/pub/mng/spec/jng.html
username because if so then well iOS apps (don't know about modern ones) and Adobe Flash both used it so that's something
2024-09-13 10:02:55
They did?
2024-09-13 10:02:57
Never knew
username
2024-09-13 10:06:00
don't know if they *specifically/exactly* used JNG or just independently invented/re-invented the same idea/thing but yeah they used combined JPEG and PNG data for storing assets
CrushedAsian255
2024-09-13 10:08:37
Interesting
2024-09-13 10:08:39
Actually, a JNG may contain two separate JNG JPEG datastreams (one eight-bit and one twelve-bit), each contained in a series of JDAT chunks, and separated by a JSEP chunk (see the JSEP chunk specification below, Paragraph 1.1.5). Decoders that are unable to (or do not wish to) handle twelve-bit datastreams are allowed to display the eight-bit datastream instead, if one is present.
username
2024-09-13 10:08:52
whenever I go to extract SWF files the assets come out like that and also I looked at an old iOS app and it was storing the main color data as JPEG and the alpha as separate PNG files
2024-09-13 10:09:09
so maybe not exactly JNG but just the same basic idea
CrushedAsian255
username whenever I go to extract SWF files the assets come out like that and also I looked at an old iOS app and it was storing the main color data as JPEG and the alpha as separate PNG files
2024-09-13 10:09:17
Both probably “reinvented” it as it’s a relatively straightforward and obvious idea
username
2024-09-13 10:09:33
yeah
_wb_
CrushedAsian255 Like BASIC lines?
2024-09-13 10:27:37
Yes, exactly like BASIC line numbers. We want to keep the invariant that supporting a higher level implies supporting any lower level, so leaving room for potential new levels, should the need arise, seemed wise.
2024-09-13 10:36:45
We _could_ at some point define a lower level with stricter limits, in case we want to define use cases where conformance is guaranteed in a hard way, i.e. for systems where you actually want to guarantee to be able to decode the image in a given time budget and with a given amount of available memory. For now, that need has not yet arisen.

We will probably at some point define a lower **profile**, which is not just putting limits on the sizes of things, but also putting limits on which coding tools are allowed / have to be implemented. For the camera use case, you don't need/want extra channels, patches, splines, big blocks, etc. When defining a subset of coding tools, it's not just a levels thing but a different profile. Probably we'll make the level numbering global though, so e.g. this lower profile might have levels 1 and 3, for example.

There might at some point also be extensions that will require defining a _higher_ profile, but for now, that need has not yet arisen. It is something we prefer to avoid. In general we really want to avoid having different profiles. For interoperability it is best if all decoders (at least the software ones) can just decode anything. This is why for now, there is only one profile ("Main") and conformance requires implementing everything.
CrushedAsian255
2024-09-13 11:44:33
So you might have “capture” profile that doesn’t have the fancy tools?
Oleksii Matiash
CrushedAsian255 So you might have “capture” profile that doesn’t have the fancy tools?
2024-09-13 11:47:55
We already have it: 8-bit JPEG. I mean, it is a very dangerous approach to allow limiting decoders to a level that is just enough for cameras
CrushedAsian255
2024-09-13 11:48:19
theoretically a camera could just take an 8 bit jpeg and then lossless jxl it
_wb_
2024-09-13 11:51:03
It would be a profile intended basically only for on-device viewing of the pictures you just took.
CrushedAsian255
2024-09-13 11:53:52
would it basically be lossy `-e 5`?
_wb_
2024-09-13 11:56:25
more like the output of libjxl-tiny
CrushedAsian255
_wb_ more like the output of libjxl-tiny
2024-09-13 12:24:23
What coding features does it use?
_wb_
2024-09-13 12:51:55
basically vardct with smaller block sizes only (up to 16x16 iirc)
2024-09-13 12:54:31
for d < 1 and on photographic images, it's pretty much almost as good as default libjxl; for images that benefit from patches (like screenshots with text) or at low quality, it's more like somewhere between jpegli and libjxl. But the main use case would be cameras, so low quality and non-photographic images don't really matter
CrushedAsian255
2024-09-13 12:57:52
I’m guessing no fancy MA trees?
_wb_
2024-09-13 01:16:54
yeah, we haven't yet decided on what limits to impose on that but likely the shape of the MA trees will be heavily constrained. It's only used for a small amount of data anyway (the DC and some control fields). Same with the context model size for the AC. Allowing arbitrary MA trees for a hardware decoder would be pretty inconvenient 🙂
jonnyawsom3
2024-09-13 01:20:06
A fixed MA tree like effort 3?
_wb_
2024-09-13 01:26:50
probably something like that — anything that can be done without branches or large lookup tables.
KKT
2024-09-13 06:37:12
OK, I've got a bit of a weird one. My kid's school sent out a link for all the photos from last year in Google Photos (many thousands of them). Downloaded them all and ran them through with: `parallel cjxl --num_threads=0 -j 0 -d 1 -e 9 {} {.}.jxl ::: ./*.jpg` Obviously quality is already degraded, so they don't have to be awesome. Most were taken with an iPhone. I'm getting a shift in the HDR – JPEG's highlights are noticeably brighter. Attached is a good example. They're close to the same in Preview, but not exactly. Quicklook doesn't do HDR for JXL files at all. Preview shows the JPEG as 10 bits. ExifTool shows `Profile Connection Space: XYZ`. So these are Jpegli compressed?
embed
2024-09-13 06:38:07
https://embed.moe/https://cdn.discordapp.com/attachments/794206170445119489/1284221553160097862/IMG_0015.jxl?ex=66e5d805&is=66e48685&hm=619195ab9dde0cefa8be2d18d38edf488ea1c3ac8b37879ee3374d51ce740655&
KKT
2024-09-13 06:39:16
Ugh. Wrong image for the JXL
2024-09-13 06:39:30
This one.
spider-mario
KKT OK, I've got a bit of a weird one. My kid's school sent out a link for all the photos from last year in Google Photos (many thousands of them). Downloaded them all and ran them through with: `parallel cjxl --num_threads=0 -j 0 -d 1 -e 9 {} {.}.jxl ::: ./*.jpg` Obviously quality is already degraded, so they don't have to be awesome. Most were taken with an iPhone. I'm getting a shift in the HDR – JPEG's highlights are noticeably brighter. Attached is a good example. They're close to the same in Preview, but not exactly. Quicklook doesn't do HDR for JXL files at all. Preview shows the JPEG as 10 bits. ExifTool shows `Profile Connection Space: XYZ`. So these are Jpegli compressed?
2024-09-13 06:42:47
note that XYB ≠ XYZ
2024-09-13 06:42:55
XYZ is the 1931 colorspace from the CIE
KKT
2024-09-13 06:42:57
Ohh, misread that
jonnyawsom3
2024-09-13 07:04:21
I keep making that mistake and getting excited at random ICC tags
RaveSteel
2024-09-13 07:09:44
Same
KKT
2024-09-13 07:50:51
The 10-bit JPEG got me pointed in the wrong direction too.
Demiurge
_wb_ We _could_ at some point define a lower level with stricter limits, in case we want to define use cases where conformance is guaranteed in a hard way, i.e. for systems where you actually want to guarantee to be able to decode the image in a given time budget and with a given amount of available memory. For now, that need has not yet arisen. We will probably at some point define a lower **profile**, which is not just putting limits on the sizes of things, but also putting limits on which coding tools are allowed / have to be implemented. For the camera use case, you don't need/want extra channels, patches, splines, big blocks, etc. When defining a subset of coding tools, it's not just a levels thing but it's a different profile. Probably we'll make the level numbering global though, so e.g. this lower profile might have levels 1 and 3, for example. There might at some point also be extensions that will require defining a _higher_ profile, but for now, that need has not yet arisen. It is something we prefer to avoid. In general we really want to avoid having different profiles. For interoperability it is best if all decoders (at least the software ones) can just decode anything. This is why for now, there is only one profile ("Main") and conformance requires implementing everything.
2024-09-14 03:17:29
Sounds kinda like PIK mode
2024-09-14 03:21:10
I'm interested in what Jyrki said about "frequency lifting" and I wonder what that was
BabylonAS
2024-09-14 09:26:41
Can JXL use indexed colors? I often deal with images that have less than 256 colors
spider-mario
2024-09-14 10:07:11
yes, in lossless mode
_wb_
2024-09-14 10:20:47
JXL can do any palette size up to 70k colors.
2024-09-14 10:21:35
Plus delta palette and default palette colors.
CrushedAsian255
_wb_ JXL can do any palette size up to 70k colors.
2024-09-15 12:28:40
70k?
2024-09-15 01:14:52
is there jpeg xl merch?
monad
2024-09-15 01:46:29
New fundraiser for Luca's Rust decoder.
CrushedAsian255
monad New fundraiser for Luca's Rust decoder.
2024-09-15 01:48:04
is this rust decoder going to be open source?
monad
2024-09-15 01:49:49
I would assume so, given the primary impetus is Firefox integration.
CrushedAsian255
2024-09-15 06:06:21
is 16 bit enough precision for lossless yuv422p10->rgb?
_wb_
2024-09-15 06:31:55
I would assume it is, but doesn't hurt to check. At least for 10-bit yuv444 you can just check all values and see if they roundtrip. Maybe the chroma subsampling makes it a bit harder to roundtrip though.
CrushedAsian255
2024-09-15 06:33:04
It’s not really a big deal if one pixel’s off by one value, I was just wondering
_wb_
2024-09-15 07:06:04
It's mostly rgb -> yuv that is losing information (because you're mapping a color cube to a volume that only uses about 1/4th of the coordinate space), the other way around is less problematic.
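A Monte Carlo sketch of that "about 1/4th" figure (a BT.709-style full-range matrix is assumed here; the exact fraction depends on the matrix):

```python
import random

def yuv_to_rgb(y, u, v):
    r = y + 1.5748 * v
    b = y + 1.8556 * u
    g = (y - 0.2126 * r - 0.0722 * b) / 0.7152
    return r, g, b

# Sample the YUV box uniformly; count points that land inside the RGB cube.
n, inside = 100_000, 0
for _ in range(n):
    y, u, v = random.random(), random.random() - 0.5, random.random() - 0.5
    if all(0.0 <= c <= 1.0 for c in yuv_to_rgb(y, u, v)):
        inside += 1
print(inside / n)   # roughly 0.25
```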
CrushedAsian255
2024-09-15 11:46:13
???
2024-09-15 11:46:18
scam?
BabylonAS
2024-09-15 11:46:44
There was a spam attack
CrushedAsian255
2024-09-15 11:48:07
oop
2024-09-15 11:48:09
banned?
jonnyawsom3
2024-09-15 12:04:44
Already done
CrushedAsian255
2024-09-15 12:05:01
👍
AccessViolation_
2024-09-15 02:17:13
So, I've been wondering... with floating point color precision and support for many layers as gain maps or whatever else you want, would it be possible to create an image that covers...basically the full true luminance of everything in the shot? One scenario I imagined was a hacked-together camera with a couple of solar filters that can be mechanically shifted before the lens. You could then take many exposures, one normal image that shows like a landscape, one with boosted EV to capture darker areas, and then a couple of shots that add the solar filters. Then you would use the knowledge of the solar filters and exposure settings and whatnot to create a JXL image that contains the sun in the shot at its *actual* luminance compared to all the other stuff. This could also potentially be cool for astrophotography. I recognize that this isn't compatible with HDR standards of any kind and you would probably need specialized software to view select luminance ranges unless you also want to create a display capable of giving you sunburn, but anyway. How could a file like this theoretically look in JPEG XL?
CrushedAsian255
2024-09-15 02:18:16
Technically you could, it supports arbitrary float data
yoochan
2024-09-15 02:18:39
There is a guy here who used jxl to store altitude data iirc
w
2024-09-15 02:18:56
that's what cameras already do
CrushedAsian255
2024-09-15 02:19:04
That’s what the extra channels are for..?
w
2024-09-15 02:19:15
capturing x stops of dynamic range and fitting into a range of values
AccessViolation_
2024-09-15 02:21:19
Do you think just a single 32-bit precision layer would suffice for this, or would you also need one (or more) gain maps to capture the real luminance of something like the sun in the same image as something dimly lit?
w
2024-09-15 02:21:34
existing hdr formats already do that
CrushedAsian255
2024-09-15 02:21:44
32 bit float goes up to like 10^308 or something
w
2024-09-15 02:22:32
even non "hdr" formats can do that
CrushedAsian255
CrushedAsian255 32 bit float goes up to like 10^308 or something
2024-09-15 02:23:54
Wait no, that’s for doubles I think
w
2024-09-15 02:24:32
oh i guess the issue with PQ is that it maxes out at 10000 nits
CrushedAsian255
2024-09-15 02:24:44
I guess you can use linear?
AccessViolation_
2024-09-15 02:25:23
I wasn't sure if that extra precision was all mapped into a more...realistic range of values. If we can theoretically properly expose the sun and a dark room in the same image while preserving the actual real-life luminance difference between them in a 32-bit float image that would suffice. I just didn't expect that to be the case, because for basically every photo ever taken that means you're wasting a lot of the 32-bit range on values you'll never get
w
2024-09-15 02:26:08
that's always arbitrary and subjective
CrushedAsian255
2024-09-15 02:26:16
Due to how floats work you’re wasting 3/4 of the space
2024-09-15 02:26:29
As 0..1 gets the same allocation as 1..inf
2024-09-15 02:26:42
And then negatives are the other half
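That allocation claim is easy to check by counting float32 bit patterns:

```python
import struct

def bits_u32(f):
    # reinterpret the bits of a float32 as an unsigned int32
    return struct.unpack('<I', struct.pack('<f', f))[0]

below_one = bits_u32(1.0) - bits_u32(0.0)            # patterns in [0, 1)
one_to_inf = bits_u32(float('inf')) - bits_u32(1.0)  # patterns in [1, inf)
print(below_one, one_to_inf)   # 1065353216 vs 1073741824 -- nearly equal
```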
AccessViolation_
2024-09-15 02:26:58
Hmm
w
2024-09-15 02:27:30
https://en.wikipedia.org/wiki/Transfer_functions_in_imaging#List_of_transfer_functions
AccessViolation_
2024-09-15 02:31:19
I suppose one thing you could do to make sure you can represent the landscape with a good range and the details of the surface of the sun in a good range, is to try to get the luminance of everything in the base image layer close to 0 so you can represent the tones properly, and then have different gain maps for different orders of magnitude of brightness. Otherwise you might run into issues where you can represent the landscape in nice detail but the surface of the sun would suffer from lack of precision because it's very far away from 0 in the float range
2024-09-15 02:33:02
This is very cursed and definitely something I will explore further for astrophotography
w
2024-09-15 02:33:09
it's something camera raw formats try to solve
AccessViolation_
2024-09-15 02:37:31
Surely a RAW image couldn't store this large a range if you had a sensor or wacky device (like mentioned above) capable of capturing this level of contrast?
_wb_
2024-09-15 02:37:42
PQ maxes at 10000 nits but you can in principle make a JXL file that just uses a linear transfer function and stores the values as lossless floats. Then you can set the intensity_target field to what you want the intensity of 1.0 to be, and just use values higher than 1.0 for anything brighter.
CrushedAsian255
2024-09-15 02:38:18
Is intensity target the nominal peak ?
AccessViolation_ Surely a RAW image couldn't store this large a range if you had a sensor or wacky device (like mentioned above) capable of capturing this level of contrast?
2024-09-15 02:38:48
RAW formats can store whatever the heck you want, that’s the point of RAW
w
2024-09-15 02:38:54
the range is arbitrary
_wb_
2024-09-15 02:39:37
intensity target is the brightness in nits of the color (1.0, 1.0, 1.0)
w
2024-09-15 02:39:55
whiter than white
_wb_
2024-09-15 02:39:59
it is 255 nits by default, but you can signal it to be anything up to 65k nits
2024-09-15 02:40:18
but then there is nothing stopping you from putting values above 1.0 in a jxl file
CrushedAsian255
2024-09-15 02:40:23
So if I set intensity target to 1, then the floats store raw nit values?
_wb_
2024-09-15 02:41:04
yep
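A trivial sketch of that mapping (linear transfer function assumed, per the discussion above):

```python
def to_nits(sample, intensity_target=255.0):
    # a stored linear value of 1.0 corresponds to intensity_target nits
    return sample * intensity_target

print(to_nits(1.0))           # 255.0 -- the default
print(to_nits(400.0, 1.0))    # with intensity_target=1, samples are raw nits
```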
CrushedAsian255
2024-09-15 02:41:07
RGB values of (12345, NaN, -Infinity)
2024-09-15 02:41:13
Why not lmao
_wb_
2024-09-15 02:41:24
largest float32 is 3.4028234664 × 10^38
CrushedAsian255
2024-09-15 02:41:56
Yea I was seeing double
Oleksii Matiash
CrushedAsian255 RAW formats can store whatever the heck you want, that’s the point of RAW
2024-09-15 02:42:02
Only if you create a raw format that uses 32 bpp; afaik no such format exists
CrushedAsian255
Oleksii Matiash Only if you create raw format that uses 32 bpp, afaik there is no such existing
2024-09-15 02:42:25
You can make your own raw format
2024-09-15 02:42:44
Nothing will support it, but you can
jonnyawsom3
2024-09-15 02:45:31
Just swap out the JXL in a DNG for one encoded at 32f, what could go wrong :P
_wb_
2024-09-15 02:47:03
jxl can represent basically anything if you want — float32 precision where the brightness of 1.0 is adjustable means there is effectively no limit to the precision if you use it directly as a raw format. Note that in DNG they limit the precision to 16-bit (either uint16 or float16) since that is plenty of precision with current camera technology.
AccessViolation_
2024-09-15 02:48:37
Woah, for astrophotography you could also store other frequencies beyond the visible spectrum as modular mode channels, right?
Quackdoc
2024-09-15 02:48:37
this is really common, maybe not to the extent you are talking about, but I actually talked about making a demo related to this in the website channel. But yeah, you would just use a linear transfer, and then in your application use exposure adjustment to change the stops you see. I'm using EXR in this example because olive builds against a version of ffmpeg that has float broken, but the same concept applies
AccessViolation_
2024-09-15 02:48:46
(sorry to derail)
Quackdoc
2024-09-15 02:49:03
this is a video comparing exr vs png, but s/exr/jxl in supported apps
jonnyawsom3
AccessViolation_ Woah for astrophotography you could also store other radio frequencies beyond the visible spectrum as modular mode channels, right?
2024-09-15 02:51:01
There's actually been a lot of discussion here before for satellite imagery, weather radar and other multispectral things that could be useful for this too. https://discord.com/channels/794206087879852103/794206087879852106/1244203123648626739 https://github.com/openclimatefix/Satip/issues/67 https://github.com/libjxl/libjxl/issues/1245 Although, if you're using float32 I feel obliged to bring this up for the 500th time (Even though I only encountered it once when trying to compress a .HDR file) https://github.com/libjxl/libjxl/issues/3511
2024-09-15 02:51:29
So filesizes might not be representative of what JXL can achieve
Quackdoc
2024-09-15 02:52:43
jxl needs fp64, imagine everything you could store in it [pepehands](https://cdn.discordapp.com/emojis/1075509930502664302.webp?size=48&quality=lossless&name=pepehands)
CrushedAsian255
2024-09-15 02:53:07
Just have 2 f32 channels
2024-09-15 02:53:19
And concatenate them
Quackdoc
2024-09-15 02:53:48
we aren't gpus here, we have standards
2024-09-15 02:53:49
[av1_dogelol](https://cdn.discordapp.com/emojis/867794291652558888.webp?size=48&quality=lossless&name=av1_dogelol)
CrushedAsian255
2024-09-15 02:54:17
They’re paywalled 😦
Quackdoc
2024-09-15 02:54:23
[av1_omegalul](https://cdn.discordapp.com/emojis/885026577618980904.webp?size=48&quality=lossless&name=av1_omegalul)
2024-09-15 02:55:27
but yeah, storing wide range data in jxl is quite nice since it's good for not only photography, but also renders and the like
lonjil
2024-09-15 02:56:01
As I recall (though I may be misremembering), there's nothing inherent that stops JXL from supporting things bigger than fp32; it's just the limit they decided to set. The actual encoding (again, as I recall; I may be wrong) can handle bigger values just fine.
Quackdoc
2024-09-15 02:56:28
we need a JXL+
CrushedAsian255
2024-09-15 02:56:35
Maybe eventually they could make a JXL64 extension?
2024-09-15 02:56:41
Like a new profile?
2024-09-15 02:57:00
The problem is that then it ruins backwards compatibility and it becomes another new format
2024-09-15 02:57:15
Although most places that would use JXL wouldn’t need them
Quackdoc
2024-09-15 03:00:05
i dunno what one would actually use 64bit depth for lol
2024-09-15 03:00:08
scientific im sure
jonnyawsom3
lonjil As I recall, though I may be misremembering, there's nothing inherently that stops JXL from supporting things bigger than fp32, it's just what they decided to set as the limit. The actual encoding, again as I recall I may be wrong, can handle bigger values just fine.
2024-09-15 03:00:51
https://discord.com/channels/794206087879852103/804324493420920833/1209240294747406386
spider-mario
CrushedAsian255 Is intensity target the nominal peak ?
2024-09-15 03:21:42
it’s supposed to be the peak luminance found in the image, but it’s not strictly enforced
_wb_
2024-09-15 03:42:00
It kind of serves different purposes at the same time, doesn't it? It's a scaling factor for converting XYB to linear RGB too, right?
spider-mario
2024-09-15 03:44:15
yes (except when the target is PQ)
2024-09-15 03:45:11
I guess it does raise a bit of a dilemma for SDR images that don’t go all the way to SDR white
2024-09-15 03:45:38
although the spec does say that it just has to be _an_ upper bound, not necessarily the smallest possible upper bound
2024-09-15 03:46:01
so that scanning the image for the actual maximum isn’t mandatory
AccessViolation_
2024-09-15 06:20:43
I don't know where I read this, but I remember something about a feature where when a very high resolution image is to be displayed on a webpage, but it's small enough in the viewport that you don't need that much detail, the browser can stop loading the image at some point where loading more of it wouldn't improve the visual quality. Did I somehow make this up or is this a real feature? I haven't heard about it since
2024-09-15 06:21:30
Or maybe it was something they *could* do utilizing progressive decoding, without it being a real feature?
jonnyawsom3
2024-09-15 06:24:51
You probably saw this https://discord.com/channels/794206087879852103/794206170445119489/1283547974936432760 Which points to this discussion https://discord.com/channels/794206087879852103/794206170445119489/1279372576946393180 Just an idea based on the progressive loading for now, although there should actually be a few ways to do it
BabylonAS
2024-09-15 06:26:48
I knew it had to be spoken somewhere on this very server
AccessViolation_
2024-09-15 06:27:37
I only just joined and I remember thinking about this like... a year ago or more, so I might've come up with it and misremembered it as a feature, or read it on some blog post then
2024-09-15 06:28:16
Cool to hear that it's possible tho!
jonnyawsom3
2024-09-15 06:29:24
I wouldn't be surprised if people come up with the idea on their own, makes sense once you've thought of it, but the issue is figuring out who's meant to decide. Does the server cut off the connection, the client, should libjxl have an option to stop at x progressive scan? Lots of possibilities
AccessViolation_
2024-09-15 06:33:03
For sure. The nice thing tho is that none of this needs to be standardized. I think all you'd need is a stream per image (which HTTP/3 and the older TCP hack, I think HTTP/2.1? should support) and progressive decode until you get the desired quality per display pixel and terminate the stream. The server should then stop sending data. I don't think this would break spec
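As a sketch of that client-side idea (hypothetical URL and byte budget; `requests` used purely for illustration):

```python
import requests

# Fetch only a prefix of a progressive JXL, then drop the stream.
resp = requests.get("https://example.com/big-image.jxl", stream=True)
data = bytearray()
for chunk in resp.iter_content(chunk_size=16384):
    data.extend(chunk)
    if len(data) >= 100_000:   # e.g. enough for a 16x-downsampled pass
        break
resp.close()   # stop the transfer; decode `data` progressively
```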
2024-09-15 08:40:44
I'm planning to send this in the Pop!_OS dev channel, thought I'd ask here if there is anything I should bring up/change/rephrase: > It may be a bit early to bring this up, but making a DE from scratch might make for a good opportunity to introduce the JPEG XL image format into the desktop. It's an image format which outperforms all other image formats in terms of image quality and compression ratio (in both the lossless and lossy modes), and many other factors. > > It's FOSS, patent free, and currently supported in popular creative software like Darktable, Adobe and Affinity's products, etc, but is seeing slower adoption on the web and in generic software like image viewers and other platforms that deal with images. macOS already has full support. I think it would be nice if COSMIC got support as well. > > Google Research is currently working on a decoder implementation in Rust which could be added to libcosmic/iced [COSMIC's GUI toolkit] in the future, and it's already possible to make desktop software relying on the C++ libjxl reference decoder. Personally I think a JPEG XL export option in the screenshot tool would be a great start, and it would be nice to see it in the future COSMIC image viewer as well. > > What do you think?
2024-09-15 08:55:46
I sent it and I'll see what happens. JXL support in COSMIC would be awesome
jonnyawsom3
2024-09-15 09:56:56
There is also <#1065165415598272582> already, if you don't mind a slight speed hit
monad
AccessViolation_ I don't know where I read this, but I remember something about a feature where when a very high resolution image is to be displayed on a webpage, but it's small enough in the viewport that you don't need that much detail, the browser can stop loading the image at some point where loading more of it wouldn't improve the visual quality. Did I somehow make this up or is this a real feature? I haven't heard about it since
2024-09-16 12:49:09
There's the classic FLIF demonstration. <https://flif.info/example.html>
Quackdoc
AccessViolation_ I'm planning to send this in the Pop!_OS dev channel, thought I'd ask here if there is anything I should bring up/change/rephrase: > It may be a bit early to bring this up, but making a DE from scratch might make for a good opportunity to introduce the JPEG XL image format into the desktop. It's an image format which outperforms all other image formats in terms of image quality and compression ratio (in both the lossless and lossy modes), and many other factors. > > It's FOSS, patent free, and currently supported in popular creative software like Darktable, Adobe and Affinity's products, etc, but is seeing slower adoption on the web and in generic software like image viewers and other platforms that deal with images. macOS already has full support. I think it would be nice if COSMIC got support as well. > > Google Research is currently working on a decoder implementation in Rust which could be added to libcosmic/iced [COSMIC's GUI toolkit] in the future, and it's already possible to make desktop software relying on the C++ libjxl reference decoder. Personally I think a JPEG XL export option in the screenshot tool would be a great start, and it would be nice to see it in the future COSMIC image viewer as well. > > What do you think?
2024-09-16 06:03:45
no need
2024-09-16 06:03:52
cosmic is hard limited by image-rs
2024-09-16 06:04:18
I plan on making a PR to either add zune, or jxl-oxide directly when I find the time
2024-09-16 06:08:54
unfortunately image-rs has made some... odd decisions which means it will be a while until it's supported. I might be making a new image abstraction crate in the future if my hands can hold up, but it's seeming unlikely
2024-09-16 06:10:15
as for libcosmic, it doesn't make sense to add it to that since it's rust and rust crates already exist
AccessViolation_
2024-09-16 07:49:49
Yeh I meant adding it to the relevant part, like image-rs, not necessarily directly adding it to libcosmic/iced itself
Quackdoc
2024-09-16 08:03:14
image-rs has declined it
CrushedAsian255
2024-09-16 08:03:29
Did they state a reason?
Quackdoc
2024-09-16 08:04:30
because the spec is paywalled; I don't understand why that would be a blocker myself, but it is what it is.
2024-09-16 08:05:52
in the end, image-rs has become non-viable for all sorts of programs, which is a shame and in general a massive setback to rust
CrushedAsian255
2024-09-16 08:06:14
Can’t you just link JXL oxide?
Quackdoc
2024-09-16 08:06:54
if you want to maintain your own fork
CrushedAsian255
2024-09-16 08:07:14
Fork that
Quackdoc
2024-09-16 08:07:49
in the end, unless you fork it, and convince people to use your fork, it won't actually change the ecosystem
2024-09-16 08:08:27
it would be nice if zune-image would support more formats, I did ask about it, but didn't get a hard answer
_wb_
2024-09-16 09:27:09
The AVIF spec references MIAF (ISO/IEC 23000-22) 14 times and HEIF (ISO/IEC 23008-12) 16 times, in normative ways, so you really need to get these other specs if you want to be able to implement AVIF from spec. It also references ISOBMFF (ISO/IEC 14496-12), directly and indirectly via MIAF and HEIF. MIAF costs CHF 151, HEIF is CHF 216, ISOBMFF is CHF 216. For JPEG XL you need ISO/IEC 18181-1 (CHF 216) and ISO/IEC 18181-2 (CHF 96).
Total cost for AVIF: CHF 583
Total cost for JPEG XL: CHF 312
BabylonAS
2024-09-16 09:31:38
CHF?
CrushedAsian255
2024-09-16 09:31:47
Why do you need -2 as well if you are decoding just the code stream
_wb_
2024-09-16 09:31:48
Swiss Francs
CrushedAsian255
BabylonAS CHF?
2024-09-16 09:31:55
Swiss money
BabylonAS
2024-09-16 09:32:03
huh
_wb_
2024-09-16 09:32:27
You don't really need -2 as long as you can figure out where the codestream is
CrushedAsian255
2024-09-16 09:32:33
-2 is just the container right?
_wb_
2024-09-16 09:33:06
Yep, and the jpeg bitstream reconstruction procedure
CrushedAsian255
2024-09-16 09:33:13
Also isn’t JPEG XL container ISOBMFF
2024-09-16 09:33:35
Or is all the required normative stuff redefined
_wb_
2024-09-16 09:33:43
It uses the same syntax but it is described in a self-contained way in -2
2024-09-16 09:34:03
We only have normative references to stuff that is not behind a paywall
2024-09-16 09:34:19
Well, except for JUMBF
CrushedAsian255
2024-09-16 09:34:19
Like the brotli spec?
2024-09-16 09:34:36
What’s JUMBF again?
_wb_
2024-09-16 09:34:49
Yes, brotli, ICC, and some ITU specs.
2024-09-16 09:35:19
The extensible metadata thing that you can use for 360 images and C2PA and stuff
CrushedAsian255
2024-09-16 09:35:37
So just metadata?
2024-09-16 09:35:46
Not required for image decoding?
_wb_
2024-09-16 09:36:04
But you can just treat it as a blob, it is not needed if you just want to decode an image
2024-09-16 09:38:44
Anyway, I think all paywalls are bad and I wish we could have the jxl spec without paywall, but it's out of our control, it is high level ISO policy.
2024-09-16 09:40:46
What I don't understand is why AVIF chose to base itself on a paywalled spec, they could have done what we did in 18181-2 and define things in a self-contained way in the spec they can make publicly available.
CrushedAsian255
2024-09-16 09:44:08
Or use their own spec
_wb_ Anyway, I think all paywalls are bad and I wish we could have the jxl spec without paywall, but it's out of our control, it is high level ISO policy.
2024-09-16 09:44:29
Once I fully understand the format I might write my own version of the spec
2024-09-16 09:44:53
Or we can all help contribute to <@206628065147748352>’s experimental spec
2024-09-16 09:45:30
Definitely think if there is an independent specification it will help ease people about the whole paywalled spec thing
HCrikki
2024-09-16 12:19:20
cosmic can patch image-rs in their repository/desktop. Reportedly it already works fine; it's just image-rs that declined to make the integration upstream. Upstreaming is only *ideal*, not required, and adding a single extra patch does not make your builds forks or derivatives
Quackdoc
2024-09-16 12:20:29
yeah, but that still requires you to maintain your own fork
2024-09-16 12:22:21
I was going to finish the work that <@736666062879195236> had done, but I'm not about to maintain a forked version of image-rs
2024-09-16 12:22:42
that being said, if anyone wanted to, rebasing the existing patches is easy
2024-09-16 12:24:49
realistically, we need a different abstraction crate anyway for any application that wants to take itself seriously, because the policy is going to break lots of workflows; but when you rely on a project whose maintainers create new rules and arbitrarily enforce them, it's risky at best
AccessViolation_
2024-09-16 01:21:08
The whole of libcosmic is basically a reimplementation of iced, with many non-cosmic-specific changes upstreamed. Based on that, I don't think they'd mind maintaining a fork or reimplementation of image-rs to add JXL support if they do want it, but that's just a guess
Tirr
2024-09-16 02:34:48
progressive rendering animation of https://discord.com/channels/794206087879852103/1065165415598272582/1099994961530851409 (d1 e6 progressive_dc=1). this should be HDR, but youtube seems to take long time to fully process HDR videos... https://www.youtube.com/watch?v=q0Pg2BvupbU
2024-09-16 02:45:10
total image size is ~3.1 MiB (3250045 bytes).
- at 0.15% it shows... something
- 16x downsampled image is shown at 3% (~100kB), followed by 8x downsampled image in groups of 2048x2048 until ~14%
- HF data of the sky part of the image loads very quickly, since it's mostly smooth gradients and there's not much data to decode
Quackdoc
AccessViolation_ The whole of libcosmic is basically a reimplementation of iced, with many non-cosmic-specific changes upstreamed. Based on that, I don't think they'd mind maintaining a fork or reimplementation of image-rs to add JXL support if they do want it, but that's just a guess
2024-09-16 02:45:31
possibly, but that would be on them
jadamas
2024-09-16 02:45:51
hello
2024-09-16 02:46:46
I need an image of Phineas and Ferb working CCTV security cameras please
_wb_
2024-09-16 02:47:52
?
afed
Tirr progressive rendering animation of https://discord.com/channels/794206087879852103/1065165415598272582/1099994961530851409 (d1 e6 progressive_dc=1). this should be HDR, but youtube seems to take long time to fully process HDR videos... https://www.youtube.com/watch?v=q0Pg2BvupbU
2024-09-16 02:49:10
what if `group_order=1`
Tirr
2024-09-16 02:51:40
started encoding video, wait a minute
afed what if `group_order=1`
2024-09-16 03:02:53
https://www.youtube.com/watch?v=mgt4C0NDja8
2024-09-16 03:04:16
filesize is 3250331 bytes, mostly the same except the image is loaded center-first
_wb_
2024-09-16 03:04:36
probably doesn't make a huge difference here since the sky is 'easy'
afed
2024-09-16 03:07:24
yeah, not that noticeable, also looks like hdr tonemapping is different
Tirr
2024-09-16 03:08:10
my code is moving quickly, maybe something has changed between the two encodes
AccessViolation_
2024-09-16 03:13:10
It says `group_order=1` is center first, is that the one where it spirals outward like the example from the whitepaper? It doesn't look like that's what it's doing, it looks more like a scanline pattern on three columns, doing the center one first, but it's hard to tell
jonnyawsom3
Tirr total image size is ~3.1 MiB (3250045 bytes). - at 0.15% it shows... something - 16x downsampled image is shown at 3% (~100kB), followed by 8x downsampled image in groups of 2048x2048 until ~14% - HF data of the sky part of the image loads very quickly, since it's mostly smooth gradients and there's not much data to decode
2024-09-16 03:14:06
Very much a nit-pick, but maybe you could put info like that in the descriptions in future, for easy reference while watching
AccessViolation_ It says `group_order=1` is center first, is that the one where it spirals outward like the example from the whitepaper? It doesn't look like that's what it's doing, it looks more like a scanline pattern on three columns, doing the center one first, but it's hard to tell
2024-09-16 03:22:58
That's because of the 'easy' sky, it spirals but the upper half is done almost instantly so you only notice it on the way down
AccessViolation_
2024-09-16 03:24:10
Ah, makes sense
jonnyawsom3
2024-09-16 03:31:24
Wonder if the bytes/frame could change when it hits decode thresholds instead of just based on time, then regardless of the file the timing is consistent for loading the LF and HF groups, etc (Probably still a power of 2)
Tirr progressive rendering animation of https://discord.com/channels/794206087879852103/1065165415598272582/1099994961530851409 (d1 e6 progressive_dc=1). this should be HDR, but youtube seems to take long time to fully process HDR videos... https://www.youtube.com/watch?v=q0Pg2BvupbU
2024-09-16 03:36:00
Oh, also any reason for using e6?
_wb_
2024-09-16 03:45:56
<@384009621519597581> (I know Monad beat me to it but I didn't scroll for nothing :P)
Tirr
Oh, also any reason for using e6?
2024-09-16 03:47:55
I wasn't sure about what features each effort level enables, and I just wanted to avoid patches
afed
2024-09-16 03:48:47
`patches=0`
AccessViolation_
2024-09-16 03:49:27
Lossy doesn't do patches yet does it?
monad
2024-09-16 03:49:40
patches are enabled at e5-e9 on small images, and at e10
2024-09-16 03:50:31
but surely there's no chance of catching anything here
afed
2024-09-16 03:51:46
for lossless, but not for lossy
monad
AccessViolation_ Lossy doesn't do patches yet does it?
2024-09-16 03:52:44
lossy does use patches
2024-09-16 03:53:08
but okay, my efforts may be off, I am always in lossless space ..
afed
2024-09-16 03:54:13
the docs also need some corrections for the new changed efforts https://github.com/libjxl/libjxl/blob/main/doc/encode_effort.md
jonnyawsom3
Tirr I wasn't sure about what features each effort level enables, and I just wanted to avoid patches
2024-09-16 03:58:25
Since 0.10, now only effort 10 uses patches
AccessViolation_
monad lossy does use patches
2024-09-16 04:00:57
Ah, I read on the subreddit that the encoder doesn't do lossy patches yet but that was probably outdated then
jonnyawsom3
afed also need some corrections for new changed efforts https://github.com/libjxl/libjxl/blob/main/doc/encode_effort.md
2024-09-16 04:02:35
Streamed encoding has to be disabled to use them, so they were moved to effort 10 for both lossy and lossless unless explicitly enabled
2024-09-16 04:03:03
Same reason progressive is currently broken without the DC command
monad
2024-09-16 04:06:56
as long as both x and y image dimensions are less than 2048px, it will search for patches
jonnyawsom3
2024-09-16 04:18:23
Huh, so you're right... Now I'm wondering why it never showed in my tests considering I'm on a 1080p monitor and 90% of my images are screenshots
2024-09-16 04:19:54
So the old effort chart, but above 4 MP it's effort 10 only
monad
2024-09-16 04:22:37
Not even 4 MP, just one image dimension needs to be >2048px.
jonnyawsom3
2024-09-16 04:23:49
Right
AccessViolation_
2024-09-16 04:44:12
Thinking about this: > As a rule, AVIF takes the lead in low-fidelity, high-appeal compression while JPEG XL excels in medium to high fidelity. It’s unclear to what extent that’s an inherent property of the two image formats, and to what extent it’s a matter of encoder engineering focus. In any case, they’re both miles ahead of the old JPEG. - <https://cloudinary.com/blog/time_for_next_gen_codecs_to_dethrone_jpeg#compression> I've also noticed AVIF beating JXL at quality per byte at very high compression levels, and I'm wondering if at this point we know more about it. Like, do we know whether AVIF having an edge is because of inherent properties of the image format or is this something JXL could technically address with changes to how the encoder behaves in these situations?
jonnyawsom3
2024-09-16 04:48:18
JXL could definitely be improved; right now the 256×256 VarDCT blocks aren't used, only going up to 32×32 or 64×64 if I recall. But in the end AVIF was made for video, running at extremely low quality with fast-paced content, so I think it will always have an edge there. The question is if you actually want that outside of... Well, a video
AccessViolation_
2024-09-16 04:52:23
It's definitely not something you'd want *that bad* as a feature since even if you'd be looking at a slightly less shitty image, it's still a shitty image, so fair point. Better is still better, though. I wonder if this could maybe improve the looks of the early results of progressive decoding? I don't know how it works exactly but progressive decoding to me looks more like it works in a downscaling kind of way rather than a quality setting kind of way, but idk
lonjil
2024-09-16 04:52:29
<@532010383041363969> has made the point that the entropy coding used in AV1 doesn't scale well to larger values. Higher quality images will inherently have larger numbers that need to be entropy coded, so AV1 has an inherent disadvantage.
HCrikki
2024-09-16 04:53:50
it's not some issue with the format, just how the reference library as released by the libjxl authors works: preserving more detail at all resolutions. one could do a rebuild that discards some detail for lower visual quality
lonjil
2024-09-16 04:54:20
Presumably, the converse ought to be true: if AV1 was well designed, then the sacrifice of having poor entropy coding for larger values would make the entropy coding of smaller values a bit more efficient. Though, my mathematical intuition tells me that it would only be a small advantage.
_wb_
2024-09-16 04:58:20
There are definitely some inherent aspects of av1 codec design vs jxl codec design that cause av1 to be more suitable for low quality and jxl more suitable for high quality. Entropy coding, granularity of filter strength signaling, aggressiveness of filters, expressivity of jxl patches vs avif palette blocks, and the hard limit of 12-bit precision in av1 are some of the things that point in that direction.
2024-09-16 04:59:55
But there's also a difference in encoder dev focus, where basically most jxl devs don't really care about d>4 while most av1 devs don't really care about d<4, so to speak 🙂
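To make the entropy-coding point concrete, here's a toy token-plus-raw-bits split, loosely in the spirit of JPEG XL's hybrid integers (the function and the split value are illustrative, not the actual spec):
```python
def hybrid_token(value: int, split_exponent: int = 4):
    """Split a non-negative integer into an entropy-coded token plus raw
    bits. Values below 2**split_exponent get their own token; larger
    values share one token per power-of-two magnitude class and pay the
    rest as cheap raw bits, so the symbol alphabet stays small no matter
    how large the coefficients get."""
    if value < (1 << split_exponent):
        return value, 0, 0                 # (token, n_raw_bits, raw_bits)
    n = value.bit_length() - 1             # magnitude class of the value
    token = (1 << split_exponent) + (n - split_exponent)
    return token, n, value - (1 << n)      # raw bits below the leading 1

assert hybrid_token(5) == (5, 0, 0)            # small value: one cheap token
assert hybrid_token(50000) == (27, 15, 17232)  # one token + 15 raw bits
```
A coder whose alphabet has to cover coefficient values directly pays more and more as magnitudes grow, which is the disadvantage described above; the converse cost is that reserving tokens for magnitude classes leaves slightly less modeling precision for the small values.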
A homosapien
2024-09-16 06:22:20
Not to mention there are improvements to be made for high-fidelity images as well. I remember there was a discussion where vibrant colors became a little desaturated and blue hues were shifted around. Idk if those issues were fixed in the new release. Regardless, there is *a lot* of untapped potential still left in jxl. That's for sure.
AccessViolation_
2024-09-16 06:28:45
Thanks for the clarifications :)
2024-09-16 06:30:31
> Regardless, there is a lot of untapped potential still left in jxl
Personally I'm really curious what benefits from splines we will see in the future
2024-09-16 06:35:54
When I see an image like this I can only think "how well could splines help compressing this fur"
A homosapien
2024-09-16 06:38:10
Same with manga/line art, I can only imagine the insane gains for such images.
jonnyawsom3
spider-mario it was indeed for features such as hair that are harder to encode with DCT; I’ll see if I can dig up the hand-crafted examples later but they seemed relatively promising
2024-09-16 06:44:39
https://discord.com/channels/794206087879852103/794206170445119489/1217510081155956758 (Link to image)
AccessViolation_ > Regardless, there is a lot of untapped potential still left in jxl Personally I'm really curious what benefits from splines we will see in the future
2024-09-16 06:44:45
Took a while, but I found both a discussion around it and an example
AccessViolation_
2024-09-16 06:55:21
Woah that's really good actually
2024-09-16 06:55:33
I know it's hand crafted, but still
Fox Wizard
2024-09-16 07:01:49
Wonder if spline will mess up the fur on that image
_wb_
2024-09-16 07:04:59
my gut feeling is that for the fur, splines are not that useful (regular DCT will be ok), and splines are more useful for those whiskers
spider-mario
2024-09-16 07:05:12
I was going to say pretty much the same thing
_wb_
2024-09-16 07:06:09
funny how we have gut feelings about the effectiveness of coding tools we don't even have an encoder for yet 🙂
spider-mario
2024-09-16 07:06:34
I did act as a human encoder for one of them, though 😁
_wb_
2024-09-16 07:07:32
I guess I kind of did too, in some jxl art 🙂
AccessViolation_
2024-09-16 07:09:00
From the discussion, I don't really understand what's actually supposed to happen to the pixel data under the spline. My initial idea was that the splines would be fitted first, then the pixels they're supposed to replace are removed or deemed unimportant to the lossy compressor thing somehow, so they're 'ignored' and I imagine would be replaced by the average color of nearby pixels/continuing patterns, but I'm seeing different ideas.
2024-09-16 07:09:59
Forgive me using phrases like "lossy compressor thing", I'm still learning :)
jonnyawsom3
2024-09-16 07:12:14
I trust the experts will pitch in soon, but if I recall it's done in a way that whatever gets predicted under the spline is left alone with no residuals/entropy, and then the spline is layered on top using the prediction as its base color, etc. (So the spline encodes the differences but with a perfect edge) As soon as I hit send ;P
spider-mario
2024-09-16 07:12:41
splines in jxl are additive by default (in XYB space), so the spline would be subtracted from the original pixels
2024-09-16 07:13:10
it’s also possible to make them occlusive but it gets potentially tricky when you consider the semi-transparent edges
afed
2024-09-16 07:13:16
i think splines are useful for any sharp edges on some flat area where artifacts are especially visible
jonnyawsom3
2024-09-16 07:14:50
Thin objects like hair, whiskers, wires, etc., but also possibly the edges of all objects, to hide the blur of VarDCT for example
AccessViolation_
2024-09-16 07:17:17
> if I recall it's done in a way that whatever gets predicted under the spline is left alone with no residuals/entropy
So if I understand, this would effectively be "predict what would go here, but unlike what you would normally do, don't store bytes telling it it's wrong if it is", which makes sense to me as a data saving mechanism. But I don't understand how this:
> splines in jxl are additive by default (in XYB space), so the spline would be subtracted from the original pixels
results in less data being used in the underlying pixel layer. Unless I'm misunderstanding what subtraction means in this context
2024-09-16 07:19:03
Since splines aren't occlusive in this case you would still need those pixels there, even if you subtract a different value from them, no?
spider-mario
2024-09-16 07:23:23
if the pixel would normally be (242, 128, 137), and the value added by the spline would be (125, 0, 0) (it’s a red line on a grey-ish background), you would encode (117, 128, 137) with VarDCT (which would encode well if the spline is accurate and the subtraction leaves behind pretty much just the background)
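A toy numeric version of that example (one RGB pixel with integer values for readability; real splines are rendered in XYB with soft edges):
```python
import numpy as np

original = np.array([242, 128, 137])  # red line over a grey-ish background
spline   = np.array([125,   0,   0])  # contribution rendered from the spline

residual = original - spline          # what VarDCT actually has to encode
assert (residual == [117, 128, 137]).all()  # close to the plain background

decoded = spline + residual           # decoder renders the spline, adds it back
assert (decoded == original).all()    # additive splines are exactly invertible
```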
afed
2024-09-16 07:24:15
for fur and hair, like for each individual hair, I think it would highly overcomplicate the decoding and detection because it would need a lot of splines, but for the most prominent or boundary ones it could make a significant improvement
2024-09-16 07:28:05
sort of like cel-shading methods, but on top, without texture flattening <https://en.wikipedia.org/wiki/Cel_shading>
_wb_
spider-mario it’s also possible to make them occlusive but it gets potentially tricky when you consider the semi-transparent edges
2024-09-16 07:30:54
I don't think we have a way to do "kReplace"-like splines, only "kAdd", right?
spider-mario
2024-09-16 07:32:35
from what I recall, we had an alpha blending option, or did we remove it?
2024-09-16 07:32:46
now that I think about it, we might have
_wb_
2024-09-16 07:33:27
splines only operate on XYB (or whatever the main color channels are)
spider-mario
2024-09-16 07:34:19
```
Thu May 16 16:53:59 2019 +0200
Remove alpha splines, leaving only additive splines
```
2024-09-16 07:34:23
RIP alpha splines
_wb_
2024-09-16 07:34:24
patches have different blend modes, and so do frames, so I guess you could put a spline on a patch frame and then put that on your frame with any blend mode you like
2024-09-16 07:34:54
though you can't make the alpha channel follow the shape of the spline
spider-mario
2024-09-16 07:35:49
```diff
- // Nothing is currently done to "subtract" alpha-blended splines from the
- // encoded image.
- if (!add && spline.mode == Spline::RenderingMode::kAlphaBlending) continue;
```
2024-09-16 07:37:01
I don’t quite remember all the details of how they worked but I do remember that they were much slower to render
_wb_
2024-09-16 07:37:31
yeah I can imagine that
2024-09-16 07:38:18
also seems even harder to make an encoder for that
spider-mario
2024-09-16 07:38:51
I see a commit message with the following numbers:
```
- decompression speed without any spline: ~180MP/s (unchanged)
- decompression speed with three kAdditive splines: ~160MP/s
- decompression speed with three kAlphaBlending splines: ~60MP/s
```
_wb_
2024-09-16 07:39:03
(just like we only have an encoder for kAdd patches at the moment, we're not using any of the other blend modes)
AccessViolation_
spider-mario if the pixel would normally be (242, 128, 137), and the value added by the spline would be (125, 0, 0) (it’s a red line on a grey-ish background), you would encode (117, 128, 137) with VarDCT (which would encode well if the spline is accurate and the subtraction leaves behind pretty much just the background)
2024-09-16 07:51:01
is it like you're basically trying to remove the element that's going to become the spline by continuing the surrounding background color, and then making the spline the color you would need to recreate it when you apply the operation? Effectively removing the object, giving VarDCT a more homogeneous area, leading to better data savings? I thought about your message but I don't get how it's able to achieve anything without also looking at the colors surrounding the spline rather than just the spline candidate pixels themselves
spider-mario
2024-09-16 07:51:50
yes, ideally, the spline should be such that what remains after subtraction is smooth and easy to encode
2024-09-16 07:51:59
but we currently don’t have an algorithm that actually achieves that
AccessViolation_
2024-09-16 07:52:15
Ahh, okay, then I understand. Thanks!
2024-09-16 07:52:22
Interesting stuff
spider-mario
2024-09-16 07:52:29
so I basically constructed the spline data by hand for the example above
2024-09-16 07:52:46
“let’s see… if I make the spline go through these points, it’s almost there, but it needs to be slightly thicker here”
AccessViolation_
2024-09-16 07:53:35
I saw, it looked pretty impressive. How long did it take you?
spider-mario
2024-09-16 07:53:49
I don’t remember exactly, it was in 2019
2024-09-16 07:54:19
to think that 2019 was 5 years ago :S
_wb_
2024-09-16 07:54:20
we should look at some of the old vectorization algorithms or something. it's a tricky problem but shouldn't be unsolvable
AccessViolation_
2024-09-16 08:00:55
just some edge detection, and maybe adobe will let you borrow their content aware fill for the pixels under the splines <:PepeOK:805388754545934396>
2024-09-16 08:01:17
ez
jonnyawsom3
2024-09-16 08:03:16
Considering what Adobe did with the JXL DNGs, I'm not sure I'd trust them with it Dx
lonjil
spider-mario ``` Thu May 16 16:53:59 2019 +0200 Remove alpha splines, leaving only additive splines ```
2024-09-16 08:03:26
dang that's crazy, I've been saying on this server for ages that alpha blended splines would've been great
2024-09-16 08:04:54
the solid part of the spline (if it exists) would've had an alpha of 1.0, fully occluding the underlying pixels so that they could be unconstrained, and the semi-transparent edges could've done a nice blending with the surrounding pixels
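The blending being described is ordinary alpha compositing; a minimal sketch of the removed mode (names and values are illustrative, this is not the libjxl API):
```python
import numpy as np

def alpha_spline_blend(under: np.ndarray, stroke: np.ndarray, alpha: float) -> np.ndarray:
    # Standard "over" compositing: at alpha = 1.0 the stroke fully occludes
    # the pixels underneath (so the encoder could leave them unconstrained),
    # while fractional alpha at the soft edge blends the stroke into the
    # surrounding pixels.
    return alpha * stroke + (1.0 - alpha) * under

background = np.array([0.40, 0.40, 0.40])  # cheap arbitrary fill under the core
stroke     = np.array([0.90, 0.10, 0.10])

core = alpha_spline_blend(background, stroke, 1.0)  # == stroke exactly
edge = alpha_spline_blend(background, stroke, 0.5)  # halfway blend at the edge
```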
AccessViolation_
Considering what Adobe did with the JXL DNGs, I'm not sure I'd trust them with it Dx
2024-09-16 08:05:45
I was very much joking, but what did they do? 👀
jonnyawsom3
2024-09-16 08:08:13
Instead of using any of the features JXL has to offer, they just swapped out the old jpeg for it and that was it. So it still uses TIFF tiles, at tile sizes that don't match group sizes, without using the ColorFilterArray extra channel specifically meant for RAW files, and don't get me started on the DNGConverter command line tool...
2024-09-16 08:09:42
They set the lossy quality so low that the lossless files are actually smaller, and set `faster_decoding=4` even though it has some *issues* with lossless that bloat filesize massively, although I think they dodged that bullet by not letting it compress properly in the first place thanks to the tiles
AccessViolation_
2024-09-16 08:10:31
<:galaxybrain:821831336372338729>
jonnyawsom3
2024-09-16 08:10:34
Jyrki says they know some people at Adobe and that they must have a method to the madness, but it just seems insane to me
AccessViolation_
2024-09-16 08:13:12
And presumably this is all set in stone for 1.7 already
lonjil
2024-09-16 08:13:42
I think DNG is pretty flexible in how stuff is stored
_wb_
2024-09-16 08:14:15
yes, I think you could use it in a way that is basically just a wrapper around a jxl codestream plus some obligatory DNG metadata
lonjil
2024-09-16 08:14:19
Certainly the libjxl encoding setting choices are not in the spec at all, and can be chosen arbitrarily by the encoder.
AccessViolation_
2024-09-16 08:16:52
If those two digital camera models that shoot DNG 1.7 found out about this they would be very upset
AccessViolation_
2024-09-16 08:17:11
Oh, and iPhone 16, which is pretty huge
_wb_
2024-09-16 08:17:43
they did put some fields in DNG 1.7 which are pretty specific, kind of assuming that libjxl is the encoder 🙂
lonjil
2024-09-16 08:17:56
that's funny
_wb_
2024-09-16 08:19:16
I mean, I guess another jxl encoder could also use these fields to give some indication of what it is doing
lonjil
AccessViolation_ Oh, and iPhone 16, which is pretty huge
2024-09-16 08:19:38
There was some code in iOS 18 relating to non-debayered and tiled jxl in dng, but I think that for DNGs produced by Apple's own camera app, they will store single 3-channel images of debayered data. And hopefully not with weird tiling. (does anyone know what kind of tiling ProRaw files use today?)
_wb_
2024-09-16 08:20:14
I bet they're going to do tiling, for HEIC they use 512x512 tiles
2024-09-16 08:20:56
for current ProRaw it doesn't really matter since they use lossless, but for lossy it's kind of a bad idea to use DNG-level tiles
2024-09-16 08:22:02
not that it will matter much if they're going to use very high quality settings, which seems to be the case if those estimated file sizes are correct
lonjil
2024-09-16 08:22:15
But HEIC tiling must be to make more efficient use of the hardware encoder and decoder. I mean I wouldn't be surprised if they did tiling for JXL ProRaw, but I don't think it is influenced by the same factors as HEIC. Unless they're planning to HW accelerate it in the future?
_wb_
2024-09-16 08:23:01
I'll be curious to see the first iPhone 16 Pro bitstream samples
lonjil
_wb_ not that it will matter much if they're going to use very high quality settings, which seems to be the case if those estimated file sizes are correct
2024-09-16 08:23:03
yeah, I mean that looks like d0.1 or something. I couldn't even tell the difference between lossless and d0.1 when zoomed in when I tried it.
_wb_
2024-09-16 08:23:54
I think it's probably a good idea, if you want to make the concept of "lossy raw" digestible for "pro" photographers, to use an insanely high quality like that
AccessViolation_
2024-09-16 08:24:56
Are there even reasonable savings with such a low distance?
_wb_
2024-09-16 08:25:29
well it should still be smaller than lossless
2024-09-16 08:25:46
but not by a huge factor
2024-09-16 08:26:41
it would have been nice if they would expose some control over it — so you can do d0.1 - d1 or whatever
lonjil
AccessViolation_ Are there even reasonable savings with such a low distance?
2024-09-16 08:27:47
For my photographs (not necessarily accurate for Apple's 12-bit debayered images)
```
1.1G uncompressed
672M jpegll
385M jxld0e1
186M jxld0.1e1
170M jxl-tiny-d0.1
```
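Reading those numbers against the earlier question about savings: at d0.1 the lossy file lands at roughly half the size of lossless JXL here.
```python
print(f"{186 / 385:.0%} of the lossless jxl size")  # ~48%: far from a 1% shave
```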
AccessViolation_
2024-09-16 08:28:03
If it shaves off 1% I don't think it'd be worth thinking about; if it shaves off 10% (like maybe being able to sacrifice the tiniest amount of data *at all* suddenly opens the door to way better compression while still being effectively-but-not-really lossless) I could see it being worthwhile
lonjil For my photographs (not necessarily accurate for Apple's 12-bit debayered images) ``` 1.1G uncompressed 672M jpegll 385M jxld0e1 186M jxld0.1e1 170M jxl-tiny-d0.1 ```
2024-09-16 08:28:38
damn, okay
2024-09-16 08:28:41
nvm then XD
lonjil
lonjil yeah, I mean that looks like d0.1 or something. I couldn't even tell the difference between lossless and d0.1 when zoomed in when I tried it.
2024-09-16 08:28:43
ok, at 8x zoom looking around for a while, I found a single pixel that changed enough for me to notice
AccessViolation_
2024-09-16 08:32:24
I've only recently been dealing with raw files. I heard somewhere that even 20 year old raw files can be developed into really nice wide gamut HDR images today, which was motivation to sacrifice a bit of storage and make my phone capture RAW as well for particularly nice photos I'm taking
lonjil
2024-09-16 08:33:09
My DSLR is I think more than 15 years old at this point and it blows any phone camera out of the water
2024-09-16 08:33:19
though I'm not very good at the mastering stuff
AccessViolation_
2024-09-16 08:34:48
I've been toying with Darktable but I haven't been able to really match my phone's default JPEG output in appeal. Phones do a crazy lot of computational photography especially in dark scenes, so that's sort of expected. At least until I get better at doing this
spider-mario
2024-09-16 08:35:18
the “Local contrast” module can probably help
2024-09-16 08:36:12
it’s been a while since I’ve used darktable, but from what I remember, what I liked to do was temporarily set it to an absurdly large intensity, just to help me find the radius that works best for the image
2024-09-16 08:36:17
then dial it back
2024-09-16 08:36:43
(it’s still the strategy I use in DaVinci Resolve)
AccessViolation_
2024-09-16 08:37:19
Thanks, I'll try that. I've mostly been using saturation, exposure and denoise to make things look "okay"
2024-09-16 08:38:39
I'm probably doing something wrong in the saturation department because RAW images come out of my phone almost looking sepia and I need to get that slider pretty close to max for it to look alright
spider-mario
2024-09-16 08:38:41
ah, it seems the “local laplacian filter” mode doesn’t _have_ a radius setting
2024-09-16 08:38:48
I guess I used the “bilateral grid” mode then
AccessViolation_
2024-09-16 08:48:37
I'll probably try following a starter guide tomorrow
spider-mario
2024-09-16 08:50:27
at the time, it was also quite a good thing to understand the filmic module, but not sure how it is now
lonjil
2024-09-16 08:51:06
someone sent me an iOS 15 12MP ProRaw file, this is what exiftool reports:
```
322 TileWidth : 504
323 TileLength : 504
```
2024-09-16 08:51:52
So if they just use JXL as a drop in replacement, I guess those would be the tile dimensions they'd use.
CrushedAsian255
2024-09-16 08:58:50
it's better than Adobe
2024-09-16 08:59:16
a 256x256, 256x248, 248x256, 248x248
2024-09-16 08:59:29
their HEVC decoder probably works in 512x512 blocks already
_wb_
2024-09-16 09:01:25
why 504 and not 512
CrushedAsian255
2024-09-16 09:03:03
probably to prevent edge artifacting
2024-09-16 09:03:20
it might give the encoder an 8x8 strip of the previous tile so it can use it to predict off of
2024-09-16 09:03:31
wait no, that doesn't work
2024-09-16 09:03:44
someone should check the block sizes of a heic image
AccessViolation_
2024-09-16 09:07:49
Maybe the full height and width are multiples of 504, making it so you only have full tiles and not small slices of tiles on the bottom and right side of the image
CrushedAsian255
2024-09-16 09:08:18
Probably
2024-09-16 09:08:30
4032/504 = 8.0
2024-09-16 09:08:51
3024/504 = 6.0
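A quick check of that guess, using the sensor dimensions from the messages above:
```python
width, height = 4032, 3024  # 12 MP iPhone output
print(width % 504, height % 504)  # 0 0 -> only full 504x504 tiles
print(divmod(width, 512), divmod(height, 512))  # (7, 448) (5, 464): ragged edge tiles with 512
```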
jonnyawsom3
CrushedAsian255 it's better than Adobe
2024-09-16 11:51:51
`672x752` https://discord.com/channels/794206087879852103/822105409312653333/1252968284702380083
Tirr
Tirr https://www.youtube.com/watch?v=mgt4C0NDja8
2024-09-17 06:42:36
this is now available as hdr
2024-09-17 06:44:56
seems to work well, the sun is brighter than the light-mode youtube background
CrushedAsian255
2024-09-17 07:00:49
Confirmed; hdr works on iPhone
2024-09-17 07:01:40
Can’t comment on the progressive loading though; my vision makes it so 10% looks fully loaded
2024-09-17 07:01:47
Also could be YT compression
Tirr
2024-09-17 07:02:44
I can see HF progressive with my macbook air so it's ok
Arcane
2024-09-17 10:27:19
CrushedAsian255
2024-09-17 10:27:27
Oops sorry misclick
jonnyawsom3
AccessViolation_ It says `group_order=1` is center first, is that the one where it spirals outward like the example from the whitepaper? It doesn't look like that's what it's doing, it looks more like a scanline pattern on three columns, doing the center one first, but it's hard to tell
2024-09-17 10:42:15
I see what you mean now, and I think I know why. That happens during the DC/LF part of the image, which uses 2048px groups instead of 256px. So it probably is a spiral, but there are only 3 blocks, so it just goes middle, left, right (probably)
2024-09-17 11:59:26
I do wonder if centre-first should be the default group order. It would set it apart from the other formats, and usually the focus is in the centre anyway
Traneptora
2024-09-17 11:59:45
wouldn't be too hard to do with permuted TOC
jonnyawsom3
2024-09-17 12:10:09
It's already an option in cjxl, but scanline is default currently
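For anyone who wants to experiment, a minimal sketch of what a centre-first permutation could look like (just the obvious distance-from-centre ordering over the 256 px group grid; `center_first_order` is illustrative, not the cjxl implementation):
```python
import math

def center_first_order(width: int, height: int, group: int = 256) -> list[int]:
    # Sort group indices by each group's distance from the image centre;
    # the resulting permutation is what a permuted TOC would carry so the
    # decoder receives central groups first.
    gx = (width + group - 1) // group   # groups per row
    gy = (height + group - 1) // group  # groups per column
    cx, cy = (gx - 1) / 2, (gy - 1) / 2
    order = list(range(gx * gy))
    order.sort(key=lambda i: math.hypot(i % gx - cx, i // gx - cy))
    return order

print(center_first_order(1024, 768))  # 4x3 grid: the two middle groups come first
```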
AccessViolation_
2024-09-17 12:33:42
Hmm, I wonder if there's some simple heuristic to determine a better custom group order without using the fancy machine learning model Google proposed. For example, if you have a picture of an ocean underneath a clear sky, people will probably look at the horizon first. So in that sense you want high coarse-contrast areas decoded first. You'd also want to avoid being tricked into decoding larger noisy areas first: if there's a rough concrete wall, it would go "oh, contrasty features, probably important", which it shouldn't do. It should however care about the section where the wall ends or changes color, as that's an eye-catching feature. So to summarize: large, coarse color/luminance differences? Just a theory
CrushedAsian255
2024-09-17 12:34:25
Maybe areas with lots of defined edges?
AccessViolation_
2024-09-17 12:36:27
Simple repeating patterns of stripes would throw that method off, similarly to how an otherwise boring feature that happens to be pretty noisy, like concrete, would throw off a simple contrast approach. Maybe you really just want to look at the very low frequency data for this. You can then maybe do some edge detection on *that*
CrushedAsian255
2024-09-17 12:37:28
What about faces?
2024-09-17 12:37:34
That’s usually where people look first
AccessViolation_
2024-09-17 12:40:19
For faces, Google's approach already knows to care about the eyes and mouth first. You can probs also do simple face detection (which would probably also count as ML) as a special pass on top of our edge detection / coarse contrast detection
jonnyawsom3
2024-09-17 12:40:37
Feels like something to do along with spline encoding, since that would be edge detection and borders of objects too
CrushedAsian255
AccessViolation_ For faces Google's approach already knows to care about the eyes and mouth first. You can probs also do simple face detection (which would probably also count as ML) as a special pass on top of our edge detection / coarse contrasts detection
2024-09-17 12:41:24
A really simple ml model that only activates at higher efforts shouldn’t be a problem
AccessViolation_
2024-09-17 12:42:01
Without a special module for face detection, if the face is large, like with a portrait, it would probably end up caring about the edges, where the neck meets the shirt, and the eyes/mouth. That's probably what would happen
CrushedAsian255
2024-09-17 12:42:32
Eye/mouth would probably be fine tbh
AccessViolation_
2024-09-17 12:52:42
Actual image, image prepared for feature detection, detected features
2024-09-17 12:53:18
Something like this maybe?
2024-09-17 12:53:41
The idea is that tiles containing the red splines would get decoding priority
2024-09-17 12:56:36
By operating on a blurred/averaged version you eliminate the problem of someone with a striped shirt being prioritized because there's a lot of contrast there, for example
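Sketching that idea (purely illustrative, nothing cjxl actually does; `group_priority` and the block-average stand-in for "operate on a blurred version" are my own assumptions):
```python
import numpy as np

def group_priority(gray: np.ndarray, group: int = 256, blur: int = 8) -> np.ndarray:
    # Crude stand-in for "operate on a blurred/averaged version": downsample
    # by block-averaging so fine texture and noise (striped shirts, concrete
    # grain) stop registering as contrast.
    h, w = gray.shape
    low = gray[: h - h % blur, : w - w % blur]
    low = low.reshape(h // blur, blur, w // blur, blur).mean(axis=(1, 3))

    # Coarse contrast: gradient magnitude of the low-frequency image.
    gy, gx = np.gradient(low)
    edges = np.hypot(gx, gy)

    # Sum edge energy per group (group size in low-res units); sorting the
    # flattened scores in descending order would give the decode order.
    g = group // blur
    gh, gw = edges.shape[0] // g, edges.shape[1] // g
    return edges[: gh * g, : gw * g].reshape(gh, g, gw, g).sum(axis=(1, 3))
```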
CrushedAsian255
2024-09-17 12:57:31
Does this still detect faces?
AccessViolation_
2024-09-17 12:58:45
No special logic for faces, no. Face detection could be a pass that happens on the unblurred image, and its results could also be added to the list of features
2024-09-17 01:00:36
I'm not sure if splines like these are the best method for representing features; if you used a low-resolution matrix containing the priority for every location in the image, then you wouldn't have to use splines for everything, since splines are really only good if you care about rough edges, not faces or other features
2024-09-17 01:02:04
And with that, specialized encoders could decide themselves what they care about. E.g. one for encoding comics could detect and prioritize text