|
|
dds
|
|
Jyrki Alakuijala
what is your recommendation now with the spec? for our team libjxl 1.0 and wasm are the highest priority
|
|
2022-11-08 03:46:41
|
|
|
2022-11-08 03:46:42
|
I'm actually writing a critique of JXL at the moment. Wrt what to actually do about anything - there's obviously a difference between "here's what I'd do now if I could change anything" and "here's what I'd do now given that the bitstream is officially frozen".
|
|
2022-11-08 03:46:55
|
Off the top of my head, some of the things I'd be asking of the good people developing JXL are:
|
|
2022-11-08 03:47:23
|
1) Ask some of your security engineers to review the spec before it's too late - and act on their findings. They're good at finding inconsistencies and DoSs. Similarly ask the developers of the (two?) other implementations of JXL to give you some 'gloves off' feedback on the spec.
|
|
2022-11-08 03:47:44
|
To this end, add clarity wherever possible and use RFC 2119-style imperatives like 'MUST', 'MUST NOT' in order to define valid / invalid bitstreams. For instance in G.2.4 I'd like to know whether BlockInfo consists of rows at a time or columns at a time (apologies if this is defined elsewhere) and would like to read something like "The remaining samples of a varblock (i.e. those other than the top left) MAY be any value and decoders are required to ignore them" - this avoids people writing code that makes assumptions. Another example: in the definition of splines, add something like "Decoders MUST reject any sequence of control points that contains any two consecutive points with the same value" - this saves implementors needing to realise that they have a divide-by-zero to avoid.
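As an illustration of the kind of check that wording would license, here is a minimal sketch — the function name, input shape, and rejection behaviour are my own for illustration, not spec text:

```python
def validate_control_points(control_points):
    """Reject any spline whose consecutive control points coincide.

    Hypothetical check sketching the suggested 'decoders MUST reject'
    rule; `control_points` is a list of (x, y) tuples.  Two identical
    consecutive points give a zero-length segment, i.e. a potential
    divide-by-zero when the decoder computes tangents or arc lengths.
    """
    for a, b in zip(control_points, control_points[1:]):
        if a == b:
            raise ValueError(f"consecutive duplicate control point {a}")

validate_control_points([(0, 0), (10, 5), (20, 5)])   # accepted
# validate_control_points([(0, 0), (0, 0)])           # would raise ValueError
```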
|
|
2022-11-08 03:48:00
|
2) Every feature of JXL must be implemented in the reference encoder - not necessarily optimally, but completely. For instance I believe that many large and irregular DCT sizes are currently turned off. Have these been demonstrated to be useful for anything beyond constant colour rectangles? I'm assuming that the answer is 'yes'; if so then enable them, at least behind a flag. If not then have the difficult conversation about changing the spec.
|
|
2022-11-08 03:48:29
|
2b) Following 2): seriously consider removing splines from the spec and changing the 1<<4 bit of the FrameHeader flags to be Reserved. There's a huge amount of complexity
in JXL already; for this feature the spec doesn't match libjxl and still contains bugs (see recent conversations). I have been keeping an open mind about splines and have asked <@604964375924834314> for the test images that justify the feature, but these haven't yet arisen. JXL art alone does not justify the feature, and the fact that splines are for "future-proofing and not for immediate use" is a red flag for me. They're also a good way to DoS decoders (recall I posted a spline bomb a while ago). Even if - generously - the feature saves 20% of the compressed size of 5% of images, is it worth it? I claim not; you may disagree. Following _wb_'s comment above about ijg-libjpeg only implementing some of the JPEG spec, I'd rather jump than be pushed here.
|
|
2022-11-08 03:48:40
|
2c) When I last looked, the value of the MA leaf_node.offset was _always_ zero, the point being that ANS encoding is sufficiently good that there's nothing to gain by helping it out. So this looks like a redundant field to me; if it weren't politically difficult to change the bitstream I'd remove it.
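For context, my (hedged) reading of the MA leaf semantics is that a decoded sample is the predictor value plus a zigzag-decoded token, scaled by the multiplier and shifted by the offset — so a permanently-zero offset contributes nothing, which is the redundancy argument. A sketch with illustrative names:

```python
def unpack_signed(u):
    """Zigzag decode: 0, 1, 2, 3, ... -> 0, -1, 1, -2, ..."""
    return (u >> 1) if (u & 1) == 0 else -((u >> 1) + 1)

def leaf_value(prediction, token, multiplier=1, offset=0):
    # Hedged reading of the leaf contribution: the entropy-coded token
    # is zigzag-decoded, scaled, shifted, and added to the predictor.
    # With offset always 0 in practice, the field adds nothing beyond
    # what the adaptive ANS coding already captures.
    return prediction + unpack_signed(token) * multiplier + offset
```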
|
|
2022-11-08 03:48:47
|
2d) I'm not a fan of modular_16_bit_buffers either but I'll choose my battles...
|
|
2022-11-08 03:49:12
|
3) For the love of god document AFV. With apologies for the strong language, having a secret algorithm within a formal specification is unforgivable. Writing out the details isn't as bad as you think - since the characteristic polynomial of the Laplacian factors into components of degree <= 4, there are relatively concise expressions for the algebraic numbers comprising the pre-normalised matrix. The only remaining issue is that there is a single value relating to the matrix of the remaining 3 pixels for which none of the developers could recall the derivation. I suggest choosing something sensible and mathematically exact. The other (and more practical) reason to document AFV better is that a full description opens the door for efficient implementations. You don't have to do an O(N^3) 16x16 matrix multiply - honest.
|
|
2022-11-08 03:49:37
|
4) For me the lack of predictable decoding speed is a concern for web adoption. There are two sides to this. Firstly, as a result of modular mode being fairly flexible, malicious JXL files can effectively DoS your CPU, which isn't good for the format's reputation. (I know some steps are being taken to address the 'spline bomb' I mentioned)
|
|
2022-11-08 03:49:52
|
The second angle is this: If you compress a PNG file with "--best --try_everything" (or whatever), there is basically nothing you can do that will cause decode speed to increase pathologically. By comparison, I imagine someone will at some point write a 'JXL optimiser' that shaves 1% off the size of lossless JXL images at the expense of a much more computationally expensive decode. The person or website distributing the compressed images may not be malicious but may not care about your power consumption. As a user I'm going to blame my browser or phone for having a rubbish battery life instead of the person who created the file.
|
|
2022-11-08 03:50:27
|
So I recommend reworking the "Profiles and Levels" annex to count the total synthetic amount of 'work' per unit pixel, rather than specifying individual maxima for each setting like it does at the moment. Then during the decode process, a decoder SHOULD bail out if it thinks it has done too much work. Also ditch any limits on width / height, as a decoder can read the header and make an early decision on whether or not to decode. Like before you probably only need two profiles: one for the web, and one that is pretty much unrestricted. The advantages of this are 1) it amortises decoding speed, so it doesn't matter if some areas of the image are relatively slow to decode as long as the rest is fast enough to compensate, 2) you can provide approximate guarantees of decoding speed (or rather, decoding power consumption) to anyone who cares about performance, 3) changing this Annex can be done without changing the bitstream.
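A global work budget of that kind could look like the following sketch; the per-pixel budget constant, class names, and charge amounts are invented for illustration, not from the Profiles and Levels annex:

```python
# Hypothetical per-pixel work limit a profile might specify.
WORK_BUDGET_PER_PIXEL = 8.0

class WorkBudgetExceeded(Exception):
    pass

class DecodeBudget:
    """Amortised work accounting: one budget for the whole image."""

    def __init__(self, width, height):
        self.remaining = WORK_BUDGET_PER_PIXEL * width * height

    def charge(self, units):
        # Charge synthetic work units as decoding proceeds; bail out
        # once the total budget is spent.  Because the budget is global,
        # a slow-to-decode region is fine as long as the rest of the
        # image is cheap enough to compensate.
        self.remaining -= units
        if self.remaining < 0:
            raise WorkBudgetExceeded("decoder exceeded its work budget")

budget = DecodeBudget(64, 64)   # total budget: 8.0 * 64 * 64 units
budget.charge(1000)             # well within budget
```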
|
|
|
_wb_
|
2022-11-08 06:46:44
|
MUST and MUST NOT are a no-go in ISO standards, where "must" has the specific meaning of something that is basically a consequence of the laws of physics or logic (so it is basically a form of assert that only contains redundant, already-implied information).
|
|
2022-11-08 06:49:46
|
Also we are not supposed to say decoders "reject" a bitstream because then we need to specify what that means, which we don't want to do (it can show a warning or error, refuse to decode, crash or decode to something unspecified, and each of those except crash could be useful depending on the use case)
|
|
2022-11-08 06:55:18
|
I agree that there may be cases where the spec can be clearer on what is invalid, but the way it is done is not by saying "the bitstream is invalid if x < 0" but by simply saying "it is the case that x >= 0", which is a bit confusing but this is the way you do it in ISO. If we said "x must be >= 0", that would be an informative statement saying that there just is no way in which x could be < 0 (which would be a wrong thing to put in the spec if the bitstream syntax allows x < 0 but then it's not a valid bitstream).
|
|
2022-11-08 06:59:15
|
2a) we will possibly define a profile at some point that doesn't allow large block sizes; in particular for hardware implementations in fixed point this would make sense. For software implementations the overhead of implementing large blocks in the decoder is quite low (the idct can be implemented recursively so impact on binary size is not large)
|
|
|
|
dds
|
|
_wb_
I agree that there may be cases where the spec can be clearer on what is invalid, but the way it is done is not by saying "the bitstream is invalid if x < 0" but by simply saying "it is the case that x >= 0", which is a bit confusing but this is the way you do it in ISO. If we said "x must be >= 0", that would be an informative statement saying that there just is no way in which x could be < 0 (which would be a wrong thing to put in the spec if the bitstream syntax allows x < 0 but then it's not a valid bitstream).
|
|
2022-11-08 07:08:58
|
obviously not your fault but it's horrifying that they do things like that
|
|
|
_wb_
|
2022-11-08 07:09:38
|
2b) splines would also not be allowed in a "hardware-friendly" profile. DoS attacks should already be mitigated by suitably limiting the total pixel area of splines in the Levels definition — something we already have in libjxl but still need to add to the spec. Other than DoS potential, I think the complexity cost is low enough to be justified by compression potential, which I think can be significantly more than 20% for images that are authored in a way that directly uses splines (say when jxl is used as an export format for a vector graphics editor), but I agree that in the normal case of encoding from pixels, its usefulness is somewhat unclear. What do we gain by deleting it at this point though, considering it's already implemented?
|
|
|
|
dds
|
|
_wb_
2b) splines would also not be allowed in a "hardware-friendly" profile. DoS attacks should already be mitigated by suitably limiting the total pixel area of splines in the Levels definition — something we already have in libjxl but still need to add to the spec. Other than DoS potential, I think the complexity cost is low enough to be justified by compression potential, which I think can be significantly more than 20% for images that are authored in a way that directly uses splines (say when jxl is used as an export format for a vector graphics editor), but I agree that in the normal case of encoding from pixels, its usefulness is somewhat unclear. What do we gain by deleting it at this point though, considering it's already implemented?
|
|
2022-11-08 07:23:13
|
I argue that what you gain by erasing it is simplicity, feature portability, less vulnerability to images with non-predictable decode time / power consumption, less risk of major implementors skipping over that part of the spec and making the feature de facto redundant. Multiple profiles create potential portability issues out of the gate - even two is bad enough and needs a decent management plan. If some features are supported on PCs but not phones, you could end up with people saying things like "Oh some JXLs don't work on my phone, just download the JPEG instead" which is undesirable.
For direct export from vector graphics there are existing formats that do the job better; if JXL wants to supplant those formats as well then I think it needs more vector features.
|
|
|
_wb_
|
2022-11-08 07:25:05
|
2c) the offset is always zero (except of course in jxlart) because iirc the encoder isn't even trying to find offsets and multipliers. Multipliers are used as a way to do quantization. It's a good question to what extent offsets are useful for compression, but if they're all zeroes the signalling cost is fixed and very low, and I imagine non-zero offsets could help a bit for certain kinds of synthetic images. <@179701849576833024> do you remember if we collected evidence about offsets being useful?
|
|
2022-11-08 07:29:03
|
You make good points about splines but I think it's a bit late at this point to remove them from the spec; I don't think any software implementation will implement only the hw profile, and phones will just use the software decoder — the hw profile would only be for things like motion jxl in cameras where hw encode/decode would be desirable.
|
|
2022-11-08 07:29:57
|
2d) could you elaborate what you don't like about modular_16_bit?
|
|
|
|
veluca
|
2022-11-08 07:30:45
|
Re offsets, I don't think we had specific examples
|
|
|
|
dds
|
|
_wb_
2d) could you elaborate what you don't like about modular_16_bit?
|
|
2022-11-08 07:36:10
|
Sure, it's from a conversation between the two of us from a while ago - basically if you want to write a 100% robust decoder then you need to worry about the modular_16_bit field lying, i.e. saying that everything fits where it doesn't. So you have to do everything in higher precision anyway, making the field redundant. I don't like the idea that someone can send a corrupt jxl to a remote website and have it (unknowingly) render back an image with integer overflows. With my adversarial hat on, this could give you things like a CPU feature leak from a JXL file (as different SSE2 / AVX2 implementations could give you different corruption).
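To illustrate the argument, here is a sketch of the defensive path (names are illustrative, not the libjxl API): decode in 32-bit precision regardless of the flag, and treat an inconsistent flag as an invalid bitstream rather than letting an int16 buffer silently wrap in a SIMD-dependent way:

```python
INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1

def decode_modular_channel(samples, claims_16_bit):
    """Sketch of the robustness argument (illustrative names, not libjxl).

    Even if the header claims every sample fits in 16 bits, a robust
    decoder decodes in wide precision and verifies the claim, rather
    than trusting the flag and taking a narrow int16 fast path.
    """
    data = [int(s) for s in samples]  # decode in wide precision first
    if claims_16_bit and any(
        not (INT16_MIN <= s <= INT16_MAX) for s in data
    ):
        # The flag lied: an int16 buffer here would silently wrap, and
        # the wrapped result can differ between SSE2/AVX2 code paths,
        # leaking CPU details via the rendered pixels.
        raise ValueError("modular_16_bit_buffers flag is inconsistent")
    return data
```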
|
|
2022-11-08 07:37:02
|
yes it's a minor objection on the grand scale of things
|
|
|
_wb_
|
2022-11-08 07:39:02
|
4) what's the incentive on the web to deliberately use slowly decoding images? If someone wants to make a page that will make a browser choke, there are many ways to do that. I think this is a bigger concern for image intake (e.g. UGC) than for image delivery. It is a concern that decode speed is not predictable, but at least it is bounded, and most image intake has to deal with worse things already (like svg, pdf).
I agree that level 5 needs to be defined in terms of capping work per pixel (which it already mostly is, only for splines and patches it still needs to be reformulated, I think).
|
|
|
dds
Sure, it's from a conversation between the two of us from a while ago - basically if you want to write a 100% robust decoder then you need to worry about the modular_16_bit field lying, i.e. saying that everything fits where it doesn't. So you have to do everything in higher precision anyway, making the field redundant. I don't like the idea that someone can send a corrupt jxl to a remote website and have it (unknowingly) render back an image with integer overflows. With my adversarial hat on, this could give you things like a CPU feature leak from a JXL file (as different SSE2 / AVX2 implementations could give you different corruption).
|
|
2022-11-08 07:43:00
|
That's a good point, and probably means a browser implementation should not do such an optimization, but the field could still be useful for use cases where adversarial bitstreams are no concern or there is no problem of CPU feature leak, say an offline image viewer.
|
|
|
lonjil
|
2022-11-08 07:44:13
|
In practice that sounds like something no one will do. Because the default needs to be secure in all situations, and almost no one will bother to change the default even when it is safe.
|
|
|
_wb_
|
2022-11-08 07:44:50
|
Thanks for all of this great feedback! Very valuable — I wish we had it 2 years ago, but most of it we can still act on now, I think.
|
|
|
|
dds
|
|
_wb_
That's a good point, and probably means a browser implementation should not do such an optimization, but the field could still be useful for use cases where adversarial bitstreams are no concern or there is no problem of CPU feature leak, say an offline image viewer.
|
|
2022-11-08 07:46:46
|
This is where bad things start though - e.g. your point is only true if you document the fact enough for the browser team and other implementors to notice (or they work it out themselves). Browsers are the 90% case (complete guess) for JXL use, so for me, writing them off as customers of that particular feature says that it doesn't really have a purpose.
|
|
|
_wb_
|
|
lonjil
In practice that sounds like something no one will do. Because the default needs to be secure in all situations, and almost no one will bother to change the default even when it is safe.
|
|
2022-11-08 07:47:46
|
I think some memory-tight use cases may bother with it (if it makes the difference for OOM), but I agree most wouldn't bother, e.g. libjxl doesn't even have a 16-bit modular path and just ignores the field.
|
|
|
|
dds
|
|
_wb_
Thanks for all of this great feedback! Very valuable — I wish we had it 2 years ago, but most of it we can still act on now, I think.
|
|
2022-11-08 07:48:22
|
np - sorry for any general lack of tact and I also wish I had noticed the existence of JXL two years ago. Ironically I was a big fan of FLIF and didn't even hear about FUIF until JXL came along.
|
|
|
_wb_
|
2022-11-08 07:51:32
|
If we add it to libjxl at some point, we should definitely disable the 16-bit path by default unless we can implement it in a way that doesn't leak CPU info (which might be possible — after all it's integer arithmetic only, so getting overflow behavior consistent across platforms / SIMD versions could be possible). I don't think the spec is the right place to describe such quality-of-implementation/security concerns (but maybe adding a note wouldn't hurt)
|
|
|
|
dds
|
|
_wb_
2a) we will possibly define a profile at some point that doesn't allow large block sizes; in particular for hardware implementations in fixed point this would make sense. For software implementations the overhead of implementing large blocks in the decoder is quite low (the idct can be implemented recursively so impact on binary size is not large)
|
|
2022-11-08 07:52:03
|
To clarify 2a) - it's not that I have an objection to unimplemented features per se; instead I have an objection to redundant or non-useful features, and I like 'spec coverage' the same way I like 'code coverage'. So I'm challenging the dev team on whether large or very unequal block sizes are useful. I'm guessing that the answer is probably 'yes they are', in which case adding them to the encoder (even behind a flag) gets this fact out in the open and increases spec coverage.
|
|
|
_wb_
|
|
dds
np - sorry for any general lack of tact and I also wish I had noticed the existence of JXL two years ago. Ironically I was a big fan of FLIF and didn't even hear about FUIF until JXL came along.
|
|
2022-11-08 07:53:06
|
I don't mind lack of tact at all, good feedback is way more important to me than the politeness of phrasing it.
|
|
2022-11-08 08:00:06
|
For the large block sizes, there is a diminishing return (every extra doubling of max block size gives smaller and smaller compression gains, especially for the quality range that is important in practice), but there is something to be said for having some 'headroom' there by setting the max equal to the group size (so this is a 'natural' max value). If image dimensions keep growing (hard to tell if that will be the case for much longer, but it could), then the usefulness of the bigger blocks will increase as the intrinsic entropy per pixel goes down (in both photographic and nonphoto cases).
|
|
2022-11-08 08:06:31
|
As far as I understand, the current encode heuristics in libjxl operate at the 64x64 level which is convenient because that's the chroma from luma signaling resolution, but it limits the max block size it can use to 64x64. I agree that it would be useful to make some changes there, even if it's only to detect some special cases, say the case where all HF coeffs are zero.
|
|
|
|
dds
|
|
_wb_
If we add it to libjxl at some point, we should definitely disable the 16-bit path by default unless we can implement it in a way that doesn't leak CPU info (which might be possible — after all it's integer arithmetic only, so getting overflow behavior consistent across platforms / SIMD versions could be possible). I don't think the spec is the right place to describe such quality-of-implementation/security concerns (but maybe adding a note wouldn't hurt)
|
|
2022-11-08 08:40:22
|
would it be theoretically possible / acceptable to solve this in the spec without changing the bitstream by replacing it with a 'Reserved' field that must be False? This won't affect any hypothetical decoder that does take action based on the flag, as no bad thing comes from modular_16_bit_buffers=False.
|
|
|
_wb_
|
2022-11-08 08:54:29
|
well that would invalidate all the current bitstreams that have it set to true, which is most of them
|
|
|
|
veluca
|
2022-11-09 12:19:21
|
doesn't *timing* already leak cpu info anyway? I am not sure what you'd get out of overflow that you wouldn't get out of code just running faster if you have AVX2 vs SSE4
|
|
2022-11-09 12:19:38
|
and pretty much every single existing image format will do that
|
|
|
|
dds
|
2022-11-09 07:59:05
|
There's a huge difference between side-channels and having real data - both in difficulty and accuracy.
|
|
|
|
veluca
|
2022-11-09 08:35:08
|
I'd agree with you in principle, but:
- AVX2 is *a lot* faster (~2x), not just ~1.1x, so it should be very easy to find out
- for that matter, you could probably just run a simple benchmark of the CPU with wasm or the like and figure out the model (intel vs amd, generation) with reasonable accuracy
- even without 16-bit-modular-buffer-safe, you could do stuff that overflows 32-bit arithmetic and then you run in the same problem
|
|
|
|
dds
|
2022-11-09 08:43:00
|
if you can overflow 32 bits then that's another problem with the spec!
|
|
2022-11-09 08:43:24
|
this was the point I caveated under "I'll choose my battles" - for me it's not the biggest deal in the world as I find it more ugly than a genuine security issue. As I said, you have to ignore the field anyway if you want to write a robust decoder.
|
|
2022-11-09 08:51:02
|
Actually this isn't a problem: the spec says "In any case, signed 32-bit integers suffice to store decoded samples in modular image sub-bitstreams, as well as the results of inverse modular transforms."
|
|
|
|
veluca
|
2022-11-09 08:51:08
|
I don't think you do: if the encoder lied to you, it's a bug in the encoder, the file is not valid, and the decoder can do whatever it wants - the same argument could be used to say that you have to ignore profiles and levels fields...
|
|
|
|
dds
|
2022-11-09 08:51:09
|
which (I think) means "If a sample of a modular sub-bitstream (before or after an inverse modular transform) cannot be represented by a 32-bit signed integer, then the bitstream is invalid"
|
|
|
_wb_
|
2022-11-09 08:52:09
|
you can probably also derive cpu info from conforming vardct files since exact decoded pixel values can be different dependent on cpu architecture
|
|
2022-11-09 08:53:05
|
if int overflow happens, the bitstream is invalid, yes
|
|
|
|
dds
|
2022-11-09 08:58:02
|
the CPU info thing is not really the point here. the complaint isn't "you shouldn't be able to derive any information that gives you info about the CPU", it's "the spec does not take sufficient ownership of the problem of keeping calculations in bounds, and here is one of the potential consequences"
|
|
|
Wolfbeast
|
|
_wb_
You make good points about splines but I think it's a bit late at this point to remove them from the spec; I don't think any software implementation will implement only the hw profile, and phones will just use the software decoder — the hw profile would only be for things like motion jxl in cameras where hw encode/decode would be desirable.
|
|
2022-11-27 06:48:28
|
To just add my $0.02 here, I think splines should be removed from the spec too - the feature feels completely unrelated to the primary scope of the JXL format, as it belongs in a vector format, which JXL by definition is not. Having been at the implementing end of vector-based rendering (for SVG in our browser), I'm also acutely aware of the large security attack surface this kind of feature brings with it. I don't think it's "too late" to remove it from the spec, either, but it does require a decision to be made soon.
|
|
|
_wb_
|
2022-11-27 06:52:22
|
SVG is way worse security-wise though.
|
|
2022-11-27 06:54:41
|
JXL splines are pretty simple, the only concern is having to paint lots of pixel area (making decoding unexpectedly slow, which could be a DoS issue). For that reason we do need to make sure we define a reasonable limit on the total pixel area of splines (in proportion to the image area)
|
|
2022-11-27 06:55:58
|
The limits we currently have in libjxl may be a bit tight imo
|
|
2022-11-27 06:56:36
|
https://jpegxl.info/art/ the "Rubin golden vase 1" image doesn't decode anymore because of the limits we put in libjxl
|
|
|
Wolfbeast
|
2022-11-27 07:56:35
|
Oh I know SVG is different because it's a full blown XML format. However, there's still a lot to be said about extending a raster format with vector capabilities; I really think it doesn't belong.
|
|
2022-11-27 07:58:35
|
And you have to think a lot about e.g. OOB coordinates, mathematical edge cases (e.g. too-tight or inverted curves, risk of overflow or division by zero, etc.) and yeah, also the DoS risk of simply flooding the drawing routine with huge areas to paint.
|
|
2022-11-27 07:59:55
|
It's tempting to make a new format try and do everything, but it's better to keep the scope limited and just do that really really well.
|
|
|
|
veluca
|
2022-11-27 08:37:10
|
to be fair, there's *lots* of other places to do OOB shenanigans and potential edge cases in JXL (but also in any modern format)
|
|
2022-11-27 08:37:24
|
I don't think splines add much danger from that pov
|
|