JPEG XL


on-topic

Whatever else

yurume
2022-08-30 05:31:44
for cjxl proper, that's a major blocker
2022-08-30 05:32:55
anyway, for the original image: that particular code concerns HF context modelling, and is probably triggered by a higher effort setting which tries to mix LF coefficients into the context.
2022-08-30 05:33:32
well gist does support pretty large files, so probably yes
2022-08-30 05:34:03
if libraries like this https://github.com/mackron/dr_libs/blob/master/dr_flac.h work, then j40 would surely work
2022-08-30 05:34:35
(dr_flac.h, by the way, is the largest single-file library I know of that doesn't rely on amalgamation)
2022-08-30 05:50:54
ah, yes, gist shows multiple diffs in one page and that overwhelms its server
2022-08-30 05:51:19
wait for, say, 1--20 seconds and try again, it will show up
2022-08-30 05:51:39
(it seems that new diffs eventually get cached)
2022-08-30 05:52:14
haha
2022-08-30 05:52:38
I'll set up a new repo soon for that reason
2022-08-30 05:52:46
yes, it is just a git repo with a flat structure
2022-08-30 05:52:52
(no directories displayed)
2022-08-30 06:04:14
yeah, just like ordinary github repos
2022-08-31 09:37:55
today in j40: I gave up on tight memory packing for epf and decided to go with an easier route
_wb_
2022-08-31 09:55:57
Probably a good idea. I think epf, especially at higher strength settings (more iters), is mostly only useful for low-fidelity encoding, which I think is not really relevant in practice except for testing how low you can go
yurume
2022-08-31 10:00:48
for most convolutional codes I've used alternating rows as a scratch, but this was too complex to even design in epf
2022-08-31 10:01:53
the easier route is to compute intermediate bitmaps at once, which needs more memory to compute but is much easier to design and understand
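As a rough illustration of that easier route (one full intermediate plane instead of alternating-row scratch buffers), a minimal sketch follows; the 3x3 box filter and the flat plane layout are made up for the example and are not j40's actual code:
```c
#include <stdlib.h>

/* Minimal sketch: run a 3x3 box filter by materializing a whole intermediate
 * plane, trading extra memory for a design that is easy to reason about.
 * Hypothetical layout (row-major, width*height floats); not j40's real code. */
static void box3x3_full_buffer(const float *src, float *dst, int w, int h) {
    float *tmp = malloc((size_t)w * (size_t)h * sizeof *tmp);
    if (!tmp) return;
    for (int y = 0; y < h; ++y) {           /* horizontal pass into tmp */
        for (int x = 0; x < w; ++x) {
            int xl = x > 0 ? x - 1 : x, xr = x < w - 1 ? x + 1 : x;
            tmp[y * w + x] = (src[y * w + xl] + src[y * w + x] + src[y * w + xr]) / 3.0f;
        }
    }
    for (int y = 0; y < h; ++y) {           /* vertical pass from tmp to dst */
        int yu = y > 0 ? y - 1 : y, yd = y < h - 1 ? y + 1 : y;
        for (int x = 0; x < w; ++x)
            dst[y * w + x] = (tmp[yu * w + x] + tmp[y * w + x] + tmp[yd * w + x]) / 3.0f;
    }
    free(tmp);
}
```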
veluca
2022-08-31 11:47:47
yeah I strongly suggest you do it that way, at least to begin with
2022-08-31 11:48:05
solving the "minimize buffers" problem for jxl's filters is... complicated
2022-08-31 11:48:35
see https://github.com/libjxl/libjxl/blob/main/lib/jxl/render_pipeline/low_memory_render_pipeline.cc
_wb_
2022-08-31 12:15:53
first make something that works, then optimize - or even keep that for a separate fork, because I think there's also value in having a simpler-but-slower implementation
yurume
2022-08-31 12:20:31
one of the reasons I tried to do row-wise computation was that there is an early exit condition (sigma < 0.3) which is better suited for row-wise mode. unfortunately row-wise mode was not suitable for everything else 😉
Traneptora
yurume today in j40: Hayasaka helpfully reported that every image from JPEG transcoding will crash j40, and it all boils down to the fact that I didn't think u(32) was that frequently used 😉
2022-08-31 04:38:24
I thought I caught this earlier
yurume
2022-08-31 11:24:30
is it the same bug? I remember you have observed a bug while going through libjxl test cases and I'm yet to check that (mainly because I have tons of unimplemented features anyway)
_wb_
2022-09-01 05:22:13
That would be kind of premature imo, when there are still many known limitations
The_Decryptor
2022-09-01 07:29:07
<@268284145820631040> https://gist.github.com/lifthrasiir/137a3bf550f5a8408d4d9450d026101f#file-j40-h-L464 I think that's supposed to be `J40__INTN_MIN`
yurume
The_Decryptor <@268284145820631040> https://gist.github.com/lifthrasiir/137a3bf550f5a8408d4d9450d026101f#file-j40-h-L464 I think that's supposed to be `J40__INTN_MIN`
2022-09-01 07:32:10
oh, thank you! I think I would have to separately test fallbacks (I do have tests for them, but in my environment fallbacks aren't enabled)
The_Decryptor
2022-09-01 07:37:06
MSVC considered it an error, made it easy to find 😉
yurume
2022-09-01 08:40:17
okay, initial epf implementation done (not yet tested). another 300 lines of code.
2022-09-01 02:20:23
yes, I'll publish a public repo this month (projected, if my plan doesn't go too badly)
2022-09-01 02:21:38
my current milestone is to be able to decompress most images compressed by the default cjxl options, thus no progressive nor animation, but YCbCr would be required.
2022-09-01 04:08:55
today in j40: epf is not in place yet, but I've implemented the RAW quantization table
2022-09-01 04:09:25
and I've got a scary seagull... an image which for some reason I can't upload, and I'm actually scared 😮
2022-09-01 04:10:12
2022-09-01 04:10:23
restarting discord solved the problem
2022-09-01 04:10:55
obviously this is wrong, but probably only because it's YCbCr reinterpreted as XYB
2022-09-01 04:13:41
for epf support I have an important question: which part of the image do restoration filters and (possibly) image features operate on? group? LF group? entire image?
_wb_
2022-09-01 04:37:25
entire image
2022-09-01 04:38:26
it was a considerable pain to make sure gaborish and epf work across group boundaries
yurume
2022-09-01 04:38:30
oh that's unexpected
2022-09-01 04:39:04
since, for example, sharpness information is clearly in the LF group
_wb_
2022-09-01 04:40:32
yeah the signaling is split in groups, but it's important that the filter gets applied across group boundaries
yurume
2022-09-01 04:41:05
that was another reason I initially went for row-wise epf implementation
2022-09-01 04:43:25
my current implementation can use up to 36 times the input memory size, which is very concerning if it has to operate on the entire image
_wb_
2022-09-01 04:49:39
yeah it's kind of a pain, I'm sure <@179701849576833024> can describe the pain in more detail 🙂
veluca
2022-09-01 04:50:44
it is *a lot* of pain, indeed
_wb_
2022-09-01 04:50:52
you can do it in a memory-efficient way but it requires stuff like saving strips of pixels around the group boundaries, as far as I understand
yurume
2022-09-01 04:52:13
I'm okay as long as the memory usage is proportional to width, not width *times* height
_wb_
2022-09-01 04:52:22
the whole point of epf is to remove ringing and blockiness artifacts, including the macro-blockiness that could originate from using 256x256 groups
veluca
2022-09-01 04:52:29
the libjxl version uses something like 13 * (256+16) * 3 floats of extra memory in the worst case for EPF IIRC (well, plus O(image * 16 / 256) memory for buffering group borders, but that *can* be reduced in most cases in *theory*)
yurume
2022-09-01 04:53:29
is the group splitting (even in epf) mainly for parallel decoding then?
veluca
2022-09-01 04:53:38
if you do not care too much about the order in which you actually decode groups, you can decode one row of groups, then EPF the top 243 or so rows, then decode the next row of groups and EPF the next 256 rows, ...
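A sketch of that scheme, assuming 256-row groups and a filter that needs roughly 13 rows of context below the current row; decode_group_row() and filter_rows() are hypothetical placeholders, not libjxl or j40 APIs:
```c
void decode_group_row(int y0);        /* hypothetical: decodes rows [y0, y0+256) */
void filter_rows(int from, int to);   /* hypothetical: runs EPF/gaborish on [from, to) */

/* Decode one band of groups at a time and only filter the rows whose full
 * vertical neighborhood has already been decoded; the last band flushes the rest. */
void decode_and_filter(int image_height, int margin /* e.g. ~13 rows */) {
    int filtered_up_to = 0;   /* first row not yet filtered */
    for (int y0 = 0; y0 < image_height; y0 += 256) {
        int decoded_up_to = (y0 + 256 < image_height) ? y0 + 256 : image_height;
        decode_group_row(y0);
        int safe_up_to = (decoded_up_to == image_height) ? image_height
                                                         : decoded_up_to - margin;
        filter_rows(filtered_up_to, safe_up_to);
        filtered_up_to = safe_up_to;
    }
}
```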
_wb_
2022-09-01 04:54:04
I suppose if you don't care about producing incremental epf-processed results during decode, [what veluca just said]
veluca
2022-09-01 04:54:40
yeah doing it in parallel makes everything *a lot more* painful
yurume
2022-09-01 04:54:51
ah, good point, yeah epf can take much time
veluca
2022-09-01 04:55:00
as well as designing things such that you use O(group size * threads) memory
_wb_
2022-09-01 04:55:26
are you aiming at parallel decode or is single-threaded OK?
yurume
2022-09-01 04:56:17
I did design for the possibility of parallel decoding in the future, but my design only reflects what I knew at that time, which didn't include restoration filters
veluca
2022-09-01 04:56:30
by the way, why would you need 36 times the input memory size?
_wb_
2022-09-01 04:56:32
for many use cases, single-threaded decode is good enough (since you can do images in parallel instead), and it does simplify things a lot if you don't aim at multithreading
veluca
2022-09-01 04:57:02
even a naive implementation would only really need ~2x the size, no?
yurume
veluca by the way, why would you need 36 times the input memory size?
2022-09-01 04:57:24
I'm essentially computing all DistanceStep* results at once and continuing from there, multiplying by the per-channel scale at the very end
2022-09-01 04:58:12
so I have 3 bitmaps per each kernel coordinate pair
veluca
2022-09-01 04:58:22
ah, I see
yurume
2022-09-01 04:58:23
I guess I should make it just a single bitmap (12 total)
2022-09-01 04:58:36
36 bitmaps do seem too many given this information
veluca
2022-09-01 04:59:01
couldn't you just compute the output pixel directly?
2022-09-01 05:00:03
you need to compute `input_pixels` values of weights, then normalize the weights and do a scalar product with the input pixels, no?
2022-09-01 05:00:41
where weights are something like f(weighted L1 norm of pixel-by-pixel differences of two patches)
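Paraphrased as code, computing each output pixel directly might look roughly like this; the kernel shape, the plain |center - neighbor| "patch distance", and the weight function are simplifications for illustration, not the spec's exact EPF definition:
```c
#include <math.h>

/* For one pixel: derive a weight per kernel tap from a distance between
 * patches, normalize the weights, and take the scalar product with the
 * input pixels. Single channel and a trivial distance, for brevity. */
static float filter_pixel(const float *plane, int w, int h, int x, int y,
                          float inv_sigma) {
    float wsum = 0.0f, acc = 0.0f;
    for (int dy = -1; dy <= 1; ++dy) {
        for (int dx = -1; dx <= 1; ++dx) {
            int xx = x + dx, yy = y + dy;
            if (xx < 0 || yy < 0 || xx >= w || yy >= h) continue;
            float dist = fabsf(plane[y * w + x] - plane[yy * w + xx]);
            float wt = fmaxf(0.0f, 1.0f - dist * inv_sigma); /* some decreasing f(dist) */
            wsum += wt;
            acc += wt * plane[yy * w + xx];
        }
    }
    return wsum > 0.0f ? acc / wsum : plane[y * w + x];
}
```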
yurume
2022-09-01 05:01:18
yeah, it should be possible but that greatly complicates an already deeply nested loop
2022-09-01 05:01:31
https://gist.github.com/lifthrasiir/77bd64bd259e553c8c1ea0b45b735eac this is a glimpse of what I wrote for now
2022-09-01 05:04:22
I rewrote that part of the code 3 or 4 times, and in hindsight the current code is easier to turn into a more efficient row-wise implementation for distances than the older versions
veluca
2022-09-01 06:03:44
ah, I see, the way libjxl does it is very very different indeed
_wb_
2022-09-01 06:56:40
Different is good for crosschecking the spec
Fraetor
2022-09-01 10:58:53
I can't remember who was responsible, but I really want to thank whoever added jxl support to GIMP. I was doing some normal image composition earlier, completely unrelated to codecs and whatnot, and dragged in an image. It just worked so well, that it was only later I realised that it was a jpeg xl. This kind of seamless workflow is amazing, and really gives me hope for the future of the format. ❤️
_wb_
Fraetor I can't remember who was responsible, but I really want to thank whoever added jxl support to GIMP. I was doing some normal image composition earlier, completely unrelated to codecs and whatnot, and dragged in an image. It just worked so well, that it was only later I realised that it was a jpeg xl. This kind of seamless workflow is amazing, and really gives me hope for the future of the format. ❤️
2022-09-02 06:28:51
While I am eager to see jxl supported by default in browsers, it is also nice if broader support is available first, so we get less of the "i hate webp! I can do nothing with these files!" sentiment...
Jim
_wb_ While I am eager to see jxl supported by default in browsers, it is also nice if broader support is available first, so we get less of the "i hate webp! I can do nothing with these files!" sentiment...
2022-09-02 08:36:34
Problem is, even with editor/viewer support, many people don't use them. They just expect the os to support it without installing anything. At least be able to see previews in Explorer, open in the image viewer, and see/modify metadata in the right click menu. The most common OS being Windows, but it will take them a LONG time to add it. I am hoping there will be an easy to install plugin that will allow those 3 main functions that I can point people to that become frustrated seeing a JXL for the first time.
_wb_
2022-09-02 08:39:10
I am hoping that Microsoft will be pushed by Intel (who wants HDR to 'just work', considering they're selling HDR-capable graphics chips for a while now and you can't keep selling hardware if software doesn't follow) to make JXL work out of the box asap.
2022-09-02 08:41:10
For Linux, the situation is a bit ambiguous: basic (SDR) JXL support is already available in a few distros out of the box, and I expect more to follow when we do a new libjxl release, but HDR in Linux seems like it's not going to work well in general in the near future.
w
2022-09-02 08:42:30
hdr in wayland looks like it should just work
2022-09-02 08:42:36
esp. compared to how windows is
2022-09-02 08:44:04
problem is people dont understand that hdr is color management <:calibrate:930792680282259497>
_wb_
2022-09-02 08:45:05
And of course on Mac you never know what Apple is up to until they announce it, but I'm kind of hoping that Adobe will push them to make JXL work (also considering that Apple is ahead in terms of HDR support in hardware but they still don't really have a good way to use it for still images, they seem to be stuck at some kind of semi-HDR thing where they encode a tone-mapped HEIC with some extra channel for HDR, if I understand correctly it's basically RGBE)
improver
2022-09-02 08:45:31
iirc wayland HDR stuff is still very heavily work in progress
_wb_
2022-09-02 08:45:54
yeah just sending raw 10-bit stuff to a monitor might work in wayland and even X, but I mean properly handling the color management so it actually makes sense
improver
2022-09-02 08:46:12
30bit color stuff only started recently on swaywm for me and bigger things like gnome don't do that yet, but for HDR you'd need more than that
2022-09-02 08:46:39
iirc its something something floating point buffers
w
2022-09-02 08:46:41
yeah, i think the idea is that hdr comes almost free if you have the basic color management done correctly, which wayland has been working on
The_Decryptor
2022-09-02 08:47:23
Pretty sure Microsoft has given up on their "competitor" format JPEG XR, they only use it for HDR stuff because there's no better alternative (And the encoder is a part of the core OS)
2022-09-02 08:47:44
Add a JXL WIC codec and that gets you full pipeline HDR support, as long as devs actually use the provided frameworks
_wb_
The_Decryptor Pretty sure Microsoft has given up on their "competitor" format JPEG XR, they only use it for HDR stuff because there's no better alternative (And the encoder is a part of the core OS)
2022-09-02 08:49:15
JPEG XR activity is dead inside the JPEG committee, if that's any indication. JPEG 2000 is still active, for comparison (mostly for HTJ2K), and even the old JPEG-1 still has some minimal activity (mostly just updating reference implementations but still).
improver
2022-09-02 08:49:27
https://gitlab.freedesktop.org/wayland/wayland-protocols/-/merge_requests/14 & https://gitlab.freedesktop.org/wayland/weston/-/issues/467 seems main things to track for wayland HDR r n & other compositors like swaywm will possibly come later & i suspect gnome even later
The_Decryptor
2022-09-02 08:49:37
I got the feeling XR died pretty early on
2022-09-02 08:50:05
I remember at one point Microsoft was working with Mozilla on an implementation for Firefox, and then it just stopped ever being updated
JendaLinda
2022-09-02 08:50:52
Microsoft used to be quite slow with new technologies. After all, it took several years before MS offered 32bit and then 64bit OS to consumers.
_wb_
2022-09-02 08:51:57
The positioning of JXR as being somewhere in between JPEG and JPEG 2000 in terms of both computational complexity and compression to me means that it's basically even more "meh" than JPEG 2000 for widespread adoption, while it lacks most of the rich functionality and existing deployments that keep J2K alive in niches like cinema and medical.
The_Decryptor
2022-09-02 08:52:35
I've only ever seen JXR used for HDR screenshots with Xbox and Windows and by Nvidia as a replacement for OpenEXR, outside of that it's non-existent
2022-09-02 08:52:52
I've seen more interest in QOI than I've ever seen for JXR 😆
_wb_
2022-09-02 08:53:14
If HDR had become more mainstream a bit earlier, I can imagine JXR would have won against WebP.
w
2022-09-02 08:54:03
hdr is scary
2022-09-02 08:54:13
i dont want to scroll down a page and it suddenly flash bangs me
improver
2022-09-02 08:54:50
i believe u can actually sorta limit it but that depends on config i g
w
2022-09-02 08:56:52
at the same time i believe there shouldnt be a control to limit it
2022-09-02 08:56:59
since that goes against the whole idea of color being a function of light and it would make all images look WRONG
improver
2022-09-02 08:58:14
i mean some displays just cant go over like 400
2022-09-02 08:58:27
kinda half hdr
_wb_
2022-09-02 08:58:35
I dunno - intensity is rarely absolute anyway, people do want to be able to adjust the dimming of their screen
w
2022-09-02 08:59:17
so it should be just scaling, and that would also make everything else dimmer
2022-09-02 09:00:29
oh but i guess it doesnt need to be everything else if done correctly
JendaLinda
2022-09-02 09:01:05
Yeah I'm using just a regular monitor but I set it to 60 brightness because it was too bright. I suppose it just regulates the power of LED backlight.
w
2022-09-02 09:01:17
yeah this wont work with lcd backlight
improver
2022-09-02 09:02:26
my monitor has this calibrated srgb preset that doesn't allow setting brightness
JendaLinda
2022-09-02 09:02:34
I've seen OLEDs are kinda self regulated as the maximum power consumption of the panel is limited.
improver
2022-09-02 09:02:52
its actually not too bright tho so all good
w
2022-09-02 09:07:54
srgb is 80 nits and i also wonder why there are "srgb" presets in monitors that dont come with sensors
improver
2022-09-02 09:12:30
precalibrating is much cheaper than adding sensors
w
2022-09-02 09:13:07
displays drift fairly quickly though so it seems like a bait to me
improver
2022-09-02 09:13:55
I'll probably get sensor at some point but so far things look kinda ok so whatever
The_Decryptor
2022-09-02 09:14:48
I got a second hand one, has worked out great
spider-mario
2022-09-02 10:22:39
one with a glass filter, like an i1Display or a ColorMunki Display from X-Rite, is an excellent choice as they will last many years
2022-09-02 10:23:07
whereas the organic filters used by Datacolor's Spyder5 colorimeters tend to degrade with time
2022-09-02 10:23:33
I don't remember whether the recent SpyderX changed this
2022-09-02 10:24:57
but it's still considered sort of "meh" by the author of ArgyllCMS https://www.avsforum.com/threads/datacolor-spyderx.3050976/#post-57785616
2022-09-02 10:25:42
> **Conclusion:**
>
> Cheap and cheerful.
> It doesn't seem like the i1 Display Pro is at all threatened in a technical sense.
JendaLinda
2022-09-02 10:30:54
Speaking of color profiles, I came across a few pictures encoded using Display P3 color profile. Despite that, I preferred the rendering using sRGB. In Display P3, the colors were too oversaturated. In sRGB, the pictures looked much better.
2022-09-02 10:33:58
Another interesting observation is about LCD screens using CCFL backlight. These screens are prone to yellowing over time. Not sure if it's the tube itself or if it's a degradation of the optical materials. The whole image just shifts to yellow tones.
_wb_
2022-09-02 12:36:18
HDR and color gamut are kind of orthogonal things - you can have grayscale-only HDR, while you can in principle have a very wide color gamut but a dynamic range that is very limited
w
2022-09-02 12:56:12
i like to think of hdr as "i want it this color this bright physically"
2022-09-02 01:02:20
(just blabber) say you have a "400 nit" monitor: you still need to profile it so your pc actually knows what signal level creates what brightness (max level being ~400 nits), same with color. Given that information, color management kicks in and creates the correct EOTF - the signal to the display that produces the correct brightness/color
2022-09-02 01:07:01
the other option is to send the data straight to the monitor but we know how that ends up (windows hdr mode)
_wb_
2022-09-02 01:12:32
in SDR, you could get away with assuming rgb(1.0,1.0,1.0) is just whatever the brightest white is that the monitor can produce (or wants to produce, if the user is dimming the display). in HDR, the peak brightness of displays can vary a lot, so it doesn't really make sense anymore to encode images without saying how bright things are supposed to be, or to just display them by simply equating the maxval to the max brightness of the display - that would make images created in SDR just completely blinding when shown like that on an HDR display
spider-mario
2022-09-02 01:33:08
with that said, HDR does not necessarily imply absolute luminance, e.g. HLG is relative
2022-09-02 01:33:30
but yeah, (1, 1, 1) is not diffuse white
2022-09-02 01:34:01
this is an excellent document on "what HDR is about": https://www.itu.int/pub/R-REP-BT.2390-10-2021
2022-09-02 01:34:54
in particular the first section ("Introduction and design goals for HDR television")
The_Decryptor
w the other option is to send the data straight to the monitor but we know how that ends up (windows hdr mode)
2022-09-02 01:35:08
That's what it does in SDR mode, in HDR mode it uses BT.2100 and converts everything to that
spider-mario
2022-09-02 01:36:05
> In traditional imaging, the range allocated to these highlights was fairly low and the majority of the
> image range was allocated to the diffuse reflective regions of objects. For example, in hardcopy print
> the highlights would be 1.1x higher luminance than the diffuse white maximum. In traditional video,
> the highlights were generally set to be no higher than 1.25x the diffuse white. Of the various display
> applications, cinema allocated the highest range to the highlights, up to 2.7x the diffuse white.
JendaLinda
2022-09-02 02:20:11
One side effect of HDR becoming more common will hopefully be the end of 6-bit panels.
Jim
2022-09-02 03:45:05
As far as phones, it is what it is. If you have an old phone, it's not going to support newer software. I can't tell you how much of a pain it was when Safari was in-between adding some of the CSS3 features and so many people had old iPhones that didn't support much of it... but there's nothing you can do about it. Just have to try to polyfill for older phones and wait until the majority upgrade.
veluca
2022-09-02 06:37:42
for whoever has Chrome 105+: HDR PNG is now supported https://old.lucaversari.it/hdr.png
_wb_
2022-09-02 07:03:38
Whoa that image looks quite different in various previews/viewers
veluca
2022-09-02 07:03:57
indeed
_wb_
2022-09-02 07:04:04
2022-09-02 07:05:23
Android Chrome 105
2022-09-02 07:06:16
Slack preview:
veluca
2022-09-02 07:07:21
and while I was at it, I modified fpnge: https://github.com/veluca93/fpnge/commit/48020025e197318ab3c695903a4577e3d9565ddb
2022-09-02 07:07:58
now if you call `fpnge -pq some_image_in_pq.png x.png`, then `x.png` will display as HDR
_wb_
2022-09-02 07:08:37
Is Android Chrome 105 supposed to render it correctly? It looks like a bandy mess
veluca
2022-09-02 07:09:05
that's probably the fault of Android's HDR rendering
JendaLinda
2022-09-02 07:09:45
In Firefox, it's a darkness.
veluca
2022-09-02 07:10:13
yeah FF doesn't support HDR
_wb_
2022-09-02 07:10:51
I'm on Android 10, my phone display should be (somewhat) hdr capable but I suppose Android doesn't do things correctly?
veluca
2022-09-02 07:11:21
more precisely, Chrome on android does weird things with HDR
paperboyo
2022-09-02 07:35:01
FWIW, Chrome 105 on Android 13.
_wb_
2022-09-02 07:37:11
nice, so it depends on Android version, not Chrome version
2022-09-02 07:37:25
Chrome Canary 107 on Android 10 also looks like a bandy mess on my phone
fab
2022-09-02 07:46:54
Kiwi browser bandy mess
veluca
2022-09-02 08:15:29
mhhh, it also looks like a bandy mess on an up-to-date pixel 5
2022-09-02 08:15:44
(I'd say in the same way)
improver
2022-09-02 08:50:44
android 12 here & same as what wb is seeing
Jim
2022-09-02 09:26:23
It works on Chrome. Not Firefox. However, on both there are very visible bands. Ouch.
Fraetor
2022-09-02 10:02:57
Which one is the more correct? I assume the dark one is wrong, but given what it does with the XYB image in <#803663417881395200> I thought EoG was better at colour management than the preview.
2022-09-02 10:04:00
Or is this just a case of HDR being broken on linux?
spider-mario
2022-09-02 11:15:45
right, there is presently no HDR on Linux AFAIK
2022-09-02 11:15:58
there is supposed to be some work going on to have it in Wayland
2022-09-02 11:16:10
it is plausible that X11 will never get it
2022-09-02 11:16:58
(not saying that it's strictly impossible that it will ever get it, but it seems somewhat unlikely)
yurume
2022-09-03 04:11:39
that's to be expected. in fact there is a yet-to-be-explained slowdown on Windows...
2022-09-03 04:17:27
which compiler did you use by the way? J40 doesn't directly use SIMD so its performance greatly depends on compiler's ability to autovectorize.
2022-09-03 04:18:45
in my experience recent enough versions of both gcc and clang do a great job (in fact, I have both compilers on both Windows and WSL, and the Windows slowdown appears with both compilers, so I think it is not a compiler problem)
2022-09-03 04:34:29
no `-O3` or similar?
2022-09-03 04:35:55
that's very interesting, it should be equivalent to `-O0` and that would be very slow.
BlueSwordM
yurume which compiler did you use by the way? J40 doesn't directly use SIMD so its performance greatly depends on compiler's ability to autovectorize.
2022-09-03 04:45:39
Oof.
2022-09-03 04:45:54
Wait wait yurume.
2022-09-03 04:47:57
You are currently developing J40 in Rust, correct? Do you think you can make a crate that does RGB <> XYB and YCbCr <> XYB transforms with input and output pixel formats? That would actually be quite nice to have, and if you have enough time, you can even add in metadata support since that will be needed.
yurume
2022-09-03 04:48:16
uh, no? it's in C, with a parallel Rust version planned.
2022-09-03 04:48:49
by the way the algorithm itself is not hard to make, I think I can just write it in a snippet
2022-09-03 05:02:38
```rust
struct RgbImage { width: usize, height: usize, r: Vec<f32>, g: Vec<f32>, b: Vec<f32> }
struct XybImage { width: usize, height: usize, x: Vec<f32>, y: Vec<f32>, b: Vec<f32> }

enum ConversionError { TooLarge }

// XYB to linear RGB, assumes max intensity of 255 nits
fn xyb_to_linear_rgb(xyb: XybImage) -> Result<RgbImage, ConversionError> {
    const OPSIN_BIAS: f32 = -0.0037930732552754493;
    const CBRT_OPSIN_BIAS: f32 = -0.15595420054924863;
    const OPSIN_INV_MAT: [[f32; 3]; 3] = [
        [11.031566901960783, -9.866943921568629, -0.16462299647058826],
        [-3.254147380392157, 4.418770392156863, -0.16462299647058826],
        [-3.6588512862745097, 2.7129230470588235, 1.9459282392156863],
    ];

    let npixels = xyb.width.checked_mul(xyb.height).ok_or(ConversionError::TooLarge)?;
    assert!(xyb.x.len() == npixels);
    assert!(xyb.y.len() == npixels);
    assert!(xyb.b.len() == npixels);

    let mut outr = Vec::<f32>::with_capacity(npixels);
    let mut outg = Vec::<f32>::with_capacity(npixels);
    let mut outb = Vec::<f32>::with_capacity(npixels);
    for (x, (y, b)) in xyb.x.into_iter().zip(xyb.y.into_iter().zip(xyb.b.into_iter())) {
        let l = (y + x) - CBRT_OPSIN_BIAS;
        let m = (y - x) - CBRT_OPSIN_BIAS;
        let s = b - CBRT_OPSIN_BIAS;
        let lmix = l * l * l + OPSIN_BIAS;
        let mmix = m * m * m + OPSIN_BIAS;
        let smix = s * s * s + OPSIN_BIAS;
        outr.push(OPSIN_INV_MAT[0][0] * lmix + OPSIN_INV_MAT[0][1] * mmix + OPSIN_INV_MAT[0][2] * smix);
        outg.push(OPSIN_INV_MAT[1][0] * lmix + OPSIN_INV_MAT[1][1] * mmix + OPSIN_INV_MAT[1][2] * smix);
        outb.push(OPSIN_INV_MAT[2][0] * lmix + OPSIN_INV_MAT[2][1] * mmix + OPSIN_INV_MAT[2][2] * smix);
    }

    Ok(RgbImage { width: xyb.width, height: xyb.height, r: outr, g: outg, b: outb })
}
```
2022-09-03 05:02:43
something like this? I haven't tested it but the logic should be clear <@321486891079696385>
BlueSwordM
yurume uh, no? it's in C, with a parallel Rust version planned.
2022-09-03 05:08:20
Ahh OK OK. I thought you made your decoder in Rust, so I thought you could make the colorspace conversion code into a crate 😂
2022-09-03 05:08:29
Sorry for bothering you in that regard :)
yurume
2022-09-03 05:08:53
ah, I guess that part of the code (if I were to make a crate) will be private anyway 😉
veluca
2022-09-03 06:12:06
random guess: that mips cpu has no float support 😉
The_Decryptor
2022-09-03 06:19:56
Looks like it, it apparently uses the MIPS 1004Kc arch, but 1004Kf is the one with float support
yurume
2022-09-03 06:38:29
I'm actually surprised that softfloat can be *that* fast
veluca
2022-09-03 06:42:13
yeah it's not bad at all
JendaLinda
2022-09-03 10:57:39
I'm trying to compile j40 on Debian testing and it doesn't work. What am I doing wrong? I have gcc and build-essential installed. I have both j40.c and j40.h in the directory. `gcc -lm j40.c` It's still complaining about missing powf, hypotf and cbrtf.
yurume
2022-09-03 12:04:15
perhaps `gcc j40.c -lm` may work, I see no other reason for the error
JendaLinda
2022-09-03 12:21:00
The order matters. It worked. Thank you! I thought it was something simple.
yurume
2022-09-03 12:22:27
yeah, C/C++ link order is something almost magical
improver
2022-09-03 01:14:26
on arch any order always worked for me and on ubuntu it was particularly picky (only satisfying dependencies if you include things asking for symbols first & only including libs satisfying them later)
2022-09-03 01:15:16
i never went deep on figuring out why but it's been like that for years
2022-09-03 01:15:47
maybe ubuntu compile their stuff with different options
diskorduser
2022-09-03 01:16:12
Where is the source code for j40. I cannot find it <:SadCat:805389277247701002>
improver
2022-09-03 01:16:28
on some github gist iirc
2022-09-03 01:16:40
ctrl+f gist i g
diskorduser
2022-09-03 01:19:47
thanks
2022-09-03 01:28:00
-O3 240kb and -O2 168kb
yurume
2022-09-03 01:58:46
huh, I thought I fixed it
2022-09-03 01:59:16
older versions had a wrong stride in the PNG writer, but it looks like another bug
2022-09-03 02:05:45
ah, found it, I forgot to update `j40__combine_modular_from_pass_group` to use the plane interface
2022-09-03 02:06:18
and my working copy did fix that
2022-09-03 02:07:44
so, short answer: replace that function with the following:
```c
J40_STATIC void j40__combine_modular_from_pass_group(
    j40__st *st, int32_t num_gm_channels, int32_t gy, int32_t gx,
    int32_t minshift, int32_t maxshift,
    const j40__modular_t *gm, j40__modular_t *m
) {
    int32_t gcidx, cidx, y, gx0, gy0;
    (void) st;
    for (gcidx = num_gm_channels, cidx = 0; gcidx < gm->num_channels; ++gcidx) {
        j40__plane *gc = &gm->channel[gcidx], *c = &m->channel[cidx];
        J40__ASSERT(gc->type == c->type);
        if (gc->hshift < 3 || gc->vshift < 3) {
            size_t pixel_size = (size_t) J40__PLANE_PIXEL_SIZE(gc);
            size_t gc_stride = (size_t) gc->stride_bytes, c_stride = (size_t) c->stride_bytes;
            (void) minshift; (void) maxshift; // TODO check minshift/maxshift!!!
            J40__ASSERT(gc->hshift == c->hshift && gc->vshift == c->vshift);
            gx0 = gx >> gc->hshift;
            gy0 = gy >> gc->vshift;
            J40__ASSERT(gx0 + c->width <= gc->width && gy0 + c->height <= gc->height);
            for (y = 0; y < c->height; ++y) {
                memcpy(
                    (void*) (gc->pixels + gc_stride * (size_t) (gy0 + y) + pixel_size * (size_t) gx0),
                    (void*) (c->pixels + c_stride * (size_t) y),
                    pixel_size * (size_t) c->width);
            }
            printf("combined channel %d with w=%d h=%d to channel %d with w=%d h=%d gx0=%d gy0=%d\n",
                cidx, c->width, c->height, gcidx, gc->width, gc->height, gx0, gy0);
            fflush(stdout);
            ++cidx;
        }
    }
    J40__ASSERT(cidx == m->num_channels);
}
```
2022-09-03 02:08:38
long answer is that I'm in the middle of another restructuring so I can't post the working copy to the gist as it won't compile yet 😦
2022-09-03 02:12:44
this time the restructuring has to do with supporting permuted TOC, which requires random access to some extent
2022-09-03 02:13:56
uh, no `gc`? I think it's in the line 2784 in the currently published version.
2022-09-03 02:14:44
replaced the diff with a full code
2022-09-03 02:15:19
and matched a function signature
2022-09-03 02:15:51
...again
2022-09-03 02:16:20
(one of the pending updates is that I avoid using `_t` suffix for types, as it is reserved)
2022-09-03 02:16:59
oh well.
2022-09-03 02:17:03
```c
#define J40__PLANE_PIXEL_SIZE(plane) (1 << ((plane)->type & 31))
```
2022-09-03 02:18:04
that's why I'm yet to publish the full repo, it is still rapidly changing...
2022-09-03 02:19:31
at the top of my task list there has been "implement `j40_advance`" for literally weeks, everything else is its subtask or sub-subtask or something like that
2022-09-03 02:25:36
- I need to implement an API instead of the hastily written PNG writer
- most API functions depend on `j40_advance`, which tries to parse as much as possible or up to a certain requested point (e.g. the first frame)
- `j40_advance` can run multiple times, so its state should be retained (it is possible to write a coroutine in pure C but it is really annoying to write by hand; this time though it is easy to maintain, so it's gonna be a coroutine at least in `j40_advance` and some key functions -- see the sketch below)
- `j40_advance` internally calls `j40__frame`, which should also be a coroutine
- `j40__frame` had a long-standing bug that it neglects TOC permutation, so I've got to fix this at last
- TOC splits a frame codestream into multiple sections which should not overlap; it is best to define a nested context for each section (like libjxl `BitReader`), and there is already support for that
- aaand I realized that it is impossible to jump to a certain codestream offset without a mapping from codestream offsets to file offsets, which can only be constructed by full scanning (aargh)
- so I rewrote `j40__container` to construct that mapping (<- right now)
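The "coroutine in pure C" mentioned in the list above usually boils down to storing a resume label and switching on it; a minimal sketch of that general pattern, with made-up names (this is not j40's actual `j40_advance` machinery):
```c
/* Switch-based coroutine sketch: the function may return early ("suspend")
 * and continue where it left off on the next call, as long as all loop state
 * lives in the state struct rather than on the stack. */
typedef struct {
    int resume_point;   /* which `case` label to jump to on the next call */
    int section;        /* example of cross-call loop state */
} co_state;

enum { CO_AGAIN = 0, CO_DONE = 1 };

static int co_advance(co_state *st, int sections_available) {
    switch (st->resume_point) {
    case 0:
        for (st->section = 0; st->section < 4; ++st->section) {
            /* wait until enough input has arrived for this section */
    case 1:
            if (st->section >= sections_available) {
                st->resume_point = 1;   /* suspend; re-enter at `case 1` */
                return CO_AGAIN;
            }
            /* ...decode section st->section here... */
        }
    }
    st->resume_point = 0;
    return CO_DONE;
}
```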
2022-09-03 02:27:09
essentially, yes! everything else would extract some data from its output.
Traneptora
yurume (quoting the `xyb_to_linear_rgb` Rust snippet above)
2022-09-03 03:08:07
how does this work? I was under the impression that the cube root was taken after the bias, not before, when converting *to* XYB
2022-09-03 03:08:13
so you should be cubing it and then adding the bias
2022-09-03 03:09:48
I'm looking at CBRT_OPSIN_BIAS specifically
yurume
2022-09-03 03:09:49
I think it's a faithful translation of the following pseudocode in the spec:
2022-09-03 03:10:31
also note that I've moved `- cbrt(...)` into previous statements
Traneptora
2022-09-03 03:10:41
I see, so that's straight up the cube root of the previous things
2022-09-03 03:10:47
there has to be a cleaner way to do this
yurume
2022-09-03 03:11:57
I avoided using `lgamma` etc. for that reason, was my translation not clear enough?
Traneptora
yurume I avoided using `lgamma` etc. for that reason, was my translation not clear enough?
2022-09-03 03:19:16
no, I meant, there's gotta be some simplification of the above using algebra
2022-09-03 03:22:21
In particular, if you let `B = (Lgamma - A)` then `B^3 + A^3` factors into `(B + A) * (B^2 - BA + A^2)`
2022-09-03 03:23:35
you can then write `(B^2 - BA + A^2) = (Lgamma - A)^2 - (Lgamma - A) * A + A^2`
2022-09-03 03:24:13
and you can write this entire thing as `Lgamma * ((Lgamma - A)^2 - (Lgamma - A) * A + A^2)`
2022-09-03 03:26:02
and `(Lgamma - A)^2 - (Lgamma - A) * A + A^2 = Lgamma^2 - 2 A Lgamma + A^2 - A Lgamma + A^2 + A^2 = Lgamma^2 - 3 A Lgamma + 3 A^2`
2022-09-03 03:26:15
so you end up with `Lgamma * (Lgamma^2 - 3 A Lgamma + 3 A^2)`
2022-09-03 03:26:30
which I suppose you'd get anyway when you just expanded the cubic?
2022-09-03 03:27:06
maybe this doesn't help in this specific case but there has to be some sort of way to bake some of this stuff in
_wb_
2022-09-03 03:27:34
In any case it's just a few muladds
Traneptora
2022-09-03 03:27:39
in particular, if you look at the spec, `itscale` can be baked into the opsin inverse matrix since it's just a scalar multiple
2022-09-03 03:29:27
once you do that, you get `Mix = (something) + Bias`, and so when you multiply `Opsin * Mix` you can break that into `Opsin * ((something) + Bias)` and get `Opsin * (something) + Opsin * Bias`, and `Opsin * Bias` can be baked in
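Written out with the names from the Rust snippet earlier (this is just the distributivity step being described, not a statement about how libjxl organizes its code):

$$\mathrm{rgb} \;=\; M\bigl(t^{\circ 3} + b\,\mathbf{1}\bigr) \;=\; M\,t^{\circ 3} \;+\; b\,M\mathbf{1}, \qquad t \;=\; \begin{pmatrix} Y+X \\ Y-X \\ B \end{pmatrix} - \sqrt[3]{b}\,\mathbf{1},$$

where $M$ is `OPSIN_INV_MAT` (with any scalar like `itscale` folded into it), $b$ is `OPSIN_BIAS`, and $t^{\circ 3}$ means element-wise cubing; the constant vector $b\,M\mathbf{1}$ can be precomputed once.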
_wb_ In any case it's just a few muladds
2022-09-03 03:30:08
sure, but I feel like it *could* be optimized, especially if this is done in fixed-point
yurume
2022-09-03 03:30:41
yeah, for J40 it doesn't make much sense because it exclusively uses fp, but fixed point may benefit from that
Traneptora
2022-09-03 03:30:53
although it sounds like this particular colorspace conversion is probably not the most computationally intensive part
yurume
2022-09-03 03:31:12
in the case of J40 the most computationally intensive code is linear to sRGB conversion
2022-09-03 03:31:44
which is very slow at the moment, because it just calls powf and I think it's not vectorized
Traneptora
2022-09-03 03:42:50
linear to sRGB involves a branch, right? value^2.4 if the value is bigger than a certain amount, right?
yurume
2022-09-03 03:43:10
yes, but that can be made branchless
2022-09-03 03:43:25
the real problem is that one of those branches is disproportionally expensive
Traneptora
yurume which is very slow at the moment, because it just calls powf and I think it's not vectorized
2022-09-03 03:43:27
weird question but what if you did `expf(2.4 * logf(value))`
2022-09-03 03:43:55
would that be faster than `powf`
yurume
2022-09-03 03:45:50
it is hard to say, I think most powf implementations internally do that, but range reduction will happen once in powf but twice in expf + logf
veluca
2022-09-03 03:54:05
in libjxl there's some *black magic* to speed up the sRGB tf (well, specifically x**2.4)
2022-09-03 03:54:20
(also only in the u8 output path on arm)
yurume
2022-09-03 03:54:45
yeah, I think it approximates the whole branch with dedicated code
veluca
2022-09-03 03:55:05
https://github.com/libjxl/libjxl/blob/main/lib/jxl/dec_xyb-inl.h#L104
2022-09-03 03:55:37
ah, sorry, apparently to approximate the other direction
2022-09-03 03:56:35
basically: write x as 2^e * f with f in [0.5, 1), poly approx for `f**2.4` and lookup table for `(2^e)**2.4`
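A sketch of that decomposition; powf() stands in for the short polynomial fit on [0.5, 1) so the structure stays visible, and exp2f() stands in for a small per-exponent lookup table (this is not libjxl's actual code, which is linked in the surrounding messages):
```c
#include <math.h>

/* x^2.4 via x = f * 2^e with f in [0.5, 1):
 *   x^2.4 = f^2.4 * 2^(2.4 * e)
 * The f^2.4 part only needs to be accurate on a narrow range, so a short
 * polynomial suffices there; powf/exp2f below are placeholders for the
 * polynomial and the per-exponent table. Illustrative only. */
static float pow24_split(float x) {
    if (x <= 0.0f) return 0.0f;
    int e;
    float f = frexpf(x, &e);                  /* x == f * 2^e, f in [0.5, 1) */
    float mant = powf(f, 2.4f);               /* <- short polynomial fit goes here */
    float scale = exp2f(2.4f * (float)e);     /* <- or a small lookup table over e */
    return mant * scale;
}
```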
yurume
2022-09-03 03:57:52
I thought this: https://github.com/libjxl/libjxl/blob/35f187f/lib/jxl/transfer_functions-inl.h#L344
veluca
2022-09-03 04:51:40
hah I didn't remember I wrote that one first
Traneptora
2022-09-03 05:14:58
well if you want to approximate `x^2.4` you could let `f(x) = x^2.4` and let `f'(x) = 2.4 x^1.4`
2022-09-03 05:15:19
then you have `f(x)/f'(x) = x/2.4`
2022-09-03 05:16:56
etc. etc. newton's method
2022-09-03 05:18:19
actually wait newton's method is for approximating the zeroes, ignore me im being dum
veluca
2022-09-03 07:21:23
yeah no that doesn't work, you could Newton on something like `y^5 - x^12` (which has `x^2.4` as a root) but that's still quite expensive
2022-09-03 07:22:20
(you need `(y - x^12/y^4)*0.2` I believe)
bonnibel
2022-09-05 11:29:56
is there anywhere you can just download a recent butteraugli and/or ssimulacra2 exe?
Jyrki Alakuijala
Traneptora well if you want to approximate `x^2.4` you could let `f(x) = x^2.4` and let `f'(x) = 2.4 x^1.4`
2022-09-05 04:11:36
With x^(1/3) we are approximating log(x + C), not x^(1/2.4)
Traneptora
Jyrki Alakuijala With x^(1/3) we are approximating log(x + C), not x^(1/2.4)
2022-09-05 04:12:06
I was referring to approximating the sRGB transfer itself
Jyrki Alakuijala
2022-09-05 04:12:18
ah, thanks
Traneptora
2022-09-05 04:12:55
the idea was to do it faster than via powf
2022-09-05 04:13:11
I'd have to think a bit harder about that though
2022-09-05 04:14:30
this was an interesting read about the mathematics of sRGB
2022-09-05 04:14:30
http://entropymine.com/imageworsener/srgbformula/
diskorduser
2022-09-06 04:59:15
jxl larger than png at effort 8 too. 😦
_wb_
2022-09-06 05:03:16
Can you send the image?
diskorduser
2022-09-06 05:04:00
https://www.deviantart.com/dpcdpc11/art/Isolation-Wallpaper-Pack-5120x2880px-836086303
_wb_
2022-09-06 05:09:06
Thx
JendaLinda
diskorduser jxl larger than png at effort 8 too. 😦
2022-09-06 12:34:55
Pictures with a low number of colors sometimes require some tweaking of cjxl parameters. Might be a good candidate for making use of the color palette.
diskorduser
2022-09-06 12:36:36
Yeah I know. But I thought issues like this were fixed in recent builds.
_wb_
2022-09-06 01:13:27
```
cjxl isolation_v02_5120x2880.png -d 0 isolation_v02_5120x2880.png.jxl -g 0
JPEG XL encoder v0.7.0 [AVX2,SSE4,SSSE3,Scalar]
Read 5120x2880 image, 4204461 bytes, 188.6 MP/s
Encoding [Modular, lossless, effort: 7],
Compressed to 4042055 bytes (2.193 bpp).
5120 x 2880, 0.81 MP/s [0.81, 0.81], 1 reps, 4 threads.
```
2022-09-06 01:21:38
```
cjxl isolation_v02_5120x2880.png -d 0 isolation_v02_5120x2880.png.jxl -g 0 -X 0 -Y 0 -e 9 --patches 0
JPEG XL encoder v0.7.0 [AVX2,SSE4,SSSE3,Scalar]
Read 5120x2880 image, 4204461 bytes, 199.2 MP/s
Encoding [Modular, lossless, effort: 9],
Compressed to 3790813 bytes (2.057 bpp).
5120 x 2880, 0.14 MP/s [0.14, 0.14], 1 reps, 4 threads.
```
2022-09-06 01:22:52
so not impossible to beat png but kind of annoying that it doesn't by default
yurume
2022-09-06 01:26:13
today in j40: still working on TOC permutation and backing buffer management version... 10?
diskorduser
2022-09-06 02:21:03
there is a bug in the gtk jxl loader plugin or a bug in the image viewer. it cannot properly decode images encoded with -d 25.
2022-09-06 02:21:15
2022-09-06 02:21:46
top -d 12, bottom -d 25
2022-09-06 02:26:31
Is it normal for encoding to be slower at high distance? For me, it takes more time to encode an image at -d 25 compared to -d 1.
2022-09-06 02:27:28
```
❯ cjxl bb.png d1.jxl -d 1
JPEG XL encoder v0.8.0 1c19bb48 [AVX2,SSE4,SSSE3,Unknown]
Read 1920x1200 image, 4040483 bytes, 64.1 MP/s
Encoding [VarDCT, d1.000, effort: 7],
Compressed to 402830 bytes (1.399 bpp).
1920 x 1200, 5.01 MP/s [5.01, 5.01], 1 reps, 4 threads.

❯ cjxl bb.png d25.jxl -d 25
JPEG XL encoder v0.8.0 1c19bb48 [AVX2,SSE4,SSSE3,Unknown]
Read 1920x1200 image, 4040483 bytes, 67.4 MP/s
Encoding [VarDCT, d25.000, effort: 7],
Compressed to 3255 bytes (0.011 bpp).
1920 x 1200, 0.27 MP/s [0.27, 0.27], 1 reps, 4 threads.
```
novomesk
2022-09-06 02:40:49
In https://github.com/libjxl/libjxl/blob/main/lib/include/jxl/encode.h#L1084 it is written that the distance range is 0 .. 15, with 0.5 .. 3.0 recommended. How is it possible to use distance 25?
diskorduser
2022-09-06 02:51:55
I was just testing. I actually never use above d1.5 for my photos.
JendaLinda
2022-09-06 02:53:44
I've noticed that at very high distances, vardct just gives up and lossy modular is used instead. It also uses downsampling, although downsampled pictures are often decoded incorrectly.
diskorduser
2022-09-06 03:14:38
oh that's why it takes more time!!!
_wb_
2022-09-06 04:18:29
Old versions of libjxl had a bug in some decoding code paths where upsampling is needed - I think those bugs should be fixed now
2022-09-06 04:20:03
At some point if you use very high distance, libjxl will just downsample the whole image and encode that instead. Obviously that causes things to be blurry.
Cool Doggo
2022-09-07 04:14:52
resampling <:YEP:808828808127971399>
Jyrki Alakuijala
2022-09-07 09:28:50
file a bug on libjxl ?
2022-09-07 09:42:55
I spent some time earlier improving the performance at distance 128 (to be less embarrassing in somewhat unnecessary low-quality benchmarks); now, thinking about utility and actual use, I consider it time wasted -- should have only been looking at d4 and better 🙂
2022-09-07 09:43:31
but in any case, it shouldn't fail or crash at high distances -- that is just unnecessary excitement
yurume
2022-09-07 09:53:48
what is the maximum reasonably possible distance in butteraugli? at that point it can just average all pixels and emit a single-color image...
Jyrki Alakuijala
2022-09-07 09:55:42
in the very beginning I calibrated between 0.5 and 2.0, the decision boundaries were 0.6, 0.8, 1.0 and 1.3 or so
2022-09-07 09:55:54
the training material fell into those categories
2022-09-07 09:57:16
later I made it work less wildly outside of this area by other kinds of tweaks, possibly up to d8 or d16 or so
2022-09-07 09:57:47
however, the aggregation of the data is often with a higher p-norm, whereas a poor quality image would benefit from a low p-norm (such as 2-norm)
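For reference, a generic p-norm aggregation of per-pixel or per-block distortions $d_i$ (the exact norm and weighting butteraugli uses are not spelled out here):

$$\lVert d\rVert_p \;=\; \Bigl(\tfrac{1}{N}\sum_{i=1}^{N} \lvert d_i\rvert^{p}\Bigr)^{1/p}$$

A large $p$ is dominated by the worst local errors, while $p = 2$ behaves more like an average, which is the trade-off described above for poor-quality images.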
novomesk
2022-09-07 12:13:22
libjxl depends on highway (for SIMD operations). (Un)fortunately, highway is very sensitive to compiler-related bugs. Those issues are being discovered when someone wants to try libjxl on a rare architecture. Recently I have got a report about https://bugs.gentoo.org/869077 and we are searching for the solution.
Jyrki Alakuijala
2022-09-07 02:01:05
you know more about it than I do
2022-09-07 02:01:44
consider filing an issue and then we can together start to build clarity on what is going on -- we don't need to know all scenarios beforehand to file an issue
2022-09-07 02:03:57
I'm not aware of a change that breaks backward compatibility for cjxl/djxl -- and I don't think we took such a risk intentionally after freezing the format at the end of 2020
2022-09-07 02:04:51
we did some backward incompatible changes in the format definition, but we were rather sure that those were not paths that were exercised in any cjxl version we had written
2022-09-07 02:06:19
AFAIK we don't have a lot of tests for backward compatibility yet (like newer encoder output being decoded by old decoders), so we can mess up perhaps too easily
2022-09-07 02:37:31
Thank you 🙂
novomesk
2022-09-07 03:04:10
Please attach the new-20.jxl to the issue too.
_wb_
2022-09-07 05:06:00
Nothing was done that breaks backwards compatibility. Old versions of djxl do have some bugs though, e.g. in upsampling
Jyrki Alakuijala
2022-09-12 12:53:42
consider retweeting about xyb jpeg: https://twitter.com/jyzg/status/1569249920556888066
2022-09-12 12:54:02
haha, here it got greenwashed
2022-09-12 12:55:41
xyb needs to find more uses, it seems
Traneptora
Jyrki Alakuijala xyb needs to find more uses, it seems
2022-09-12 12:59:20
how does xyb-jpeg work?
2022-09-12 12:59:31
I thought JPEG forced a specific YCbCr matrix
Jyrki Alakuijala
2022-09-12 01:03:28
through ICC v4 profile
2022-09-12 01:04:16
JPEG without a color profile is commonly interpreted with a special YCbCr matrix or is stored directly in RGB or CMYK
2022-09-12 01:04:42
about half the savings are from expressing the image in XYB
2022-09-12 01:05:25
the other half is from using JPEG XL's adaptive quantization heuristics through variable dead-zone quantization (no real adaptive quantization possible in old JPEG)
2022-09-12 01:05:38
the same trick that we used originally in guetzli
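As a generic illustration of what dead-zone quantization does (the threshold shaping here is arbitrary; this is not the actual guetzli/libjxl heuristic):
```c
#include <math.h>

/* Dead-zone quantization of one DCT coefficient: magnitudes below
 * `deadzone * step` snap to zero, larger ones are rounded as usual.
 * Varying `deadzone` per region lets an encoder emulate adaptive
 * quantization while still emitting a plain JPEG quantization table. */
static int quantize_deadzone(float coeff, float step, float deadzone) {
    float a = fabsf(coeff) / step;
    if (a < deadzone) return 0;            /* widened zero bin */
    int q = (int)floorf(a + 0.5f);         /* ordinary rounding otherwise */
    return coeff < 0.0f ? -q : q;
}
```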
Traneptora
2022-09-12 01:05:42
so as long as the ICC Profile is respected it will be viewed properly
2022-09-12 01:05:46
unfortunately firefox greenwashes it
2022-09-12 01:05:51
and ignores the ICCP
Jyrki Alakuijala
2022-09-12 01:06:15
yes, we have some ideas on how to make it less disruptive when there is no color management
2022-09-12 01:06:46
ICC v4 is available in firefox, but default off
Traneptora
2022-09-12 01:06:54
.... why?
2022-09-12 01:06:59
and how do you turn it on?
The_Decryptor
2022-09-12 01:07:09
Should make it more disruptive, squeaky wheel gets the grease :p
2022-09-12 01:07:49
Set `gfx.color_management.enablev4` to true in about:config
Traneptora
2022-09-12 01:08:14
looks like that worked, thanks
The_Decryptor
2022-09-12 01:08:40
And `gfx.color_management.mode` to 1 for good measure if you haven't done that, by default it only does colour management for images with embedded profiles, page colours are left as-is
Jyrki Alakuijala
2022-09-12 01:08:56
https://bugzilla.mozilla.org/show_bug.cgi?id=488800 has discussion
Traneptora
Jyrki Alakuijala through ICC v4 profile
2022-09-12 01:09:42
could this v4 profile theoretically be detected automatically by cjxl the way that sRGB and other common profiles are auto-converted to enums? and if so, could you make it so XYB-jpeg is auto-reconstructed as XYB VarDCT losslessly?
2022-09-12 01:10:22
or would there really be minimal reason to do that
Jyrki Alakuijala
2022-09-12 01:10:23
XYB can be expressed in ICC v4 using a 600-byte ICC profile description
2022-09-12 01:10:53
sometimes people encode once -- not for a particular client
Traneptora
2022-09-12 01:10:54
are JPEG ICCs compressed?
Jyrki Alakuijala
2022-09-12 01:11:15
they can be, if HDR or wide gamut they need to be
Traneptora
2022-09-12 01:11:34
or mozjpeg?
Jyrki Alakuijala
2022-09-12 01:11:42
the one in libjxl
Traneptora
2022-09-12 01:11:44
comparison to mozjpeg is the fairest comparison IMO
2022-09-12 01:12:08
does libjxl have a JPEG encoder?
2022-09-12 01:12:14
I thought it only has a JPEG decoder
Jyrki Alakuijala
2022-09-12 01:12:18
yes, sort of
2022-09-12 01:12:32
https://github.com/libjxl/libjxl/pull/1735
Traneptora
2022-09-12 01:12:42
hm
Jyrki Alakuijala
2022-09-12 01:15:11
https://storage.googleapis.com/jxl-quality/eval/61297ec/index.jpeg_libjxl_d1.0_nr_q80.html vs https://storage.googleapis.com/jxl-quality/eval/61297ec/index.jpeg_q80.html
w
2022-09-12 01:56:40
i wonder what's blocking that firefox bug about iccv4 / preventing them from enabling it by default and mode 1
2022-09-12 01:57:11
all the open blocking bugs are seemingly fixed and i would say it even works better than chrome right now
yurume
2022-09-12 02:01:20
last (private) sprint for J40 in progress, I've completed j40_advance (finally!) and am now testing it
2022-09-12 02:02:14
this took an eon
w
2022-09-12 02:04:05
what were the prefixes before it was named j40
yurume
2022-09-12 02:04:25
jxsml
w
2022-09-12 02:04:42
oh...
Jyrki Alakuijala
2022-09-12 02:42:17
I agree, and I am worried about this equally
2022-09-12 02:42:46
we are looking for mitigations and have an idea that will very likely make the previews look ok, possibly even improve the accuracy a bit
2022-09-12 02:43:40
basically it would be to use YUV instead of RGB as the basic mode, and map Y=Y, U = B, V = X -- instead of R=X, G=Y, B=B today
2022-09-12 02:44:07
it can feel a bit scary, but I think it will largely fix the problem we have today, and then there will only be a minor shift between the colors
BlueSwordM
Jyrki Alakuijala consider retweeting about xyb jpeg: https://twitter.com/jyzg/status/1569249920556888066
2022-09-12 03:59:41
That's a nice difference. Works quite well when the encoder is actually tuned to utilizing XYB internally.
190n
2022-09-12 04:26:00
https://twitter.com/Foone/status/1569352203575652352 is jxl? <:Thonk:805904896879493180>
_wb_
2022-09-12 05:42:52
isn't palettizing always a lossy operation?
2022-09-12 05:43:04
(except if the input already fits in a small palette, of course)
yurume
2022-09-12 05:45:56
yeah, I don't see how *lossless* paletted image is possible in general
2022-09-12 05:47:10
https://twitter.com/Foone/status/1569353108723216384 this seems the actual question
veluca
2022-09-12 06:02:37
<@604964375924834314> any idea whether we could make a XYB ICCv2 profile?
2022-09-12 06:05:20
(although admittedly, that bug should be fixed after *13* years...)
spider-mario
2022-09-12 06:06:53
from what I remember, when we looked into it, we decided to make it an ICCv4 profile because we specifically needed a certain feature
2022-09-12 06:07:08
at least to make it compact
2022-09-12 06:07:17
otherwise I guess we could always use a dense LUT
2022-09-12 06:07:36
(and not just the 2×2×2 "LUT" we use to emulate a matrix product)
2022-09-12 06:07:41
(if I remember right?)
veluca
2022-09-12 06:07:57
IIRC v2 doesn't do parametric trC
spider-mario
2022-09-12 06:08:09
yeah, that's probably what we were missing
veluca
2022-09-12 06:08:16
but maybe the LUT for those isn't too bad
2022-09-12 06:09:16
q1 is whether we can use the same LUT for all 3
2022-09-12 06:09:33
q2 is how big the LUT needs to be to get good quality
spider-mario
veluca q1 is whether we can use the same LUT for all 3
2022-09-12 06:09:58
if it's the same TRC then we can use the same entry in the file, if that's what you mean
veluca
2022-09-12 06:10:05
right
2022-09-12 06:10:08
but we have biases
2022-09-12 06:10:31
i.e. the TRCs are `(x-a)^3 + a^3` or something like it
2022-09-12 06:10:36
and the `a` are not all the same
spider-mario
2022-09-12 06:10:45
ouch
veluca
2022-09-12 06:11:12
now, the first -a we can (probably) get rid of by changing the 3d LUT
2022-09-12 06:11:22
(as I believe that can express any affine transform)
2022-09-12 06:11:43
but the +a^3 I don't think we can?
2022-09-12 06:12:05
maybe by (ab)using the white/black points?
190n
_wb_ isn't palettizing always a lossy operation?
2022-09-12 06:18:09
i think they mean lossy representation of the palette indices
yurume
2022-09-12 06:18:55
well given a later tweet I think the author actually wants a fixed palette
_wb_
2022-09-12 06:19:18
Yeah, pointed him to pngquant
yurume
2022-09-12 06:19:21
those 256 colors are pretty surely not configurable
2022-09-12 06:19:39
and probably 2-3-2 or similar, forgot which one has 3 bits
_wb_
2022-09-12 06:19:46
Green probably
yurume
2022-09-12 06:20:38
(off-topic: and I searched Wikipedia and accidentally typed 256 *bits*, wonder what 256-bit colors would look like...)
2022-09-12 06:20:52
anyway, R3-G3-B2 is the norm
_wb_
2022-09-12 06:21:22
The "safe for web" palette is also interesting, iirc it's something like a 6x6x6 color cube, so not integer bits for rgb
yurume
2022-09-12 06:21:49
6x6x6 plus shades of gray I think
190n
2022-09-12 06:22:47
00 33 66 99 cc ff
_wb_
2022-09-12 06:22:49
It left some room for "OS defined colors", iirc
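For concreteness, the 6x6x6 cube those levels describe; the function name is made up:
```c
#include <stdint.h>

/* Build the 216-color "web-safe" cube from the levels 00 33 66 99 cc ff;
 * 256 - 216 = 40 palette slots are left over for the OS/application. */
static void build_websafe_palette(uint8_t pal[216][3]) {
    static const uint8_t lv[6] = { 0x00, 0x33, 0x66, 0x99, 0xCC, 0xFF };
    int i = 0;
    for (int r = 0; r < 6; ++r)
        for (int g = 0; g < 6; ++g)
            for (int b = 0; b < 6; ++b, ++i) {
                pal[i][0] = lv[r];
                pal[i][1] = lv[g];
                pal[i][2] = lv[b];
            }
}
```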
yurume
2022-09-12 06:23:37
256-bit colors make my wildest dream, can we actually encode a spectral approximation lol
_wb_
2022-09-12 06:24:09
8 channels of 32-bit samples
2022-09-12 06:24:15
Not too wild
yurume
2022-09-12 06:24:48
just 8 color model then?
_wb_
2022-09-12 06:24:53
CMYK+OGV+Alpha
2022-09-12 06:25:35
That is plausibly something a fancy application aiming at print might want to use
2022-09-12 06:25:42
Of course 32-bit is a bit insane
yurume
2022-09-12 06:25:46
I've seen non-CMYK four-color model back when I looked at the mixbox paper https://scrtwpns.com/mixbox.pdf
2022-09-12 06:26:14
which is by the way a really visually appealing work, see also https://scrtwpns.com/mixbox/
190n
_wb_ Yeah, pointed him to pngquant
2022-09-12 06:29:16
author uses they/them
_wb_
2022-09-12 06:39:31
Ah oops
Traneptora
Jyrki Alakuijala it can feel a bit scary, but I think it will largely fix the problem we have today, and then there will only be a minor shift between the colors
2022-09-12 08:12:34
how does using Y,B,X instead of YCbCr actually improve the compression? is the improved transfer function really that impressive?
2022-09-12 08:13:05
I keep hearing that XYB as a coding tool improves visual quality at the same bpp, but how does it manage to accomplish this?
_wb_
2022-09-12 08:29:41
more perceptually relevant and uniform means loss is also in the perceptual "least significant bits"
JendaLinda
yurume https://twitter.com/Foone/status/1569353108723216384 this seems the actual question
2022-09-12 08:39:09
Considering the requirements, it seems the pictures are going to be displayed using basic VGA/MCGA hardware.
2022-09-12 08:41:17
The default 256-color VGA palette is neither 3:3:2 nor 6x6x6; it may be redefined though.
yurume
2022-09-12 08:43:59
heh, indeed, it seems that the palette entries themselves are actually 18-bit colors.
JendaLinda
2022-09-12 08:47:54
Yes, VGA can do only 6 bits per channel.
2022-09-12 08:50:00
Later video cards could do 8 bpc but the legacy VGA modes were still limited to 6 bpc.
2022-09-12 09:01:31
pngquant looks pretty good. Small paletted PNGs are manageable even for very old PCs.
yurume
2022-09-13 09:40:15
I'm looking for the final API tweak before initial release of J40 (which is hopefully later today)
2022-09-13 09:40:48
basically,
```c
j40_frame frame;
if (j40_next_frame(&image, &frame) == J40_OK) { // or while (...) for multiple frames
    // do something with frame
}
```
2022-09-13 09:41:09
I don't like this, because it would be the only place where `J40_OK` has to be used.
2022-09-13 09:45:03
*normally*, people will then try the following design:
```c
j40_frame *frame = j40_next_frame(&image);
if (frame) { ... }
```
2022-09-13 09:45:34
this is reasonable, except that lifetime management becomes way harder or even impossible.
2022-09-13 09:46:19
what I want: I shouldn't need to free `j40_frame` manually, and yet I want to be notified when I use a `j40_frame` past its intended lifetime (for example, the next `j40_next_frame` call may invalidate it)
2022-09-13 09:47:12
a pointer gives a clear interface, but unless you allocate a tiny bit of memory for every frame I don't see how the second goal can be achieved
2022-09-13 09:48:05
in comparison, a struct can carry a sort of counter or similar to detect changes to the internal frame structure, without actually allocating anything
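A minimal sketch of that counter idea, with entirely hypothetical names (this is not J40's actual layout), just to show how staleness can be detected without per-frame allocation:
```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: detect use of a stale frame handle without
   allocating anything per frame. */
struct hypothetical_image {
    uint32_t frame_generation;        /* bumped by every next_frame call */
    /* ...decoder state... */
};

typedef struct {
    struct hypothetical_image *image; /* owning image */
    uint32_t generation;              /* snapshot of frame_generation */
    /* ...frame data... */
} hypothetical_frame;

/* Nonzero iff the handle still refers to the current frame. */
static int frame_is_valid(const hypothetical_frame *f) {
    return f->image != NULL && f->generation == f->image->frame_generation;
}
```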
2022-09-13 09:49:46
if you meant jxl-to-jpeg reconstruction by "JPEG transcode", that's not in the scope of J40 at all
2022-09-13 09:50:22
patches and YCbCr are high priority, but the release has been delayed for a long time, so I will omit them from the initial public release (and put up a big warning sign)
2022-09-13 09:51:03
after all, I have this code:
```c
#ifndef J40_CONFIRM_THAT_THIS_IS_EXPERIMENTAL_AND_POTENTIALLY_UNSAFE
#error "Please #define J40_CONFIRM_THAT_THIS_IS_EXPERIMENTAL_AND_POTENTIALLY_UNSAFE to use J40. Proceed at your own risk."
#endif
```
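So with that guard in place, users have to opt in explicitly before including the header. A minimal usage sketch; note that the `J40_IMPLEMENTATION` define below is assumed from the usual single-file-library convention and is not confirmed anywhere above:
```c
/* Explicit opt-in demanded by the guard above. */
#define J40_CONFIRM_THAT_THIS_IS_EXPERIMENTAL_AND_POTENTIALLY_UNSAFE
/* Assumption: single-file-library convention; define this in exactly one
   translation unit to pull in the implementation. */
#define J40_IMPLEMENTATION
#include "j40.h"
```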
2022-09-13 09:54:03
back to where I left off: my current design is as follows:
```c
if (j40_next_frame(&image)) { // note: no frame argument
    j40_frame frame = j40_current_frame(&image); // will return a placeholder j40_frame on failure
    ...
}
```
2022-09-13 09:54:34
this split is clear in design, but I'm less sure that it is actually safe
2022-09-13 09:55:04
I'm almost sure that if `j40_current_frame` is called without prior `j40_next_frame` it will have to call `j40_next_frame` automatically
2022-09-13 09:55:54
what about other edge cases, though? I'd appreciate hearing about any other possible misuses of this design.
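One concrete edge case worth listing, sketched against the split design above; the `j40_image` type name is an assumption here, since only `&image` appears in the snippets above:
```c
/* Sketch of a likely misuse of the proposed split API: keeping the value
   from j40_current_frame() across the next j40_next_frame() call, after
   which it may refer to invalidated internal state. */
static void stale_frame_example(j40_image *image) {
    if (!j40_next_frame(image)) return;
    j40_frame first = j40_current_frame(image); /* valid here */

    if (j40_next_frame(image)) {
        j40_frame second = j40_current_frame(image);
        (void) second;
        /* `first` may now be stale; this is exactly the situation a
           counter-based check (or placeholder frame) would need to catch. */
    }
    (void) first;
}
```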
2022-09-13 10:57:19
I actually haven't tested this; I guess it's not too many, because in the current conformance tests each image tests a distinct feature rather than its specifics, and J40 still lacks quite a few features when you count them naively
2022-09-13 11:02:02
say, "modular" and "VarDCT" are the biggest features and may require thousands of tests for full testing, but if naively counted they are just two out of many features
_wb_
2022-09-13 12:23:22
the conformance repo mostly contains bitstreams that combine lots of things and that are "hard" in general to decode correctly -- passing conformance is intended to mean that it's quite likely that everything was implemented and implemented in a correct way.
2022-09-13 12:28:16
probably a better way to indicate what is supported is to say what kind of encoder output you can handle: I would guess that J40 can currently decode fjxl output and anything produced by cjxl -e 1-6 (both lossless and lossy) with png input and without extra flags, and recompressed RGB 4:4:4 jpegs (decoding only to pixels of course, not to reconstructed jpeg)
2022-09-13 12:29:17
with patches and YCbCr that brings it up to any speed setting of cjxl and any recompressed jpeg
2022-09-13 12:32:05
then I guess there's the multi-layer / multi-frame thing that requires blending to be implemented properly (needed for e.g. dealing with recompressed apng animations), and then there's dealing with progressive passes, and then I think it will decode anything libjxl can produce
2022-09-13 12:33:29
splines are not currently produced by libjxl, the only way to produce jxl files with splines is using jxl_from_tree atm. I'd still add it to J40 of course, but that's probably the lowest priority thing to add to it.
yurume
2022-09-13 12:37:23
yeah, after the public release the next priority will be multi-frame support for patches
2022-09-13 12:37:37
(and I'll put a proper issue tracker there)
_wb_ probably a better way to indicate what is supported is to say what kind of encoder output you can handle: I would guess that J40 can currently decode fjxl output and anything produced by cjxl -e 1-6 (both lossless and lossy) with png input and without extra flags, and recompressed RGB 4:4:4 jpegs (decoding only to pixels of course, not to reconstructed jpeg)
2022-09-13 12:40:35
thank you for this summary, I guess I'll steal it (with the addition of `--patches 0`; I think patches are enabled by default?)
2022-09-13 12:41:18
ah and `--dots 0`, this is also enabled by default
_wb_
2022-09-13 12:44:01
iirc patches and dots are disabled by default at e<7, but best to check the code
2022-09-13 12:44:37
maybe it's still doing patches at somewhat lower effort in modular mode
veluca
2022-09-13 12:46:59
it is
yurume
2022-09-13 12:47:55
SpeedTier has the following docs:
```
// Turns on dots, patches, and spline detection by default, as well as full
// context clustering. Default.
kSquirrel = 3,
```
2022-09-13 12:48:06
so I guess up to `-e 2` is fine but above that is possibly not
2022-09-13 12:49:07
more accurately, libjxl seems to enable patches for `-e 3` and above and dots for `-d 3` and above
2022-09-13 12:49:25
so... `-e 1 -d 3` can actually enable dots, as far as I can tell?
veluca
2022-09-13 01:44:46
-e 3 is not kSquirrel though xD
2022-09-13 01:44:52
(confusing? yeeeeeep)
yurume
2022-09-13 01:47:19
waaaait
2022-09-13 01:47:45
huh indeed
2022-09-13 01:48:05
yeah, SpeedTier n is equivalent to `-e (10-n)`
2022-09-13 01:48:20
so it's actually `-e 7` and above, well
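To keep that mapping straight, a tiny helper based on the relation stated above (SpeedTier n corresponds to `-e (10-n)`); purely illustrative, not a libjxl function:
```c
/* Illustrative only: cjxl effort corresponding to a libjxl SpeedTier value,
   using the relation effort = 10 - speed_tier stated above.
   e.g. kSquirrel = 3  ->  -e 7 */
static int effort_from_speed_tier(int speed_tier) {
    return 10 - speed_tier;
}
```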
2022-09-13 01:51:52
and dots are not enabled under the same condition; `FindDotDictionary` has its own condition and the caller has another
veluca
2022-09-13 01:52:15
yeah but the comment lies -- for lossless that's *not* the speed they get enabled at
2022-09-13 01:53:13
IIRC it's somewhere in enc_modular.cc
yurume
2022-09-13 01:53:19
I see this code, which is the only occurrence of `FindDotDictionary`:
```cpp
if (info.empty() &&
    ApplyOverride(state->cparams.dots,
                  state->cparams.speed_tier <= SpeedTier::kSquirrel &&
                      state->cparams.butteraugli_distance >= kMinButteraugliForDots)) {
  info = FindDotDictionary(state->cparams, opsin, state->shared.cmap, pool);
}
```
veluca
2022-09-13 01:53:29
oh I meant for patches
yurume
2022-09-13 01:53:32
isn't this the sufficient condition?
2022-09-13 01:53:32
ah
2022-09-13 01:55:03
```cpp
if (do_color && metadata.bit_depth.bits_per_sample <= 16 &&
    cparams_.speed_tier < SpeedTier::kCheetah &&
    cparams_.decoding_speed_tier < 2) {
  FindBestPatchDictionary(*color, enc_state, cms, nullptr, aux_out,
                          cparams_.color_transform == ColorTransform::kXYB);
  PatchDictionaryEncoder::SubtractFrom(
      enc_state->shared.image_features.patches, color);
}
```
2022-09-13 01:55:05
this must be it then
veluca
2022-09-13 02:07:34
no no that's tied to --faster_decoding
2022-09-13 02:07:43
not to `-e`
yurume
2022-09-13 04:32:23
this moment in j40: hahahahahaha
2022-09-13 04:32:55
enlarged, this is the placeholder image returned for a frame when an error has occurred
2022-09-13 04:37:28
I even considered dynamically generating a placeholder image depending on the current error message, possibly in thread-local storage, but I can always do that later
2022-09-13 04:38:20
anyway, yeah, the full end-to-end testing with a new CLI works for my main test images!
2022-09-13 04:42:40
https://gist.github.com/lifthrasiir/137a3bf550f5a8408d4d9450d026101f has been updated and this would be the last gist-hosted version
_wb_
2022-09-13 04:52:45
Error images are a nice touch; they can be convenient for applications that don't want to deal with errors
yurume
2022-09-14 05:24:43
I've done initial testing and the list is miserable lol https://github.com/lifthrasiir/j40/issues/1
2022-09-14 05:26:38
amazingly enough this doesn't matter much for actual images encoded by cjxl
2022-09-14 05:27:58
pretty bad, it broke malloc
2022-09-14 05:33:03
yes exactly, if I had known J40 would be this large I'd have used Rust instead
2022-09-14 05:33:27
this will be a huge problem if you use it as a library
2022-09-14 05:34:53
if you use it as an executable, it might still be a problem because memory corruption can be potentially converted to remote code execution
2022-09-14 05:36:31
there may be reasons not to use Rust, but if it's just an annoyance as opposed to an actual problem (e.g. logistics) then it doesn't seem reasonable
2022-09-14 05:42:03
found a bug; `malloc` size was off during `j40__read_dq_matrix`
veluca
2022-09-14 05:43:06
ngl writing it in Rust would likely have been much easier 😄
improver
2022-09-14 05:44:35
some things deserve C for ideological reasons
yurume
2022-09-14 05:44:48
Rust might have prevented accidental bugs, but architectural issues would be the same regardless of the language 😉
2022-09-14 05:45:18
having written J40 in C already, porting it to Rust should be much, much easier
2022-09-14 05:45:44
not because Rust is a better language but because I've done most of the work already
improver
2022-09-14 05:46:31
i personally see minimalist C implementations as a breath of fresh air in otherwise bloated and overcomplicated softwarelands
yurume
2022-09-14 05:47:34
recently there have been lots of lightweight systems languages; here "lightweight" means no strong guarantee of temporal memory safety
2022-09-14 05:48:06
most of them still provide spatial memory safety (bounds checking, for example), so they are still measurably safer than C
2022-09-14 05:48:32
I once considered writing J40 in Zig for that reason
improver
2022-09-14 05:49:10
honestly something like a zig implementation would be cool too. but a rust implementation would probably be more popular because rust is more of a buzzword and has a lower likelihood of bugs
yurume
2022-09-14 05:49:45
yeah, writing in C was only for maximal portability (which I considered important for the future of JPEG XL)
improver
2022-09-14 05:49:48
not being faster is very much expected and unlikely to ever change
yurume
2022-09-14 05:49:53
otherwise it is a subpar choice
improver
2022-09-14 05:50:39
it's a good benchmark of how complicated a standard is
2022-09-14 05:51:12
i mean the specification i guess
2022-09-14 05:55:34
looks like multi-threaded decoders versus a single-threaded one, also expected
yurume
2022-09-14 05:59:26
yes, J40 currently doesn't support multi-threaded decoding (but most infrastructure is there, just not prioritized)
2022-09-14 06:07:03
should I just enable asan for dj40 builds 🙂
improver
2022-09-14 06:09:05
enable all the sans
yurume
2022-09-14 06:12:35
for now, yes, it supports only a tiny subset of what it should support in the future
2022-09-14 06:12:49
you can see that even `J40_RGB` is commented out 😉