|
CrushedAsian255
|
2024-11-15 08:23:51
|
Lossy modular uses the Squeeze transform, but there aren’t really specific blocks
|
|
2024-11-15 08:23:55
|
Unless you mean group size?
|
|
|
|
salrit
|
2024-11-15 08:24:20
|
I wanted to know about the lossless mode.
|
|
|
CrushedAsian255
|
2024-11-15 08:25:18
|
Lossless modular is split up into groups for parallel encoding but there aren’t really “blocks” as you would find in VarDCT and other DCT based formats
|
|
|
|
salrit
|
2024-11-15 08:28:23
|
Oh, thanks, I was trying to approach it the WebP kinda way, where they divide the image into blocks. BTW, can you describe in a line or two what these groups are?
|
|
2024-11-15 08:29:47
|
For example, this is the frame info when I am trying to use lossless for a gray image:
```
{xsize = 512, ysize = 512, xsize_upsampled = 512, ysize_upsampled = 512,
xsize_upsampled_padded = 512, ysize_upsampled_padded = 512, xsize_padded = 512, ysize_padded = 512,
xsize_blocks = 64, ysize_blocks = 64, xsize_groups = 2, ysize_groups = 2, xsize_dc_groups = 1,
ysize_dc_groups = 1, num_groups = 4, num_dc_groups = 1, group_dim = 256, dc_group_dim = 2048}
```
|
|
|
CrushedAsian255
|
2024-11-15 08:31:26
|
Not sure what is going on there
|
|
2024-11-15 08:31:38
|
I haven’t used the api
|
|
2024-11-15 08:31:42
|
Just the cli
|
|
|
|
salrit
|
2024-11-15 08:33:53
|
Okay, thnx
|
|
|
_wb_
|
2024-11-15 08:56:05
|
group_dim x group_dim is the size of a section that is coded independently (so can be encoded/decoded in parallel)
|
|
2024-11-15 08:57:03
|
xsize x ysize is the coded size of the frame, which gets split into sections of group_dim size
|
|
2024-11-15 08:57:38
|
blocks and DC groups are not relevant for lossless
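As a rough worked illustration of that group grid (a sketch only; the ceil-division below is an assumption about how the counts in the dump relate, not libjxl code):
```
#include <cstdio>

int main() {
  // 512x512 frame split into group_dim x group_dim sections (256 here).
  const unsigned xsize = 512, ysize = 512, group_dim = 256;
  const unsigned xsize_groups = (xsize + group_dim - 1) / group_dim;  // 2
  const unsigned ysize_groups = (ysize + group_dim - 1) / group_dim;  // 2
  const unsigned num_groups = xsize_groups * ysize_groups;            // 4
  std::printf("%u x %u groups, %u total\n", xsize_groups, ysize_groups,
              num_groups);
}
```
Each of those 4 groups can then be encoded and decoded independently, which is where the parallelism comes from.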
|
|
|
|
salrit
|
|
_wb_
xsize x ysize is the coded size of the frame, which gets split into sections of group_dim size
|
|
2024-11-15 11:14:59
|
Coded independently .. as in each section has a separate tree for the prediction and then sent for entropy coding individually?
|
|
|
_wb_
|
2024-11-15 11:17:09
|
The tree can be separate or shared, but the ANS stream is initialized separately per section.
|
|
|
lonjil
|
|
_wb_
Not sure if I should keep that bump for cameras, it's based on the 102-megapixel Fujifilm GFX100 which was released in 2019, while today in 2024 pretty much the highest resolution camera around is the Sony α7R V at 61 megapixels.
My conclusion is they went ahead and broke that 100 megapixel barrier and then decided ~50 megapixels is actually enough.
|
|
2024-11-15 11:18:54
|
here's a fun thing: the Fujifilm GFX100 cameras, and some other cameras like the Sony a7R V, have a "pixel shift" feature that will take multiple exposures with the sensor moved slightly. 4x pixel shift moves the sensor by single pixels, giving you better color information but not higher resolution. 16x color shift moves the sensor by half-pixel distances, producing 4x higher resolution images. For this reason, some people have called the GFX100 a "400 MP camera".
|
|
|
|
salrit
|
|
_wb_
The tree can be separate or shared, but the ANS stream is initialized separately per section.
|
|
2024-11-15 11:19:20
|
thnx
|
|
|
CrushedAsian255
|
|
_wb_
blocks and DC groups are not relevant for lossless
|
|
2024-11-15 11:20:17
|
aren't DC groups still used for lossless?
|
|
2024-11-15 11:20:29
|
or am i thinking of squeeze
|
|
|
_wb_
|
2024-11-15 11:22:13
|
Yes, when using squeeze, Modular will follow the section structure of VarDCT, putting the very lowest frequency data in the Global section, the rest of the data for a 1:8 image in the DC groups, and the rest in the AC groups (in the corresponding passes, in case of multiple passes).
|
|
2024-11-15 11:22:38
|
But for usual lossless, things are not progressive and the data is just encoded in the AC groups.
|
|
|
CrushedAsian255
|
2024-11-15 11:22:40
|
what is stored in Global section?
|
|
2024-11-15 11:22:45
|
for vardct and for modular?
|
|
|
_wb_
|
2024-11-15 11:26:17
|
The Global section has the shared tree (if there is a shared one) and basically all the data that still fits within a single group-sized chunk. So if the image is, say, 16000x8000, and you use Squeeze, then the Global section will contain everything up to 250x250, which will be enough for a 500x250 preview. The DC groups will contain the data needed to get a 2000x1000 preview (1:8). And the AC groups will contain the rest.
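As a quick sketch of that arithmetic (assuming each level of the ladder divides the dimensions by 8; exact rounding in libjxl may differ):
```
#include <cstdio>

int main() {
  const int full_w = 16000, full_h = 8000;
  const int dc_w = full_w / 8, dc_h = full_h / 8;  // 2000 x 1000 after DC groups
  const int lf_w = dc_w / 8, lf_h = dc_h / 8;      // 250 x 125 from Global alone
  std::printf("DC groups give %dx%d, Global alone gives ~%dx%d\n",
              dc_w, dc_h, lf_w, lf_h);
}
```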
|
|
2024-11-15 11:27:10
|
The Global section also contains the splines and patches, noise parameters, and quantization tables.
|
|
|
CrushedAsian255
|
2024-11-15 11:27:14
|
so if squeeze is not enabled, will the LF be empty?
|
|
2024-11-15 11:27:52
|
im guessing HF Global is not used for modular?
|
|
|
_wb_
|
2024-11-15 11:28:07
|
For VarDCT the Global section is typically quite small since it doesn't contain any actual image data.
|
|
2024-11-15 11:28:51
|
For Modular it will be even smaller if Squeeze is not used, since it will only have the splines/patches/noise data.
|
|
|
CrushedAsian255
|
2024-11-15 11:29:23
|
for a 1440x960 image there is data in both LfGlobal and LfGroup(0), for that size couldn't it all fit in just LfGlobal as 1:8 is 180x120?
|
|
2024-11-15 11:29:37
|
(lossy modular)
|
|
|
_wb_
|
2024-11-15 11:39:49
|
uhm yes, I wouldn't expect that to happen. Can you dump some debug info to output what exactly the modular data is in each section? E.g. compile with `-DJXL_DEBUG_V_LEVEL=10` and then do a single-threaded encode or decode.
|
|
|
CrushedAsian255
|
2024-11-15 11:44:30
|
I am having issue building it on Mac
|
|
2024-11-15 11:45:27
|
Is there a guide or should I use a Linux vm
|
|
|
Tirr
|
2024-11-15 11:48:02
|
what problem have you encountered?
|
|
2024-11-15 11:49:44
|
basically you'll need cmake, ninja, and graphviz (in addition to xcode command line tools, which should include clang). then run `SKIP_TEST=1 ./ci.sh release -DBUILD_TESTING=Off`
|
|
|
CrushedAsian255
|
2024-11-15 11:50:42
|
Hang on I’ll get back to you in a few minutes have to sort something IRL
|
|
|
Tirr
basically you'll need cmake, ninja, and graphviz (in addition to xcode command line tools, which should include clang). then run `SKIP_TEST=1 ./ci.sh release -DBUILD_TESTING=Off`
|
|
2024-11-15 11:50:51
|
Can I install those through brew?
|
|
|
Tirr
|
2024-11-15 11:50:55
|
yep
|
|
|
Tirr
basically you'll need cmake, ninja, and graphviz (in addition to xcode command line tools, which should include clang). then run `SKIP_TEST=1 ./ci.sh release -DBUILD_TESTING=Off`
|
|
2024-11-15 11:53:40
|
maybe replacing `release` with `debug` would be better since you're trying to debug
|
|
|
spider-mario
|
|
lonjil
here's a fun thing: the Fujifilm GFX100 cameras, and some other cameras like the Sony a7R V, have a "pixel shift" feature that will take multiple exposures with the sensor moved slightly. 4x pixel shift moves the sensor by single pixels, giving you better color information but not higher resolution. 16x color shift moves the sensor by half-pixel distances, producing 4x higher resolution images. For this reason, some people have called the GFX100 a "400 MP camera".
|
|
2024-11-15 11:57:19
|
one could argue that 4× pixel shift gives you higher chroma resolution: https://www.strollswithmydog.com/bayer-cfa-effect-on-sharpness/
|
|
|
CrushedAsian255
|
|
Tirr
maybe replacing `release` with `debug` would be better since you're trying to debug
|
|
2024-11-15 11:57:48
|
What debug?
|
|
|
spider-mario
|
2024-11-15 11:57:54
|
> In conclusion we have seen that the effect of a Bayer CFA on the spatial frequencies and hence the ‘sharpness’ information captured by a sensor compared to those from the corresponding monochrome version can go from (almost) nothing to halving the potentially unaliased range, based on the chrominance content of the image and the direction in which the spatial frequencies are being stressed.
|
|
|
lonjil
|
|
spider-mario
one could argue that 4× pixel shift gives you higher chroma resolution: https://www.strollswithmydog.com/bayer-cfa-effect-on-sharpness/
|
|
2024-11-15 11:57:55
|
absolutely
|
|
|
CrushedAsian255
|
2024-11-15 12:19:01
|
i think its building
|
|
2024-11-15 12:19:10
|
nevermind
|
|
2024-11-15 12:19:31
|
`~/jxl_build/libjxl/lib/extras/dec/apng.cc:581:5: error: no matching function for call to 'png_set_keep_unknown_chunks'`
|
|
2024-11-15 12:20:36
|
```
libjxl/lib/extras/dec/jpg.cc:221:5: error: no matching function for call to 'jpeg_mem_src'
221 | jpeg_mem_src(&cinfo, reinterpret_cast<const unsigned char*>(bytes.data()),
| ^~~~~~~~~~~~
/Library/Frameworks/Mono.framework/Headers/jpeglib.h:959:14: note: candidate function not viable: 2nd argument ('const unsigned char *') would lose const qualifier
959 | EXTERN(void) jpeg_mem_src JPP((j_decompress_ptr cinfo,
| ^
960 | unsigned char * inbuffer,
| ~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
```
|
|
2024-11-15 12:21:51
|
<@206628065147748352> do you know what this means
|
|
2024-11-15 12:25:14
|
I don't know how to get it to build but here is what I did
take this file `a.jxl` and run `cjxl a.jxl b.jxl -m 1 -d 1`
`b.jxl` will show the weird behaviour
here is `a.jxl` and `b.jxl`
|
|
2024-11-15 12:31:11
|
(The file I first noticed the issue was different but i didn’t want to share it so I found this other file which also displays the same symptoms)
|
|
|
|
salrit
|
|
_wb_
The tree can be separate or shared, but the ANS stream is initialized separately per section.
|
|
2024-11-15 02:20:02
|
So, is it possible to give an overview for lossless... like initially enc_modular is set up using the image features (size, number of colour planes, etc.), and then the image is divided into sections to process them in parallel. Each section is then subjected to prediction (where there might be a shared tree or an individual one) and then to ANS coding, individually per section. Overall, for general lossless using maybe -e 9, this is the flow:
EncodeImageJXL -> ReadCompressedOutput -> JxlEncoderProcessOutput -> ProcessOneEnqueuedInput -> EncodeFrame -> EncodeFrameOneShot -> ComputeEncodingData -> ComputeTree -> EncodeGroups
I am particularly interested in the predictor. I found that sometimes some pre-made splits are used (if I understood correctly) and then for higher effort cases the tree is built. How is tree_splits_ decided and...
This is an output from a simple grayscale image's GDB session:
```
std::vector of length 493, capacity 493 = {{splitval = 22, property = 1, lchild = 1, rchild = 2,
predictor = jxl::Predictor::Weighted, predictor_offset = 0, multiplier = 1}, {splitval = 23,
property = 1, lchild = 3, rchild = 4, predictor = jxl::Predictor::Weighted, predictor_offset = 0,
multiplier = 1}, {splitval = 2, property = 10, lchild = 261, rchild = 262,
predictor = jxl::Predictor::Weighted, predictor_offset = 0, multiplier = 1}, ......
```
What do these fields represent, particularly splitval and property?
|
|
2024-11-15 02:21:10
|
underscore got the italics font... pls ignore them
|
|
|
_wb_
|
2024-11-15 02:21:52
|
property is which of these to test against
|
|
2024-11-15 02:22:21
|
the decision nodes in the MA tree are of the form [property] > [splitval]
|
|
2024-11-15 02:23:36
|
so splitval=22, property=1 means it's a decision node "stream index > 22" (with one branch for the > case and another for the <= case)
|
|
2024-11-15 02:25:04
|
at low effort settings a prebaked tree is used, at higher effort a tree is constructed based on the image data, with higher efforts using more of the available properties than lower efforts
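A minimal sketch of how such a tree could be walked for one pixel (hypothetical structures, not the actual libjxl types; which child the ">" branch goes to is an assumption of this sketch):
```
#include <cstdint>
#include <vector>

// property == -1 marks a leaf in this sketch; real libjxl nodes differ.
struct Node {
  int property;
  int32_t splitval;
  int lchild, rchild;
  int predictor;  // meaningful at leaves
};

// Walk the tree for one pixel's property vector; the leaf reached determines
// the context (and predictor) used for that pixel's residual.
int Lookup(const std::vector<Node>& tree,
           const std::vector<int32_t>& properties) {
  int pos = 0;
  while (tree[pos].property >= 0) {
    pos = (properties[tree[pos].property] > tree[pos].splitval)
              ? tree[pos].lchild
              : tree[pos].rchild;
  }
  return pos;
}
```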
|
|
|
|
salrit
|
2024-11-15 02:30:28
|
So the properties are basically the contexts.. based on which the pixels are segregated
|
|
|
_wb_
at low effort settings a prebaked tree is used, at higher effort a tree is constructed based on the image data, with higher efforts using more of the available properties than lower efforts
|
|
2024-11-15 02:41:43
|
The predictor might be different for each node of the MA Tree? and is it that for higher efforts all the predictors are tried to decide the best one?
|
|
|
_wb_
|
2024-11-15 03:18:58
|
Higher efforts will try more predictors and will also use more of the properties to construct an MA tree.
|
|
2024-11-15 03:19:39
|
And yes, the predictor can be different in each node, so the tree is both for context modeling and for selecting which predictor to use
|
|
|
|
salrit
|
2024-11-15 04:16:24
|
Thanx
|
|
2024-11-16 06:38:12
|
Got something else too:
What is the ModularStreamID? Suppose for grey-scale it is ModularAC, what does that mean? And secondly, what is the "num_streams" parameter? It is derived from the ModularStreamID and is also used in "tree_splits"...?
|
|
|
_wb_
|
2024-11-16 07:18:51
|
There is some numbering scheme for the sections (streams) which is also used to allow MA trees to be shared between sections while still having differences between sections since you can have decision nodes based on stream ID
|
|
2024-11-16 07:21:35
|
By the way, decision nodes based on stream ID, channel number, and row number are not actually causing branching in the decoder, since it will specialize the tree.
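A sketch of what that specialization could look like (hypothetical structures, not libjxl's): every decision node whose property value is fixed for the stream being decoded (e.g. the stream ID or the channel) is resolved once up front, so the per-pixel walk never branches on it.
```
#include <cstdint>
#include <map>
#include <vector>

// Same hypothetical node layout as in the earlier sketch.
struct Node {
  int property;  // -1 marks a leaf
  int32_t splitval;
  int lchild, rchild;
};

// Copy the subtree at `pos` into `out`, resolving every decision on a
// property whose value is already known; returns the index of the copy.
int Specialize(const std::vector<Node>& in, int pos,
               const std::map<int, int32_t>& known, std::vector<Node>* out) {
  const Node n = in[pos];
  if (n.property >= 0) {
    auto it = known.find(n.property);
    if (it != known.end()) {
      // Constant for this stream: keep only the branch that would be taken.
      return Specialize(in, it->second > n.splitval ? n.lchild : n.rchild,
                        known, out);
    }
  }
  const int idx = static_cast<int>(out->size());
  out->push_back(n);
  if (n.property >= 0) {
    (*out)[idx].lchild = Specialize(in, n.lchild, known, out);
    (*out)[idx].rchild = Specialize(in, n.rchild, known, out);
  }
  return idx;
}
```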
|
|
|
CrushedAsian255
|
|
_wb_
By the way, decision nodes based on stream ID, channel number, and row number are not actually causing branching in the decoder, since it will specialize the tree.
|
|
2024-11-16 07:45:05
|
So the decoder prunes the tree before running it?
|
|
|
_wb_
There is some numbering scheme for the sections (streams) which is also used to allow MA trees to be shared between sections while still having differences between sections since you can have decision nodes based on stream ID
|
|
2024-11-16 07:46:59
|
Are stream IDs the “group number”s in jxl art?
|
|
|
_wb_
|
2024-11-16 08:06:23
|
yes
|
|
2024-11-16 08:06:32
|
and yes
|
|
2024-11-16 08:06:54
|
streams, sections, groups are kind of used interchangeably
|
|
|
Tirr
|
2024-11-16 08:08:58
|
the spec uses the term "stream ID" in the context of Modular sub-streams
|
|
2024-11-16 08:12:48
|
every Modular substream gets its own unique ID, even for VarDCT varblock quantization matrices IIRC
|
|
|
_wb_
|
2024-11-16 08:16:22
|
> The stream index is defined as follows: for GlobalModular: 0; for LF coefficients: 1 + LF group index; for ModularLfGroup: 1 + num_lf_groups + LF group index; for HFMetadata: 1 + 2 * num_lf_groups + LF group index; for RAW dequantization tables: 1 + 3 * num_lf_groups + parameters index (see I.2.4); for ModularGroup: 1 + 3 * num_lf_groups + 17 + num_groups * pass index + group index.
|
|
|
CrushedAsian255
|
2024-11-16 08:17:34
|
Is modular initialised once and is used everywhere?
|
|
2024-11-16 08:17:47
|
I guess that’s what Global is for?
|
|
|
_wb_
|
2024-11-16 08:18:31
|
every modular substream can have its own transforms and MA tree and everything
|
|
|
CrushedAsian255
|
2024-11-16 08:19:05
|
Then why do they all share a single namespace for streams?
|
|
|
_wb_
|
2024-11-16 08:19:36
|
but there's also per frame a concept of one large modular image, which can have global transforms and a global MA tree that can be used in each of the substreams
|
|
|
CrushedAsian255
|
2024-11-16 08:19:52
|
Can each sub bitstream signal to just use the global modular subbitstream ?
|
|
|
_wb_
|
2024-11-16 08:20:55
|
a specific modular substream can choose to either use the global tree or to define its own local tree
|
|
|
CrushedAsian255
|
2024-11-16 08:21:23
|
That makes sense
|
|
2024-11-16 08:21:55
|
Does the MA tree leaves define the ANS context?
|
|
2024-11-16 08:23:40
|
How do ANS contexts work again?
|
|
|
|
salrit
|
2024-11-16 08:25:16
|
There was this thing too: the splitting heuristic properties seem to be defined in two different ways, one like this:
splitting_heuristics_properties = std::vector of length 16, capacity 16 = {0, 1, 15, 9, 10, 11, 12, 13, 14, 2, 3, 4, 5, 6, 7, 8}
and in some other way if the Squeeze transform is used. I couldn't understand how this ordering is defined?
|
|
|
_wb_
|
|
CrushedAsian255
How do ANS contexts work again?
|
|
2024-11-16 08:28:20
|
Yes. The context determines the probabilities for each token, which determine how many fractional bits it will take to encode that token (high-probability ones will use fewer bits, low-probability ones more)
|
|
|
CrushedAsian255
|
2024-11-16 08:28:49
|
Fractional bits?
|
|
2024-11-16 08:29:07
|
Like in Arithmetic coding?
|
|
|
_wb_
|
|
salrit
There was this thing too: the splitting heuristic properties seem to be defined in two different ways, one like this:
splitting_heuristics_properties = std::vector of length 16, capacity 16 = {0, 1, 15, 9, 10, 11, 12, 13, 14, 2, 3, 4, 5, 6, 7, 8}
and in some other way if the Squeeze transform is used. I couldn't understand how this ordering is defined?
|
|
2024-11-16 08:29:39
|
depending on the effort setting, it will use some prefix of that vector. These are indices of properties, sorted more or less from "most useful" to "least useful"
|
|
|
CrushedAsian255
Like in Arithmetic coding?
|
|
2024-11-16 08:30:35
|
Huffman coding requires an integer number of bits per encoded token, while in ANS (and arithmetic coding / range coding) you can have symbols that take less than one bit
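A quick numerical illustration of those fractional bits (plain information theory, nothing codec-specific): a symbol with probability p carries about -log2(p) bits, and an entropy coder like ANS or arithmetic coding approaches that on average.
```
#include <cmath>
#include <cstdio>
#include <initializer_list>

int main() {
  // Information content of a symbol with probability p is -log2(p) bits.
  for (double p : {0.5, 0.9, 0.99}) {
    std::printf("p = %.2f -> %.3f bits\n", p, -std::log2(p));
  }
  // p = 0.50 -> 1.000 bits, p = 0.90 -> 0.152 bits, p = 0.99 -> 0.014 bits
}
```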
|
|
|
CrushedAsian255
|
|
_wb_
depending on the effort setting, it will use some prefix of that vector. These are indices of properties, sorted more or less from "most useful" to "least useful"
|
|
2024-11-16 08:30:45
|
Why are not useful properties included?
|
|
|
|
salrit
|
|
_wb_
depending on the effort setting, it will use some prefix of that vector. These are indices of properties, sorted more or less from "most useful" to "least useful"
|
|
2024-11-16 08:30:48
|
Ahh.. like the DEFLATE type two use of prefix tokens.. the more common ones are sorted first
|
|
|
_wb_
|
|
CrushedAsian255
Why are not useful properties included?
|
|
2024-11-16 08:31:24
|
They're all useful, some are just typically more useful than others. It also depends on the image content which properties are useful.
|
|
|
CrushedAsian255
|
2024-11-16 08:31:47
|
So it’s ordered by which ones are more commonly useful?
|
|
|
_wb_
|
2024-11-16 08:32:55
|
yes — though maybe there's room for improvement there, it's all just encoder heuristics that were at some point determined based on some corpus of images and the state of the encoder as it was then
|
|
|
CrushedAsian255
|
|
_wb_
Huffman coding requires an integer number of bits per encoded token, while in ANS (and arithmetic coding / range coding) you can have symbols that take less than one bit
|
|
2024-11-16 08:33:07
|
How does ANS actually work compared to range coding? Is rANS like a hybrid?
|
|
|
|
salrit
|
|
_wb_
There is some numbering scheme for the sections (streams) which is also used to allow MA trees to be shared between sections while still having differences between sections since you can have decision nodes based on stream ID
|
|
2024-11-16 08:38:35
|
I'm not sure I get it fully. Suppose for a grey image of 512 x 512 I use lossless encoding and this is the frame_data:
```
{xsize = 512, ysize = 512, xsize_upsampled = 512, ysize_upsampled = 512,
xsize_upsampled_padded = 512, ysize_upsampled_padded = 512, xsize_padded = 512, ysize_padded = 512,
xsize_blocks = 64, ysize_blocks = 64, xsize_groups = 2, ysize_groups = 2, xsize_dc_groups = 1,
ysize_dc_groups = 1, num_groups = 4, num_dc_groups = 1, group_dim = 256, dc_group_dim = 2048}
```
For this I get num_streams as 25. As you pointed out, the group_dim (256) and the number of groups (4) are what matter here for lossless, but if streams are the same as sections, which are the same as groups, why is num_streams = 25?
And there was this thing too: useful_splits in the tree gets initialized to {0, num_streams}; what is 'useful_splits' here?
|
|
|
_wb_
|
|
CrushedAsian255
How does ANS actually work compared to range coding? Is rANS like a hybrid?
|
|
2024-11-16 08:40:39
|
the way I see it, the main difference between ANS and range coding is that range coding is more or less symmetric between encode and decode, while ANS is a bit faster to decode at the cost of complicating the encoder. Other than that they're pretty similar, and things depend more on _how_ you use them (e.g. one bit at a time vs larger alphabet size, static probabilities vs dynamically updating probabilities, etc) than on ANS vs AC/range coding.
|
|
|
salrit
I'm not sure I get it fully. Suppose for a grey image of 512 x 512 I use lossless encoding and this is the frame_data:
```
{xsize = 512, ysize = 512, xsize_upsampled = 512, ysize_upsampled = 512,
xsize_upsampled_padded = 512, ysize_upsampled_padded = 512, xsize_padded = 512, ysize_padded = 512,
xsize_blocks = 64, ysize_blocks = 64, xsize_groups = 2, ysize_groups = 2, xsize_dc_groups = 1,
ysize_dc_groups = 1, num_groups = 4, num_dc_groups = 1, group_dim = 256, dc_group_dim = 2048}
```
For this I get num_streams as 25. As you pointed out, the group_dim (256) and the number of groups (4) are what matter here for lossless, but if streams are the same as sections, which are the same as groups, why is num_streams = 25?
And there was this thing too: useful_splits in the tree gets initialized to {0, num_streams}; what is 'useful_splits' here?
|
|
2024-11-16 08:42:38
|
it's a high number because there are many stream IDs unused, like possibly there are 17 raw quantization tables that each get their own stream id
|
|
2024-11-16 08:43:17
|
for a lossless image, you'll end up with stream ids 21, 22, 23, 24 being used for the actual data
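Plugging this example into the stream-index formula quoted above from the spec (num_lf_groups = 1, num_groups = 4, a single pass) gives exactly that layout; a small sketch of the arithmetic:
```
#include <cstdio>

int main() {
  // 512x512 lossless example: num_lf_groups = 1, num_groups = 4, one pass.
  const int num_lf_groups = 1, num_groups = 4, num_passes = 1;
  // GlobalModular:      0
  // LF coefficients:    1 + lf_group                     -> 1
  // ModularLfGroup:     1 + num_lf_groups + lf_group     -> 2
  // HFMetadata:         1 + 2*num_lf_groups + lf_group   -> 3
  // RAW dequant tables: 1 + 3*num_lf_groups + index      -> 4..20 (17 ids)
  // ModularGroup:       1 + 3*num_lf_groups + 17
  //                       + num_groups*pass + group      -> 21, 22, 23, 24
  const int num_streams =
      1 + 3 * num_lf_groups + 17 + num_groups * num_passes;
  std::printf("num_streams = %d\n", num_streams);  // 25
}
```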
|
|
|
CrushedAsian255
|
2024-11-16 08:44:28
|
Are the others then just set to Set 0 and a singleton of 0 or something ?
|
|
|
_wb_
|
2024-11-16 08:44:43
|
streams that are not needed are not signaled at all
|
|
|
|
salrit
|
2024-11-16 08:46:07
|
okay..
|
|
|
CrushedAsian255
|
2024-11-16 08:47:37
|
JXL uses static probability ranges rANS correct?
|
|
|
|
salrit
|
|
_wb_
it's a high number because there are many stream IDs unused, like possibly there are 17 raw quantization tables that each get their own stream id
|
|
2024-11-16 08:54:41
|
For the same image the tree_splits were like this:
```
(gdb) p tree_splits_
$338 = std::vector of length 2, capacity 2 = {0, 25}
```
What does it represent? I mean the contents {0, 25}?
|
|
|
_wb_
|
2024-11-16 09:28:47
|
I think <@179701849576833024> wrote that code but I think that's just some encoder bookkeeping thing to make sure we use different subtrees for streams corresponding to DC data than for streams corresponding to block selection metadata or modular HF groups, as a bit of an optimization to not have to learn that those are different kinds of data
|
|
|
|
veluca
|
2024-11-16 09:30:22
|
I have some vague memory of this being quantization related, and {0, num_streams} being effectively a noop
|
|
2024-11-16 09:31:58
|
ah, no, it is indeed meant to ensure the encoder doesn't put entirely unrelated data in the same tree
|
|
|
|
salrit
|
|
veluca
ah, no, it is indeed meant to ensure the encoder doesn't put entirely unrelated data in the same tree
|
|
2024-11-16 09:37:14
|
so does {0, num_streams} indicate that there might be a single tree, or that there can be up to 'num_streams' trees?
|
|
|
|
veluca
|
2024-11-16 09:37:51
|
no it's just saying that there's one tree covering streams [0, num_streams)
|
|
2024-11-16 09:38:24
|
if it were {0, 5, num_streams} it'd indicate two trees, for `[0, 5)` and `[5, num_streams)`
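A tiny sketch of how such a splits vector could be interpreted (hypothetical helper, not the libjxl code):
```
#include <cstddef>
#include <vector>

// tree_splits holds boundaries: {0, 5, num_streams} means stream ids in
// [0, 5) use tree 0 and ids in [5, num_streams) use tree 1.
std::size_t TreeForStream(const std::vector<std::size_t>& splits,
                          std::size_t stream_id) {
  std::size_t tree = 0;
  while (tree + 2 < splits.size() && stream_id >= splits[tree + 1]) ++tree;
  return tree;
}
```
So with splits = {0, 25}, every stream id maps to the single tree 0.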
|
|
|
|
salrit
|
2024-11-16 09:39:22
|
Ahh.. thanks
|
|
2024-11-16 01:24:04
|
How do computing a tree and tokenizing a tree differ? There were two steps, ComputeTree and then TokenizeTree...
|
|
|
spider-mario
|
2024-11-16 02:18:27
|
interesting, the Preview app on macOS 15.1 (not sure since when exactly) seems to even display PNGs with an HLG ICC as HDR
|
|
|
CrushedAsian255
|
|
spider-mario
interesting, the Preview app on macOS 15.1 (not sure since when exactly) seems to even display PNGs with an HLG ICC as HDR
|
|
2024-11-16 02:20:59
|
I think it was added in 15.0, I noticed this a couple of weeks ago
|
|
|
spider-mario
|
2024-11-16 02:24:46
|
not JPEG though
|
|
|
|
salrit
|
|
veluca
if it were {0, 5, num_streams} it'd indicate two trees, for `[0, 5)` and `[5, num_streams)`
|
|
2024-11-16 04:32:46
|
```
const size_t num_toc_entries =
    is_small_image ? 1
                   : AcGroupIndex(0, 0, num_groups, frame_dim.num_dc_groups) +
                         num_groups * num_passes;
```
Can you let me know what this "num_toc_entries" is?
|
|
|
|
veluca
|
2024-11-16 04:36:41
|
It's the number of sections that the frame is split into
|
|
|
|
salrit
|
2024-11-16 04:37:59
|
But that's the num_groups right?
|
|
|
veluca
It's the number of sections that the frame is split into
|
|
2024-11-16 04:39:00
|
```
{xsize = 512, ysize = 512, xsize_upsampled = 512,
ysize_upsampled = 512, xsize_upsampled_padded = 512, ysize_upsampled_padded = 512,
xsize_padded = 512, ysize_padded = 512, xsize_blocks = 64, ysize_blocks = 64, xsize_groups = 2,
ysize_groups = 2, xsize_dc_groups = 1, ysize_dc_groups = 1, num_groups = 4, num_dc_groups = 1,
group_dim = 256, dc_group_dim = 2048}
```
Like here, num_groups says there are 4 sections of 256 x 256 that the frame is divided into, if I am not mistaken...
|
|
|
|
veluca
|
2024-11-16 04:41:56
|
yes
|
|
2024-11-16 04:42:01
|
but sections are more than that
|
|
2024-11-16 04:42:12
|
there's also AcGlobal, DcGlobal, and DcGroups
|
|
2024-11-16 04:42:33
|
so the toc should have 7 entries
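A sketch of that count for this example, assuming the sections are DcGlobal, the DC groups, AcGlobal, and then the AC groups per pass (so AcGroupIndex(0, 0, ...) in the snippet above would be 1 + num_dc_groups + 1; that mapping is an assumption here):
```
#include <cstdio>

int main() {
  // 512x512 lossless example: num_dc_groups = 1, num_groups = 4, one pass.
  const int num_dc_groups = 1, num_groups = 4, num_passes = 1;
  const int dc_global = 1, ac_global = 1;
  const int num_toc_entries =
      dc_global + num_dc_groups + ac_global + num_groups * num_passes;
  std::printf("num_toc_entries = %d\n", num_toc_entries);  // 7
}
```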
|
|
|
|
salrit
|
|
veluca
so the toc should have 7 entries
|
|
2024-11-16 04:45:06
|
Yes, it has 7.. Well, for lossless encoding what do AcGlobal, DcGlobal and DcGroups mean?
|
|
|
|
veluca
|
2024-11-16 04:46:04
|
IIRC DcGlobal will contain the tree + histograms, patches, and maybe palette and such, and DcGroups and AcGlobal will likely be empty
|
|
|
|
salrit
|
2024-11-16 04:48:11
|
Thanx, so these are also sections apart from the num_groups.
```
JXL_RETURN_IF_ERROR(RunOnPool(pool, 0, num_groups, resize_aux_outs,
                              process_group, "EncodeGroupCoefficients"));
```
And I think this says that the groups are encoded in parallel by process_group, right?
|
|
|
|
veluca
|
|
|
salrit
|
2024-11-16 04:49:02
|
Cool.. thnx
|
|
|
veluca
yep
|
|
2024-11-16 04:54:17
|
Just one more thing to ask in this regard: like you said, the tree for the example I gave is a single one (the example of splits = {0, 25}).
Does that mean that a separate tree is computed for each group and then merged later into a single tree? If so, how is each group entropy coded in parallel? Or am I missing something in between...?
|
|
|
|
veluca
|
2024-11-16 05:12:19
|
computing the tree is generally faster than compressing the groups
|
|
2024-11-16 05:13:21
|
what jxl does is to first compute some statistics about the image (at high enough efforts), in parallel across groups, then compute a tree, then encode
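Roughly, the shape of that flow (a sketch with placeholder types and helper names, not the libjxl API):
```
#include <cstddef>
#include <vector>

// Placeholder types/helpers for illustration only.
struct Group {};
struct Samples {};
struct Tree {};
static Samples GatherSamples(const Group&) { return {}; }
static Tree LearnTree(const std::vector<Samples>&) { return {}; }
static void EncodeGroup(const Group&, const Tree&) {}

// Per-group statistics (done in parallel in libjxl), one shared MA tree,
// then per-group tokenization/encoding (in parallel again).
void EncodeModularSketch(const std::vector<Group>& groups) {
  std::vector<Samples> samples(groups.size());
  for (std::size_t i = 0; i < groups.size(); ++i) {
    samples[i] = GatherSamples(groups[i]);
  }
  const Tree tree = LearnTree(samples);
  for (std::size_t i = 0; i < groups.size(); ++i) {
    EncodeGroup(groups[i], tree);
  }
}
```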
|
|
|
Laserhosen
|
2024-11-18 02:51:32
|
Apparently skcms, and therefore libjxl in its default configuration, can't deal with this ridiculous 100KB BenQ monitor ICC profile I found in a random PNG 😅 .
```
./lib/jxl/cms/jxl_cms.cc:976: JXL_RETURN_IF_ERROR code=1: skcms_Parse(icc_data, icc_size, &profile)
./lib/jxl/cms/color_encoding_cms.h:528: JXL_RETURN_IF_ERROR code=1: cms.set_fields_from_icc(cms.set_fields_data, new_icc.data(), new_icc.size(), &external, &new_cmyk)
./lib/jxl/encode.cc:1140: ICC profile could not be set
```
I wonder if lcms2 fares any better...
|
|
|
_wb_
|
2024-11-18 03:29:44
|
a 100 KB icc profile for RGB? haven't seen such a monster before, I thought only CMYK profiles got that crazy
|
|
|
spider-mario
|
2024-11-18 03:30:32
|
for monitor profiles, LUTs are common
|
|
2024-11-18 03:32:04
|
for what it's worth, I'm not completely sure we should default to skcms
|
|
|
w
|
2024-11-18 03:33:40
|
displaycal creates ICC with lut
|
|
|
spider-mario
|
2024-11-18 03:34:27
|
often several LUTs, in fact
|
|
2024-11-18 03:34:53
|
three 1D LUTs for calibration in the GPU (vcgt, “video card gamma table”), and 3D LUT for the actual profile part
|
|
|
w
|
2024-11-18 03:36:22
|
lcms2 after all is still the best 🤷
|
|
|
spider-mario
|
2024-11-18 03:36:42
|
yeah, skcms is easier to build (just one file) and runs faster, but lcms2 is more feature-complete and possibly more accurate
|
|
2024-11-18 03:36:55
|
where skcms and lcms2 disagree, I would tend to trust lcms2 more
|
|
|
w
|
2024-11-18 03:37:13
|
and qcms is nowhere close
|
|
|
Laserhosen
|
2024-11-18 03:44:22
|
Yep, RGB, and it seems to work... in the sense that Gwenview displays the PNG differently with/without it.
|
|
|
spider-mario
|
2024-11-18 03:46:45
|
I would say that there is a slim but non-zero chance that this is a bug not in skcms but in our usage of it – could you perhaps send a copy of the ICC profile in question?
|
|
2024-11-18 03:46:53
|
(image optional; I can always attach it to another image myself)
|
|
|
Laserhosen
|
|
spider-mario
|
2024-11-18 03:53:47
|
thanks!
|
|
2024-11-18 03:54:35
|
yep, LUT profile
```
tag 20:
sig 'A2B0' [0x41324230]
size 50740
type 'mft2' [0x6d667432]
[…]
tag 22:
sig 'B2A0' [0x42324130]
size 50740
type 'mft2' [0x6d667432]
```
|
|
|
|
salrit
|
2024-11-18 04:48:37
|
While computing the tree, there are two steps : CollectPixelSamples() which in some sort samples pixels from a distribution, saw Geometric in the case of example (lossless) I had and then the step : PreQuantizeProperties(), What does these two perform ?
|
|
|
_wb_
|
|
spider-mario
for monitor profiles, LUTs are common
|
|
2024-11-18 05:56:18
|
Let me rephrase: I thought ICC profiles you would want to put in a PNG image file wouldn't get that large.
|
|
|
spider-mario
|
2024-11-18 05:57:14
|
depending on the circumstances, you might conceivably attach a monitor profile to a screenshot, at least temporarily
|
|
2024-11-18 05:57:32
|
I would usually then convert the image to a more common colorspace, though
|
|
|
_wb_
|
2024-11-18 05:57:40
|
Big profiles are useful to describe wonky stuff like the details of a display or a printer, but to describe the colorspace used to represent an image?
|
|
2024-11-18 06:01:26
|
I guess for a screenshot in display space it might make sense, yes. Though even then it feels like unnecessarily exposing details of the screen you happen to use, and converting it to something standard and simple would make more sense to me, even if only to avoid creating a vector for fingerprinting.
|
|
2024-11-18 06:01:59
|
After all such very display specific profiles, especially if the result of calibration, would be pretty unique, no?
|
|
|
spider-mario
|
|
fotis3i
|
2024-11-18 07:13:29
|
I have a question regarding Apple's jxl support. I doubt this is a problem of cjxl but thought I'd ask anyway in case somebody here has some insights to share. I am using cjxl to convert HDR png files into jxl files. While both the jxl and png files render properly on iOS18 (the light-sources appear super-bright) , when I click the edit button inside apple photos **some of the photos** will immediately get converted to SDR for editing. Others won't, and they will be HDR-editable. I feel that apple is measuring the number of superbright pixels and decides whether the image should be edited as HDR or as a tone-mapped SDR in a heuristic manner and not by relying on the metadata of the file. Anybody who knows anything about this? I'm looking for a way to enforce HDR editing on apple devices.
|
|
|
CrushedAsian255
|
|
I have a question regarding Apple's jxl support. I doubt this is a problem of cjxl but thought I'd ask anyway in case somebody here has some insights to share. I am using cjxl to convert HDR png files into jxl files. While both the jxl and png files render properly on iOS18 (the light-sources appear super-bright) , when I click the edit button inside apple photos **some of the photos** will immediately get converted to SDR for editing. Others won't, and they will be HDR-editable. I feel that apple is measuring the number of superbright pixels and decides whether the image should be edited as HDR or as a tone-mapped SDR in a heuristic manner and not by relying on the metadata of the file. Anybody who knows anything about this? I'm looking for a way to enforce HDR editing on apple devices.
|
|
2024-11-18 10:47:09
|
Can you run jxlinfo -v on the files?
|
|
|
Foxtrot
|
2024-11-18 11:16:55
|
This is what you get for not supporting JPEG XL 😄 https://www.reuters.com/technology/doj-ask-judge-force-google-sell-off-chrome-bloomberg-reports-2024-11-18/
|
|
|
CrushedAsian255
|
|
Foxtrot
This is what you get for not supporting JPEG XL 😄 https://www.reuters.com/technology/doj-ask-judge-force-google-sell-off-chrome-bloomberg-reports-2024-11-18/
|
|
2024-11-18 11:27:32
|
Sell it to Jon Sneyers and the Cloudinary team
|
|
|
HCrikki
|
2024-11-18 11:38:23
|
decoupling from google would be the most ideal outcome but only a starting point
|
|
2024-11-18 11:39:43
|
android included an 'aosp' browser long before google withheld updates to what's essentially chromium in order to make it costly for oems to complete aosp on their own using their own or google's suite of apps
|
|
2024-11-18 11:40:35
|
architecturally, decoupling the engine from the browser code would do wonders (for mozilla too).
|
|
2024-11-18 11:40:57
|
engines dont need constant updates to the point not even entities like ms can keep up and are forced to rebase on google code and ditch their own
|
|
2024-11-18 11:44:39
|
monoculture needs to end but part of how its enforced is using google's seo/webmaster tools (do this or you wont be visible on google search unless you pay!). web.dev and lighthouse should be removed from google's control too
|
|
|
|
veluca
|
2024-11-19 07:58:05
|
I _probably_ should not be commenting on this, but looking at Mozilla's financials... where would a non-Google-Chrome (or Android) be getting money from? even Mozilla effectively gets most of its money from Google
|
|
2024-11-19 07:58:31
|
(to be clear, I am not necessarily saying that's a good thing. it's just how it is in practice though...)
|
|
|
|
afed
|
2024-11-19 08:57:18
|
google will still give money, but not own <:KekDog:805390049033191445>
|
|
|
_wb_
|
2024-11-19 12:22:09
|
I can see how Android could be supported by not just Google but also phone/tablet hardware vendors. In fact I think they should have more incentive to maintain and improve Android than Google does.
|
|
2024-11-19 12:30:24
|
Also I think Mozilla demonstrates that Google has an incentive to subsidize browser development (since basically the better the web platform and the more the web gets used in general, the more profit they make), regardless of whether that development happens in-house or is done by others.
|
|
|
jonnyawsom3
|
|
_wb_
I can see how Android could be supported by not just Google but also phone/tablet hardware vendors. In fact I think they should have more incentive to maintain and improve Android than Google does.
|
|
2024-11-19 12:53:36
|
I'm fairly sure they already do, given LTT's video on the stock Android installation and how bad it is
|
|
2024-11-19 12:53:51
|
Not even coming with a phone app pre-installed
|
|
|
TheBigBadBoy - 𝙸𝚛
|
2024-11-19 12:58:47
|
LineAgeOS my beloved
|
|
|
|
veluca
|
|
_wb_
Also I think Mozilla demonstrates that Google has an incentive to subsidize browser development (since basically the better the web platform and the more the web gets used in general, the more profit they make), regardless of whether that development happens in-house or is done by others.
|
|
2024-11-19 01:08:32
|
sure... but does it really change so much if it's Google or people that get ~all their money from Google to use Google as the default search engine?
|
|
|
fotis3i
|
|
CrushedAsian255
Can you run jxlinfo -v on the files?
|
|
2024-11-19 01:51:50
|
jxlinfo -v P8159736.jxl returns
```
box: type: "JXL " size: 12, contents size: 4
JPEG XL file format container (ISO/IEC 18181-2)
box: type: "ftyp" size: 20, contents size: 12
box: type: "jxll" size: 9, contents size: 1
box: type: "jxlp" size: 33, contents size: 25
JPEG XL image, 4640x3472, lossy, 16-bit RGB+Alpha
num_color_channels: 3
num_extra_channels: 1
extra channel 0:
type: Alpha
bits_per_sample: 16
alpha_premultiplied: 0 (Non-premultiplied)
intensity_target: 10000.000000 nits
min_nits: 0.000000
relative_to_max_display: 0
linear_below: 0.000000
have_preview: 0
have_animation: 0
Intrinsic dimensions: 4640x3472
Orientation: 1 (Normal)
Color space: RGB, D65, Rec.2100 primaries, PQ transfer function, rendering intent: Relative
box: type: "brob" size: 1125, contents size: 1117
Brotli-compressed xml metadata: 1125 compressed bytes
box: type: "jxlp" size: 11055036, contents size: 11055028
```
|
|
2024-11-19 02:08:25
|
<@386612331288723469> you are onto something: I notice now that the files that are demoted to SDR-on-edit have an alpha channel saved in the file - the ones that edit fine don't! That's a good starting point - now I have to see where in the process this alpha channel gets added! Thank you for helping out
|
|
|
Quackdoc
|
|
_wb_
I can see how Android could be supported by not just Google but also phone/tablet hardware vendors. In fact I think they should have more incentive to maintain and improve Android than Google does.
|
|
2024-11-19 03:12:19
|
I think most likely solution would be to spin aosp out into a seperate non profit
|
|
|
I'm fairly sure they already do, given LTT's video on the stock Android installation and how bad it is
|
|
2024-11-19 03:16:21
|
retarded video,
A) thats not even "stock android" as the development GSIs have a different selection of apps
B) didn't bother installing 3rd party apps which even normies do
|
|
|
_wb_
|
|
veluca
sure... but does it really change so much if it's Google or people that get ~all their money from Google to use Google as the default search engine?
|
|
2024-11-19 03:28:39
|
No, if there's still a strong financial dependency then Google does effectively retain control and the unhealthy monopolistic situation more or less remains. But at least it would change things a little (from direct control to indirect control), and one could hope that other stakeholders would show up to diversify the funding of such a non-Google-Chrome.
Another option imo could be that it becomes a non-profit organization funded by a fund that gets created by a one-time big forced donation from Google ordered by some judge. That way the non-profit can still have a steady income (say it's a $10b fund, then it would generate ~$400m/year at current interest rates), which would not be enough to re-hire the entire current Chrome team but it would certainly be enough to establish a decent independent foundation for the project...
|
|
|
Oleksii Matiash
|
|
Quackdoc
I think most likely solution would be to spin aosp out into a seperate non profit
|
|
2024-11-19 04:12:30
|
It will immediately lag behind Apple
|
|
|
Quackdoc
|
|
Oleksii Matiash
It will immediately lag behind Apple
|
|
2024-11-19 04:15:16
|
not really, AOSP is already an entirely open source software stack. and with chromeOS eventually migrating to AOSP base instead of a gentoo base google has large incentive to keep funding it
|
|
|
Oleksii Matiash
|
|
Quackdoc
not really, AOSP is already an entirely open source software stack. and with chromeOS eventually migrating to AOSP base instead of a gentoo base google has large incentive to keep funding it
|
|
2024-11-19 04:20:34
|
It is opensource but funding, management - it should be held by Google or any other large company, interested in the project, otherwise it will become non competitive very soon
|
|
|
Quackdoc
|
2024-11-19 04:21:04
|
I dont see in what ways this would happen
|
|
|
HCrikki
|
|
veluca
I _probably_ should not be commenting on this, but looking at Mozilla's financials... where would a non-Google-Chrome (or Android) be getting money from? even Mozilla effectively gets most of its money from Google
|
|
2024-11-19 05:07:04
|
revenue share and default search are 2 different things mozilla sold and have no reason to be lumped together in a spreadsheet other than to confuse folks. MS in the past was willing to pay the same amount as google to be default (around 200mo - this is supposed to always be paid upfront regardless of whether users keep your SE default) and almost match the remaining payout for performance (to be paid after the cycle - even if people switch from bing, mozilla would still get paid by google for google searches)
|
|
2024-11-19 05:08:13
|
the preservation of that exact amount shouldnt be the priority either way since its a faustian pact weakening mozilla further
|
|
2024-11-19 05:10:09
|
the solution is for mozilla to stop putting so many people on its own payroll. take the linux kernel or libreoffice: every contributor is paid by his own company/job and so few are on payroll that donations and sponsorship deals suffice to cover all expenses
|
|
2024-11-19 05:11:38
|
for firefox, aosp and chromium, gatekeeping only imposes costs the gatekeeper happily accepted to impose themselves in order to gain and keep control - practical development considerations didnt
|
|
2024-11-19 05:16:31
|
on efficiency, did you know every single minor firefox version has over 1400 separate binaries built (including several for each language), even for nightlies, and mirroring seems to require archiving even the oldest alphas (iinm over 150,000 binaries with 0 use)
|
|
2024-11-19 05:18:16
|
foss projects should stop crippling themselves with non-essential expenses and inefficient workflows with high upkeep costs
|
|
|
Quackdoc
|
2024-11-19 11:51:06
|
<@288069412857315328> grass.moe is your site right? the JXL images seem to be down, do you know of other sites with a simple "jxl" gallery?
testing servo right now.
|
|
2024-11-19 11:51:25
|
Looking for a site to actually stress it
|
|
2024-11-20 12:45:13
|
nvm I had a local copy, but lmao servo + jxl-oxide performs better than waterfox + libjxl because servo isn't crushing my PC
|
|
2024-11-20 01:25:44
|
https://files.catbox.moe/4a8yln.mp4
https://files.catbox.moe/151xdc.mp4
|
|
2024-11-20 01:26:08
|
waterfox crashed
|
|
|
jonnyawsom3
|
2024-11-20 02:34:11
|
Waterfox uses a huge amount of memory during decode from what I can tell, and scaling the image multiplies it
|
|
|
Quackdoc
|
2024-11-20 02:50:34
|
servo chad
|
|
|
_wb_
|
2024-11-20 08:52:41
|
https://discord.com/channels/794206087879852103/809126648816336917/1308893568949161985
|
|
2024-11-20 08:52:47
|
For lossless or lossy?
|
|
2024-11-20 09:00:21
|
For lossless the worst is noise. For lossy typically the worst is thin high-contrast edges.
|
|
|
spider-mario
|
2024-11-21 11:32:54
|
https://chaos.social/@mattgrayyes/113520373478295532
|
|
|
HCrikki
|
2024-11-23 10:32:32
|
idk if mentioned lately but it seems new libjxl nightlies are publicly accessible again here after 2 months of no builds
https://artifacts.lucaversari.it/libjxl/libjxl/latest/
|
|
|
A homosapien
|
2024-11-23 10:39:16
|
<:JXL:805850130203934781> <:Stonks:806137886726553651>
|
|
|
|
veluca
|
|
HCrikki
idk if mentioned lately but it seems new libjxl nightlies are publicly accessible again here after 2 months of no builds
https://artifacts.lucaversari.it/libjxl/libjxl/latest/
|
|
2024-11-23 11:23:08
|
yeah, debian build was broken...
|
|
|
DZgas Ж
|
2024-11-24 07:59:34
|
wow, not one new ffmpeg build is working for me <:PepeGlasses:878298516965982308>
|
|
2024-11-24 08:02:06
|
ffmpeg 7.1 - is AVX hardcoded now?
|
|
2024-11-24 08:06:25
|
ffmpeg-7.0.2 works. But none of the new 7.1 or GIT versions run for me
|
|
|
|
salrit
|
2024-11-25 08:33:44
|
Before calculating the 'Gather Tree Data' step in the JXL code, there seems to be a step where only a fraction of pixels are chosen using some kind of distribution sampling... for Modular encoding in the lossless path. What does this step mean? And while gathering the tree data, it seems that just that fraction of the pixels' residuals are added to the tree_samples; why is that?
|
|
2024-11-25 08:35:52
|
I am using higher efforts here btw...
|
|
|
_wb_
|
2024-11-26 02:18:48
|
mostly to save memory: the memory needed to store those samples for tree learning is proportional to the fraction, while usually even when sampling only 10% or so, the result is not hugely different from sampling 100%...
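A sketch of that sampling idea (hypothetical helper, not the libjxl code; drawing geometric skip lengths, as observed above, is equivalent to keeping each pixel independently with probability equal to the fraction):
```
#include <random>
#include <vector>

// Keep a candidate (properties, residual) sample for tree learning with
// probability `fraction`, so memory grows with the sampled fraction rather
// than with the full image size.
template <typename Sample>
void MaybeKeepSample(const Sample& s, float fraction, std::mt19937& rng,
                     std::vector<Sample>* samples) {
  std::bernoulli_distribution keep(fraction);
  if (keep(rng)) samples->push_back(s);
}
```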
|
|
|
CrushedAsian255
|
|
_wb_
mostly to save memory: the memory needed to store those samples for tree learning is proportional to the fraction, while usually even when sampling only 10% or so, the result is not hugely different from sampling 100%...
|
|
2024-11-26 02:46:02
|
`-I` controls that correct?
|
|
|
_wb_
|
|
|
salrit
|
|
_wb_
mostly to save memory: the memory needed to store those samples for tree learning is proportional to the fraction, while usually even when sampling only 10% or so, the result is not hugely different from sampling 100%...
|
|
2024-11-26 04:54:40
|
Thanks! I got a bit confused—where exactly are the residuals computed? I noticed some kind of prediction errors being generated during the "Gather-Tree-Data" step, but where does the residue generation actually happen? From what I see, the steps are ComputeTree -> ComputeToken -> EncodeGroup, so it’s likely happening before EncodeGroup, but I couldn’t pinpoint the exact location in the code.
|
|
|
_wb_
|
2024-11-26 05:05:01
|
the residuals depend on the predictor which depends on the tree, so iirc first samples are gathered to create a tree, then the tree is used to do the actual encoding
|
|
2024-11-26 05:06:24
|
The residuals are computed here: https://github.com/libjxl/libjxl/blob/main/lib/jxl/modular/encoding/enc_encoding.cc#L530
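As a sketch of the idea at that point (not the linked code itself): each pixel is predicted from already-coded neighbours and only the residual is tokenized. The clamped-gradient form below is one common predictor; the exact libjxl variants differ per predictor.
```
#include <algorithm>
#include <cstdint>

// W + N - NW, clamped to the range spanned by W and N.
int32_t ClampedGradient(int32_t n, int32_t w, int32_t nw) {
  const int32_t lo = std::min(n, w);
  const int32_t hi = std::max(n, w);
  return std::clamp(n + w - nw, lo, hi);
}

// The residual is what actually gets tokenized and entropy-coded.
int32_t Residual(int32_t actual, int32_t predicted) {
  return actual - predicted;
}
```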
|
|
|
|
salrit
|
|
jonnyawsom3
|
2024-11-26 07:06:58
|
Hear me out, what if we used him to test gradients
|
|
|
|
salrit
|
|
_wb_
The residuals are computed here: https://github.com/libjxl/libjxl/blob/main/lib/jxl/modular/encoding/enc_encoding.cc#L530
|
|
2024-11-27 10:14:13
|
While setting the properties here for predictors:
https://github.com/libjxl/libjxl/blob/main/lib/jxl/enc_modular.cc#L1143
Suppose the prediction scheme is kBest (as I am using higher efforts), so the chosen predictors are Weighted and Gradient (https://github.com/libjxl/libjxl/blob/main/lib/jxl/modular/encoding/enc_ma.cc#L526).
When gathering data for the tree, during the process of iterating through the pixels in a section, either PredictLearnAll or PredictLearnAllNEC is called. Inside these functions, all the predictors are iterated over. Why is it that in the earlier stage only two predictors are selected, but now all predictors are being considered?
|
|
|
_wb_
|
2024-11-27 11:20:31
|
I suppose there is some opportunity for further specialization to make a variant of that function that doesn't populate all predictors but just Weighted and Gradient. As it is now, we have a templated very generic `Predict` function that populates the various properties and produces either one prediction or all predictions (the latter only used for encoding), with some specialized instances mostly for faster decoding.
|
|
|
spider-mario
|
2024-12-01 12:58:39
|
> In what could be a wonderful holiday for the Linux desktop, it looks like the Wayland color management protocol might finally be close to merging after four years in discussion.
|
|
|
Quackdoc
|
|
spider-mario
> In what could be a wonderful holiday for the Linux desktop, it looks like the Wayland color management protocol might finally be close to merging after four years in discussion.
|
|
2024-12-01 01:00:51
|
it also looks pretty good, nothing blatantly wrong as far as I can tell. and it doesn't push dumb assumptions onto compositors either
|
|
|
DZgas Ж
|
2024-12-04 09:34:56
|
<@853026420792360980>
|
|
|
Traneptora
|
|
DZgas Ж
|
|
Traneptora
yes?
|
|
2024-12-04 09:38:03
|
the error occurs when converting a video to png. In my case, av1 video
|
|
2024-12-04 09:38:50
|
video sample
|
|
2024-12-04 09:39:27
|
```ffmpeg.exe -i "C:\Users\a\Desktop\THE AMAZING DIGITAL CIRCUS: PILOT [HwAPLk_sQ3w].mkv" -vf scale=128:-1:flags=spline "C:\Users\a\Desktop\THE AMAZING DIGITAL CIRCUS: PILOT [HwAPLk_sQ3w].png"```
|
|
2024-12-04 09:39:45
|
|
|
|
Traneptora
|
|
DZgas Ж
|
2024-12-04 09:41:14
|
yep
|
|
2024-12-04 09:41:24
|
You did it.
|
|
|
Traneptora
|
2024-12-04 09:42:16
|
Your original content is subsampled
|
|
2024-12-04 09:42:26
|
what did you expect to happen
|
|
2024-12-04 09:42:27
|
it's 4:2:0
|
|
|
DZgas Ж
|
|
Traneptora
Your original content is subsampled
|
|
2024-12-04 09:42:59
|
and what?
|
|
|
Traneptora
|
2024-12-04 09:43:07
|
and your original content is subsampled
|
|
2024-12-04 09:43:12
|
what did you expect to happen
|
|
|
DZgas Ж
|
2024-12-04 09:43:31
|
Am I making a yuv420 in png? What are you saying
|
|
|
Traneptora
|
2024-12-04 09:43:43
|
I'm saying you're starting with content that's 4:2:0 subsampled
|
|
2024-12-04 09:43:51
|
you shouldn't expect anything different to happen
|
|
2024-12-04 09:43:57
|
PNGs aren't subsampled
|
|
2024-12-04 09:44:07
|
but upscaling the chroma doesn't change anything
|
|
2024-12-04 09:44:14
|
you're going to have less data than you started with
|
|
2024-12-04 09:44:36
|
the only way to convert yuv420p into RGB is to upscale it to yuv444p first, and then convert that to RGB
|
|
2024-12-04 09:44:43
|
literally nothing to do with the PNG encoder
|
|
|
DZgas Ж
|
|
Traneptora
I'm saying you're starting with content that's 4:2:0 subsampled
|
|
2024-12-04 09:44:52
|
ffmpeg doesn't work the way you say it does. the size reduction occurs in the rgb field and not 420, then this field turns into RGB out png sample
|
|
|
Traneptora
|
2024-12-04 09:44:52
|
by the time the PNG encoder sees anything, it's already RGB
|
|
2024-12-04 09:45:03
|
that's definitely not true
|
|
2024-12-04 09:45:08
|
you actually have no clue how it works
|
|
2024-12-04 09:45:45
|
what's happening here is that 4:2:0 is being upsampled to 4:4:4, then to RGB, then it's being downscaled
|
|
2024-12-04 09:45:54
|
literally nothing to do with PNG encoder
|
|
|
DZgas Ж
|
|
Traneptora
literally nothing to do with PNG encoder
|
|
2024-12-04 09:46:43
|
I literally don't understand the meaning of your text. Ffmpeg is broken and gives the wrong file. What do you say?
|
|
|
Traneptora
|
2024-12-04 09:46:58
|
there's nothing broken and the file isn't wrong
|
|
2024-12-04 09:47:10
|
I'm using English, if you don't understand me I suggest you learn what these terms mean
|
|
2024-12-04 09:47:27
|
rather than just knee-jerk accuse the software of being broken because you don't understand how anything works
|
|
2024-12-04 09:47:40
|
You can't convert yuv420p into RGB without upscaling the chroma to 4:4:4 first
|
|
2024-12-04 09:47:44
|
because RGB cannot be subsampled
|
|
2024-12-04 09:48:30
|
since the PNG encoder doesn't declare any yuv formats as supported, FFmpeg automatically converts it from yuv444p into RGB using the negotiation method that libavfilter provides
|
|
|
DZgas Ж
|
|
Traneptora
there's nothing broken and the file isn't wrong
|
|
2024-12-04 09:49:04
|
> file isn't wrong
wtf
|
|
|
Traneptora
|
2024-12-04 09:49:07
|
If you want more detail about what's going on under the hood, run ffmpeg with `-v debug`
|
|
|
DZgas Ж
> file isn't wrong
wtf
|
|
2024-12-04 09:49:27
|
it isn't wrong. You started with subsampled content. You ended with subsampled content. This should not surprise you.
|
|
2024-12-04 09:51:14
|
If you think libswscale is scaling it wrong, try using libplacebo
|
|
|
spider-mario
|
2024-12-04 09:51:41
|
is it using nearest-neighbour to upsample the chroma?
|
|
|
Traneptora
|
2024-12-04 09:52:43
|
most likely, yes. swscale isn't very sophisticated, but it's not incorrect
|
|
|
DZgas Ж
|
|
Traneptora
If you think libswscale is scaling it wrong, try using libplacebo
|
|
2024-12-04 09:52:58
|
ok, write a command to scale
|
|
|
Traneptora
|
|
DZgas Ж
ok, write a command to scale
|
|
2024-12-04 09:53:52
|
```
libplacebo=w=128:h=-1:format=gbrp
```
|
|
2024-12-04 09:54:00
|
plus everything else
|
|
2024-12-04 09:55:08
|
```
ffmpeg -y -i "THE AMAZING DIGITAL CIRCUS: PILOT [HwAPLk_sQ3w].mkv" -vf "libplacebo=w=128:h=-1:format=gbrp,crop=128:64" -frames:v 1 -update 1 OUT2.png
```
|
|
2024-12-04 09:55:44
|
upscaler defaults to spline36, downscaler defaults to mitchell
|
|
2024-12-04 09:55:52
|
you can, of course, change it with -upscaler and -downscaler
|
|
|
DZgas Ж
|
|
Traneptora
it isn't wrong. You started with subsampled content. You ended with subsampled content. This should not surprise you.
|
|
2024-12-04 09:55:57
|
No. I still don't understand the meaning of your words. I'm doing a job and the job doesn't match what it should be. and you don't offer a solution -- on my part, are you saying that:
it's Broken? Not really. That's how it should be (broken)
|
|
2024-12-04 09:56:00
|
and blurry and same error
|
|
2024-12-04 09:56:06
|
libplacebo
|
|
|
Traneptora
|
2024-12-04 09:56:20
|
No, I'm saying that it's not broken
|
|
2024-12-04 09:56:26
|
You're starting with subsampled content
|
|
2024-12-04 09:56:29
|
you can't create data that isn't there
|
|
2024-12-04 09:56:36
|
no matter what you do it's going to look subsampled
|
|
|
DZgas Ж
|
|
Traneptora
No, I'm saying that it's not broken
|
|
2024-12-04 09:56:39
|
not ffmpeg is broken
|
|
|
Traneptora
|
2024-12-04 09:56:49
|
this is so fucking pointless
|
|
2024-12-04 09:57:00
|
you're too stubborn to read and too stubborn to learn what the terms mean
|
|
|
DZgas Ж
|
|
Traneptora
this is so fucking pointless
|
|
2024-12-04 09:57:26
|
you are not offering a solution. your command outputs the same 422
|
|
|
Traneptora
|
2024-12-04 09:57:35
|
it's not 4:2:2
|
|
2024-12-04 09:57:36
|
and no, it's not
|
|
2024-12-04 09:57:42
|
it looks qualitatively different
|
|
2024-12-04 09:57:47
|
are you looking at `OUT.png` again?
|
|
2024-12-04 09:58:01
|
because this writes to `OUT2.png`
|
|
|
spider-mario
|
2024-12-04 09:58:07
|
the input is quite high-resolution, though; it is a bit surprising that it ends up so low-quality even downsampled to this extent
|
|
|
DZgas Ж
|
|
Traneptora
it's not 4:2:2
|
|
2024-12-04 09:58:11
|
It's definitely 422
|
|
|
spider-mario
|
2024-12-04 09:58:18
|
almost as if it were subsampling the chroma again after downsizing
|
|
2024-12-04 09:58:45
|
maybe forcing a conversion to yuv444p10le before the scale filter would help?
|
|
|
Traneptora
|
|
spider-mario
maybe forcing a conversion to yuv444p10le before the scale filter would help?
|
|
2024-12-04 09:58:57
|
no, using libplacebo is enough to get what he wants
|
|
2024-12-04 09:59:00
|
idk why he's still whining
|
|
|
spider-mario
|
2024-12-04 09:59:34
|
ah, indeed it does
|
|
2024-12-04 09:59:43
|
|
|
|
DZgas Ж
|
|
Traneptora
idk why he's still whining
|
|
2024-12-04 09:59:57
|
the problem has not been solved. I'm not getting RGB. the software does not work
|
|
|
Traneptora
|
2024-12-04 10:00:04
|
You are saving it as a PNG
|
|
2024-12-04 10:00:06
|
you are getting RGB
|
|
2024-12-04 10:00:24
|
if you forcibly convert it back to yuv444p and do a reinterpret-cast to RGB so you can channel-decompose it, this is what you get
|
|
2024-12-04 10:00:28
|
U channel
|
|
2024-12-04 10:00:33
|
V channel
|
|
|
DZgas Ж
|
|
Traneptora
you are getting RGB
|
|
2024-12-04 10:00:33
|
this is the 422 that was converted to rgb
|
|
|
Traneptora
|
2024-12-04 10:00:42
|
Y channel
|
|
2024-12-04 10:00:46
|
there's no 4:2:2 anywhere here
|
|
|
DZgas Ж
|
2024-12-04 10:00:59
|
where this pic
|
|
|
Traneptora
|
2024-12-04 10:01:14
|
I generated it with
```
ffmpeg -y -i "THE AMAZING DIGITAL CIRCUS: PILOT [HwAPLk_sQ3w].mkv" -vf "libplacebo=w=128:h=-1:format=gbrp,crop=128:64" -frames:v 1 -update 1 OUT2.png
```
|
|
2024-12-04 10:01:28
|
upscaled it with nearest neighbor
|
|
2024-12-04 10:02:13
|
I upscaled it to nearest neighbor using libplacebo, again
```
ffmpeg -y -i OUT2.png -vf libplacebo=w=1024:h=512:upscaler=nearest:format=gbrp test.png
```
|
|
2024-12-04 10:03:08
|
then used
```
ffmpeg -i test.png -vf libplacebo=colorspace=bt709:format=yuv444p -f rawvideo - | ffmpeg -f rawvideo -video_size 1024x512 -pixel_format gbrp -i - -y test2.png
```
to convert it back into yuv444p. The second ffmpeg command is to reinterpet it as to allow me to open it in GIMP, and decompose the channels.
|
|
|
DZgas Ж
|
|
Traneptora
I generated it with
```
ffmpeg -y -i "THE AMAZING DIGITAL CIRCUS: PILOT [HwAPLk_sQ3w].mkv" -vf "libplacebo=w=128:h=-1:format=gbrp,crop=128:64" -frames:v 1 -update 1 OUT2.png
```
|
|
2024-12-04 10:04:33
|
Hm... ffmpeg is crahed
|
|
|
Traneptora
|
2024-12-04 10:05:23
|
do you have vulkan?
|
|
2024-12-04 10:05:36
|
if you don't, you can always use zscale instead as well
|
|
|
spider-mario
|
2024-12-04 10:05:44
|
if using `-vf scale` instead of libplacebo, it does seem to help to insert a forced conversion to yuv444p12 first
|
|
2024-12-04 10:06:11
|
|
|
|
Traneptora
|
2024-12-04 10:06:13
|
```
ffmpeg -y -i "THE AMAZING DIGITAL CIRCUS: PILOT [HwAPLk_sQ3w].mkv" -vf "zscale=w=128:h=-1:f=spline36,format=gbrp,crop=128:64" -frames:v 1 -update 1 OUT2.png
```
|
|
2024-12-04 10:06:20
|
you can use zscale as well to make it use zimg
|
|
|
DZgas Ж
|
|
Traneptora
do you have vulkan?
|
|
2024-12-04 10:06:43
|
Actually, yes. but it doesn't work
|
|
|
Traneptora
|
2024-12-04 10:07:00
|
if you have a broken vulkan setup then use zscale instead
|
|
2024-12-04 10:07:02
|
which is all cpu
|
|
|
DZgas Ж
|
2024-12-04 10:07:50
|
zscale works well
|
|
2024-12-04 10:08:09
|
the standard scale does 422
|
|
|
Traneptora
|
2024-12-04 10:08:19
|
there's no 4:2:2 anywhere, and I keep trying to tell you this
|
|
2024-12-04 10:08:32
|
swscale is not sophisticated. it's upsampling 4:2:0 to 4:4:4 before it converts to RGB, but it does so using nearest-neighbor
|
|
2024-12-04 10:08:53
|
4:2:2 isn't anywhere in the pipeline at all at any point
|
|
|
DZgas Ж
|
|
Traneptora
there's no 4:2:2 anywhere, and I keep trying to tell you this
|
|
2024-12-04 10:09:09
|
you can't convert a yuv420 video to rgb24 and say it's rgb. It's still a bullshit yuv420
|
|
|
Traneptora
|
2024-12-04 10:09:39
|
in terms of the data available, yes
|
|
|
DZgas Ж
|
2024-12-04 10:10:02
|
<:This:805404376658739230>
|
|
|
Traneptora
|
2024-12-04 10:10:12
|
I did say that
|
|
|
DZgas Ж
|
2024-12-04 10:10:45
|
I have never used alternative scaling methods. Now I will know about zscale
|
|
2024-12-04 10:11:03
|
At least it's working.
|
|
|
Traneptora
|
2024-12-04 10:11:22
|
libswscale is correct, just not a high quality scaler
|
|
2024-12-04 10:11:26
|
which is what you were experiencing
|
|
2024-12-04 10:11:28
|
nothing was broken
|
|
|
DZgas Ж
|
|
Traneptora
libswscale is correct, just not a high quality scaler
|
|
2024-12-04 10:13:14
|
Yes, this is complete nonsense.
PNG to PNG converts perfectly,
and video to JPEG as yuvj444p in my code also converts perfectly, so the standard ffmpeg scale lib is just broken, since replacing the scaler fixed it
|
|
|
Traneptora
nothing was broken
|
|
2024-12-04 10:13:23
|
Actually
|
|
|
Traneptora
|
2024-12-04 10:13:33
|
you know what, I don't care anymore
|
|
2024-12-04 10:13:49
|
this is like talking to a brick wall
|
|
|
DZgas Ж
|
|
Traneptora
this is like talking to a brick wall
|
|
2024-12-04 10:14:40
|
Of course, because I'm right and you're wrong <:PirateCat:992960743169347644>
|
|
2024-12-04 10:15:45
|
and the answer is literally: libswscale is broken so use zscale
|
|
|
A homosapien
|
2024-12-04 10:21:27
|
libswscale isn't broken, you're just not interpolating the chroma, you have to pass `-sws_flags +accurate_rnd+full_chroma_int`
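(For instance, plugged into a frame-extraction command like the ones above — filenames are placeholders; the global `-sws_flags` option is what makes the final YUV→RGB conversion interpolate the chroma properly:)
```
ffmpeg -y -i input.mkv -sws_flags spline+accurate_rnd+full_chroma_int -vf "scale=128:-1,crop=128:64" -frames:v 1 -update 1 out.png
```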
|
|
2024-12-04 10:21:42
|
I *do* think it's dumb that it's not turned on by default
|
|
|
DZgas Ж
|
|
A homosapien
libswscale isn't broken, you're just not interpolating the chroma, you have to pass `-sws_flags +accurate_rnd+full_chroma_int`
|
|
2024-12-04 10:23:47
|
THIS SHIT
|
|
2024-12-04 10:23:55
|
<:This:805404376658739230>
|
|
2024-12-04 10:24:15
|
<@853026420792360980>this
|
|
2024-12-04 10:24:35
|
And there are no problems now
|
|
|
A homosapien
libswscale isn't broken, you're just not interpolating the chroma, you have to pass `-sws_flags +accurate_rnd+full_chroma_int`
|
|
2024-12-04 10:26:10
|
At that moment I did not understand the meaning of this flag; PNG to PNG scaling gave identical results, but it turns out that's what it was
|
|
|
spider-mario
|
2024-12-04 10:28:23
|
what does “full” chroma interpolation mean?
|
|
2024-12-04 10:28:29
|
what happens when it’s not full?
|
|
|
DZgas Ж
|
2024-12-04 10:29:00
|
flags=spline+full_chroma_inp+accurate_rnd+full_chroma_int 👍 great
|
|
|
spider-mario
|
2024-12-04 10:29:50
|
oh, it seems to be the equivalent of my “pre-converting to 4:4:4”?
|
|
|
DZgas Ж
|
|
spider-mario
what happens when it’s not full?
|
|
2024-12-04 10:29:56
|
as can be seen from our argument above: when converting a video frame to PNG, you actually get YUV422 output written as RGB
|
|
|
DZgas Ж
It's definitely 422
|
|
2024-12-04 10:30:41
|
above
|
|
|
A homosapien
I *do* think it's dumb that it's not turned on by default
|
|
2024-12-04 10:32:43
|
I agree
|
|
|
A homosapien
|
|
spider-mario
oh, it seems to be the equivalent of my “pre-converting to 4:4:4”?
|
|
2024-12-04 11:32:29
|
It upscales the 420 or 422 up to 444. I encountered this while extracting frames from videos for a wiki, and I was confused about why the chroma looked really blocky.
I found the solution in this cool beginners' guide for converting YUV to RGB and vice versa.
https://trac.ffmpeg.org/wiki/colorspace
|
|
2024-12-04 11:38:59
|
Nowadays I use libplacebo because I want gamma correct scaling filters.
|
|
|
jonnyawsom3
|
2024-12-04 11:50:00
|
So... If I'm using `-vf scale=480:-2` to downscale, I should add `:flags=full_chroma_int`?
|
|
|
A homosapien
|
2024-12-04 11:58:27
|
Always, I recommend `spline+full_chroma_int+accurate_rnd`
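(Applied to the `scale=480:-2` question above, roughly — this uses the filter-level flags rather than the global option; filenames are placeholders:)
```
ffmpeg -i input.mp4 -vf "scale=480:-2:flags=spline+full_chroma_int+accurate_rnd" output.mp4
```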
|
|
2024-12-04 11:58:49
|
The default settings are tuned for absolute speed rather than accuracy
|
|
|
Traneptora
|
2024-12-04 11:59:07
|
+bitexact for cross-platform reproducibility
|
|
|
A homosapien
|
2024-12-05 12:01:08
|
Yeah, if you want to compare file hashes between outputs it's `+bitexact -map_metadata -1 -map_chapters -1`
|
|
2024-12-05 12:01:25
|
I think
|
|
|
Traneptora
|
2024-12-05 12:01:32
|
nah for that use -c rawvideo -f hash
|
|
2024-12-05 12:01:49
|
or -f framecrc
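(For example, something along these lines — the hash is computed over the decoded frames rather than the container, so metadata differences don't matter; `out.png` is a placeholder:)
```
# whole-stream hash of the decoded pixels
ffmpeg -i out.png -map 0:v -c:v rawvideo -f hash -
# per-frame checksums
ffmpeg -i out.png -map 0:v -c:v rawvideo -f framecrc -
```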
|
|
|
A homosapien
|
2024-12-05 12:02:35
|
oh nice, there are so many cool shortcuts with ffmpeg
|
|
2024-12-05 12:03:17
|
the learning curve is hard but fun
|
|
|
jonnyawsom3
So... If I'm using `-vf scale=480:-2` to downscale, I should add `:flags=full_chroma_int`?
|
|
2024-12-05 12:04:07
|
Here is a cool guide
https://academysoftwarefoundation.github.io/EncodingGuidelines/EncodeSwsScale.html#sws_flags-options
|
|
|
jonnyawsom3
|
|
A homosapien
Always, I recommend `spline+full_chroma_int+accurate_rnd`
|
|
2024-12-05 12:04:52
|
Says there that accurate_rnd doesn't do much anymore since 5.0, and I've generally been using Lanczos
|
|
|
A homosapien
|
2024-12-05 12:08:36
|
Yeah the main one is `+full_chroma_int`, always good to have that one
|
|
|
jonnyawsom3
|
2024-12-05 12:09:09
|
Also, after spending a few hours making a script for a friend, I found out `:force_original_aspect_ratio=decrease:force_divisible_by=2` exists, which saved me from some ChatGPT nightmare fuel (Trying to scale to a max of 1280, but only by multiples of 2 to maintain pixel art)
|
|
2024-12-05 12:09:43
|
Still doesn't do the multiple of 2 thing, but looked good enough :P
|
|
|
A homosapien
|
|
jonnyawsom3
|
2024-12-05 12:48:33
|
Just spent 20 minutes trying to figure out why my files were identical with and without it... I had typed `inp`, not `int`
|
|
|
Traneptora
|
|
jonnyawsom3
Also, after spending a few hours making a script for a friend, I found out `:force_original_aspect_ratio=decrease:force_divisible_by=2` exists, which saved me from some ChatGPT nightmare fuel (Trying to scale to a max of 1280, but only by multiples of 2 to maintain pixel art)
|
|
2024-12-05 12:50:37
|
what do you mean by maintain pixel art?
|
|
|
jonnyawsom3
|
2024-12-05 12:50:59
|
Stay in 2, 4, 8x scaling so as not to stretch the pixels
|
|
|
Traneptora
|
2024-12-05 12:51:04
|
`-vf libplacebo=upscaler=nearest` is good for that
|
|
2024-12-05 12:51:12
|
just forces nearest neighbor upsampling
|
|
2024-12-05 12:51:34
|
works as well if it's not a power of 2
|
|
2024-12-05 12:51:42
|
you could do 64x64 -> 384x384 for ex
|
|
|
jonnyawsom3
|
2024-12-05 12:52:05
|
Yeah, I have it set to nearest, but wasn't sure if having non-multiple scaling would still skew it at all. Turned out alright though
|
|
2024-12-05 12:52:13
|
Can't think of the right wording....
|
|
|
Traneptora
|
2024-12-05 12:52:23
|
afaiu the scale factor has to be an integer, but not necessarily a power of 2
|
|
2024-12-05 12:52:39
|
so like, if you upscale by 3, each pixel becomes a 3x3 rectangle
|
|
2024-12-05 12:52:49
|
but otherwise you get to keep hard edges
|
|
|
jonnyawsom3
|
2024-12-05 12:53:07
|
A multiple of the original dimensions, but without going over 1280. Which just isn't possible in a single command as far as I can tell, so `force_original_aspect` is as good as I could get
|
|
|
Traneptora
|
2024-12-05 12:53:26
|
you can use an expression for that iirc
|
|
2024-12-05 12:55:04
|
for example, uh
|
|
|
A homosapien
|
2024-12-05 12:55:04
|
like `libplacebo=w=iw*2:h=ih*2`?
|
|
|
Traneptora
|
2024-12-05 12:55:06
|
```
ffmpeg -i rose.png -vf 'libplacebo=w=floor(1280/iw)*iw:h=-1:upscaler=nearest' foo.png
```
|
|
2024-12-05 12:55:24
|
this resamples rose.png from 70x46 into 1260x828
|
|
2024-12-05 12:55:40
|
which is a scale factor of 18
|
|
2024-12-05 12:56:14
|
this is what you get
|
|
2024-12-05 12:56:23
|
(open in browser for max res)
|
|
|
jonnyawsom3
|
2024-12-05 12:56:39
|
Main issue I had was having it handle both portrait and landscape
|
|
|
Traneptora
|
2024-12-05 12:57:18
|
`force_original_aspect_ratio=decrease` works as well
|
|
2024-12-05 12:57:56
|
e.g. ```
ffmpeg -i rose.png -vf 'libplacebo=w=floor(1280/iw)*iw:h=floor(1280/ih)*ih:force_original_aspect_ratio=decrease:upscaler=nearest' foo.png
```
|
|
2024-12-05 12:58:11
|
you also have `force_divisible_by` as well
|
|
2024-12-05 12:58:13
|
if you need that
|
|
2024-12-05 12:58:36
|
run `ffmpeg -h filter=libplacebo` for a full list of supported filter options
|
|
2024-12-05 12:59:09
|
for example, libplacebo can crop, but by default it doesn't
|
|
|
jonnyawsom3
|
|
Traneptora
e.g. ```
ffmpeg -i rose.png -vf 'libplacebo=w=floor(1280/iw)*iw:h=floor(1280/ih)*ih:force_original_aspect_ratio=decrease:upscaler=nearest' foo.png
```
|
|
2024-12-05 01:05:28
|
That's doing... Something, but it seems to just give up
```ffmpeg -hide_banner -i Test.png -vf "libplacebo=w=floor(1280/iw)*iw:h=floor(1280/ih)*ih:force_original_aspect_ratio=decrease:force_divisible_by=2:upscaler=nearest" -y Telegram.png
Input #0, png_pipe, from 'Test.png':
Duration: N/A, bitrate: N/A
Stream #0:0: Video: png, pal8(pc, gbr/unknown/unknown), 190x190 [SAR 2834:2834 DAR 1:1], 25 fps, 25 tbr, 25 tbn
```
|
|
|
Traneptora
|
|
jonnyawsom3
That's doing... Something, but it seems to just give up
```ffmpeg -hide_banner -i Test.png -vf "libplacebo=w=floor(1280/iw)*iw:h=floor(1280/ih)*ih:force_original_aspect_ratio=decrease:force_divisible_by=2:upscaler=nearest" -y Telegram.png
Input #0, png_pipe, from 'Test.png':
Duration: N/A, bitrate: N/A
Stream #0:0: Video: png, pal8(pc, gbr/unknown/unknown), 190x190 [SAR 2834:2834 DAR 1:1], 25 fps, 25 tbr, 25 tbn
```
|
|
2024-12-05 01:06:07
|
wdym "give up"
|
|
|
jonnyawsom3
|
2024-12-05 01:06:15
|
As in that's the entire output
|
|
|
Traneptora
|
2024-12-05 01:06:21
|
that's not right
|
|
|
jonnyawsom3
As in that's the entire output
|
|
2024-12-05 01:07:10
|
you sure? what happens if you instead run ffmpeg without `-hide_banner` and run it with `-v debug`
|
|
2024-12-05 01:07:52
|
you should see a bunch of crap
|
|
|
jonnyawsom3
|
|
Traneptora
|
2024-12-05 01:08:39
|
looks like it's crashing upon creating the vulkan instance
|
|
2024-12-05 01:11:03
|
I messaged haasn about it
|
|
|
jonnyawsom3
|
2024-12-05 01:11:19
|
Thanks
|
|
|
Traneptora
|
|
|
|
2024-12-05 01:13:41
|
theoretically you can use:
```
ffmpeg -i rose.png -vf "scale=w=floor(1280/iw)*iw:h=floor(1280/ih)*ih:force_original_aspect_ratio=decrease:force_divisible_by=2:flags=neighbor+bitexact+accurate_rnd" foo.png
```
|
|
2024-12-05 01:14:13
|
at least until the libplacebo wrapper bug gets fixed
|
|
|
A homosapien
|
|
jonnyawsom3
That's doing... Something, but it seems to just give up
```ffmpeg -hide_banner -i Test.png -vf "libplacebo=w=floor(1280/iw)*iw:h=floor(1280/ih)*ih:force_original_aspect_ratio=decrease:force_divisible_by=2:upscaler=nearest" -y Telegram.png
Input #0, png_pipe, from 'Test.png':
Duration: N/A, bitrate: N/A
Stream #0:0: Video: png, pal8(pc, gbr/unknown/unknown), 190x190 [SAR 2834:2834 DAR 1:1], 25 fps, 25 tbr, 25 tbn
```
|
|
2024-12-05 01:16:59
|
Add `-hwaccel vulkan`
|
|
2024-12-05 01:17:16
|
I ran into the exact same issue
|
|
|
jonnyawsom3
|
2024-12-05 01:20:13
|
Huh... That's not right
|
|
|
A homosapien
|
2024-12-05 01:25:54
|
Ohh
|
|
2024-12-05 01:26:00
|
Turn off dithering
|
|
2024-12-05 01:26:10
|
Blue noise is on by default
|
|
2024-12-05 01:26:50
|
https://gist.github.com/nico-lab/0825b680ff48cad1699edb095daf8cbd
|
|
|
jonnyawsom3
|
2024-12-05 01:27:00
|
Didn't know it had blue noise ~~my beloved~~, but still not right
|
|
2024-12-05 01:40:54
|
Works fine with mp4 output so good enough for me
|
|
|
Traneptora
|
|
jonnyawsom3
Didn't know it had blue noise ~~my beloved~~, but still not right
|
|
2024-12-05 02:07:24
|
|
|
2024-12-05 02:07:52
|
what problem are you experiencing?
|
|
2024-12-05 02:08:07
|
```
ffmpeg -i image.png -vf "libplacebo=w=floor(1280/iw)*iw:h=floor(1280/ih)*ih:force_original_aspect_ratio=decrease:force_divisible_by=2:upscaler=nearest:dithering=-1:format=gbrp" image2.png
```
|
|
2024-12-05 02:08:09
|
this is what I ran
|
|
2024-12-05 02:08:19
|
it didn't make everything blue and oversaturated
|
|
|
jonnyawsom3
|
2024-12-05 02:09:56
|
I didn't have the format set, so I think it was re-dithering/re-paletting to PAL8
|
|
|
Traneptora
|
2024-12-05 02:24:31
|
dithering won't happen by default on upscale anyway
|
|
2024-12-05 02:24:51
|
it's a thing that happens with bit depth reduction
|
|
|
jonnyawsom3
|
|
jonnyawsom3
Didn't know it had blue noise ~~my beloved~~, but still not right
|
|
2024-12-05 02:57:28
|
Huh... Wonder what happened there then
`ffmpeg -hide_banner -i Test.png -vf "scale=w=floor(1280/iw)*iw:h=floor(1280/ih)*ih:force_original_aspect_ratio=decrease:force_divisible_by=2:flags=neighbor" -y Telegram.png`
|
|
|
A homosapien
|
2024-12-05 03:02:12
|
idk what happened, I didn't get the color shift you're getting
|
|
|
jonnyawsom3
|
2024-12-05 05:29:58
|
Maybe a bug since mine was a Git build? Gimme a sec...
|
|
|
A homosapien
Add `-hwaccel vulkan`
|
|
2024-12-05 05:42:34
|
Didn't work by the way
|
|
2024-12-05 05:42:54
|
But nope, either it's dithered or it's having the colors messed up when outputting to PNG
|
|
2024-12-05 05:43:08
|
`ffmpeg -hide_banner -i Test.png -vf "scale=w=floor(1280/iw)*iw:h=floor(1280/ih)*ih:force_original_aspect_ratio=decrease:force_divisible_by=2:flags=neighbor:sws_dither=0" -y Telegram.png`
|
|
|
A homosapien
|
|
jonnyawsom3
`ffmpeg -hide_banner -i Test.png -vf "scale=w=floor(1280/iw)*iw:h=floor(1280/ih)*ih:force_original_aspect_ratio=decrease:force_divisible_by=2:flags=neighbor:sws_dither=0" -y Telegram.png`
|
|
2024-12-05 07:18:52
|
~~I tried this exact command and this is what I got, your ffmpeg is cursed ngl~~
|
|
2024-12-05 07:20:45
|
wait the input png was rgb24 instead of 8 bit paletted
|
|
2024-12-05 07:20:52
|
With an 8-bit palette input I got your result
|
|
2024-12-05 08:24:57
|
adding `-pix_fmt rgb24` fixes the issue
|
|
|
jonnyawsom3
|
2024-12-05 06:28:52
|
Another day, another command addition needed
Turns out the input my friend has is a PNG sequence with variable sizes. Pretty sure I'd need to load all the files, find the largest, and then insert the frames into the larger canvas
|
|
2024-12-05 06:30:21
|
This is around 14,000 PNGs, as a timelapse from Aseprite, which is then scaled 4x from a few hundred pixels wide and with a tan background added
|
|
2024-12-05 06:33:32
|
```bat
@echo off
set /p "Filename=Enter Frame Name (If the frames are called Dino1395 input Dino, etc): "
ffmpeg -hide_banner -f lavfi -i anullsrc -f image2 -framerate 60 -i "%~1\%Filename%%%d.png" -vf "format=rgba,split[bg][fg];[bg]drawbox=c=tan:t=fill[bgc];[bgc][fg]overlay,scale=in_w*4:-2:flags=neighbor" -map 0:a -map 1:v -shortest -y Timelapse.mp4
pause```
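(One possible way to handle the variable frame sizes mentioned above, as a rough sketch rather than a drop-in replacement: pad every frame onto one fixed tan canvas, then do the 4x nearest-neighbor upscale. The 320x320 canvas stands in for the largest frame size, the filename pattern is assumed, the silent-audio input is omitted for brevity, and `eval=frame` on pad needs a reasonably recent ffmpeg:)
```
# center each (possibly differently sized) frame on a fixed tan canvas, then scale 4x with hard edges
ffmpeg -framerate 60 -i "Dino%d.png" -vf "pad=w=320:h=320:x=(ow-iw)/2:y=(oh-ih)/2:color=tan:eval=frame,scale=iw*4:-2:flags=neighbor" -y Timelapse.mp4
```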
|
|
2024-12-05 06:33:58
|
The filename thing is because the PNGs are a sequence, but with the project name beforehand
|
|
2024-12-05 06:36:11
|
Easier to just copy paste the name and backspace a few than write a whole other section to the bat file
|
|
|
ProfPootis
|
|
jonnyawsom3
|
2024-12-05 10:30:48
|
So... The main issue I'm having is that the scale filter doesn't run 'per frame', it's only using the first frame and then forcing the rest to fit that, rather than decreasing the resolution to maintain their aspect ratio in the chosen dimensions
|
|
2024-12-05 10:31:01
|
If that makes any sense... I'm going slightly insane on this
|
|
|
bonnibel
|
2024-12-11 08:32:51
|
|
|
2024-12-11 08:32:52
|
some random thoughts after a few days
i'm downscaling by an integer factor, so i can easily combine the blurring and area averaging into a single step
this gets you different weights than just using the gaussian as the resize kernel. not sure which is more correct?
as for picking sigma, what i'm currently doing is solving 1/(2\*d) = sqrt(2\*ln(c)) * 1/(2\*pi\*sigma), where c = sqrt(2) and d = the downscale factor (e.g. 2 for a 2x downscale). if i've understood it correctly, this'd put the -3 dB cutoff frequency at the nyquist frequency of the downscaled image. given how little i know about dsp though i've probably _not_ understood it correctly
|
|
|
_wb_
|
2024-12-12 04:31:00
|
Combining two filters in a single step requires using a larger kernel size, so I am not sure it is actually faster...
|
|
|
Traneptora
|
|
So... The main issue I'm having is that the scale filter doesn't run 'per frame', it's only using the first frame and then forcing the rest to fit that, rather than decreasing the resolution to maintain their aspect ratio in the chosen dimensions
|
|
2024-12-14 09:56:28
|
there's a setting for that
|
|
2024-12-14 09:56:36
|
lemme find it, sec
|
|
|
So... The main issue I'm having is that the scale filter doesn't run 'per frame', it's only using the first frame and then forcing the rest to fit that, rather than decreasing the resolution to maintain their aspect ratio in the chosen dimensions
|
|
2024-12-14 10:10:13
|
`eval=frame`
|
|
2024-12-14 10:10:38
|
> eval
> Specify when to evaluate width and height expression. It accepts the following values:
>
> init
> Only evaluate expressions once during the filter initialization or when a command is processed.
>
> frame
> Evaluate expressions for each incoming frame.
>
> Default value is init.
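(Applied to the scale filter from the script above, roughly:)
```
scale=in_w*4:-2:flags=neighbor:eval=frame
```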
|
|
|
bonnibel
|
|
_wb_
Combining two filters in a single step requires using a larger kernel size, so I am not sure it is actually faster...
|
|
2024-12-15 12:18:35
|
true...
|
|
2024-12-15 12:20:55
|
> this gets you different weights than just using the gaussian as the resize kernel. not sure which is more correct?
okay, for integer downscaling doing gauss downscale directly rather than gauss blur + area downscale seems to my eyes more like the result i get doing it in the frequency domain, but it's hard to tell
|
|
|
damian101
|
2024-12-15 12:43:28
|
what set of images was ssimulacra2 trained on?
|
|
|
bonnibel
|
2024-12-15 02:07:43
|
afaik: CID22, TID2013, KADID-10k, & KonFiG
|
|
|
_wb_
|
2024-12-15 04:01:00
|
Yes, I tuned it using those datasets, optimizing for Kendall rank correlation for all of them and also for Pearson and just MSE for the CID22 set, so the scores end up on a scale that is similar to the scale of CID22.
|
|
|
spider-mario
|
2024-12-21 09:44:37
|
https://mastodon.online/@nikitonsky/113691789641950263
|
|
|
Quackdoc
|
|
CrushedAsian255
|
2024-12-21 11:10:44
|
My projects: 0.3.48432
|
|
|
jonnyawsom3
|
2024-12-22 02:51:28
|
Also <@384009621519597581>, yesterday I was thinking about Space Engine again and creating higher bitdepth images. Exposure bracketing came to mind so I was going to have my friend render a few images, but then I realised I don't know the ideal settings for the widest range or the best software to process them into a wider single image
|
|
|
AccessViolation_
|
2024-12-22 02:58:59
|
I can boot it up and look at some values if you want. If you limit yourself to stuff within the solar system you can get a feel for the different brightness of the planets. If you get a couple of planets to line up, go pretty far away and use a very narrow field of view you can probably get a couple of similarly-sized planets and the sun in a single image
|
|
2024-12-22 02:59:32
|
Wait no if you had the sun in the picture you would only be seeing the dark sides
|
|
2024-12-22 03:00:29
|
Unless you like, have the sun take up half of the screen, then you could make it work
|
|
|
jonnyawsom3
|
2024-12-22 03:00:36
|
I mean wider as in range of values, since you wanted to make a 'real' JXL file going from the brightness of the sun to ambient space
|
|
|
AccessViolation_
|
2024-12-22 03:02:08
|
Yeah I know what you meant, that's probably going to require some custom software or at least something that can manually edit the layers in a JXL and then do the ICC profile correctly
|
|
2024-12-22 03:02:42
|
I was just thinking about how you would actually set this up in space engine since everything is realistically scaled and space objects aren't usually conveniently close
|
|
|
jonnyawsom3
|
2024-12-22 03:02:52
|
And now I'm wondering if we could do exposure bracketing *inside* a JXL using the blend modes and cutoff points for brightness per image as a layer. But that's probably insane
|
|
|
AccessViolation_
|
2024-12-22 03:06:23
|
There are several approaches. One is similar to a gain map approach with layers; another is to store just one image with f32 channels, give it a linear profile, and encode the real values directly into floats, but then you end up with not a lot of precision on values very far away from zero. So the ideal transfer function for equal precision at every magnitude of brightness is a nonlinear transfer function that compensates for the loss of precision in floats as the values get bigger, effectively leaving you with linear precision across the float range
|
|
2024-12-22 03:07:52
|
There was another idea someone had around floats somehow getting linear precision, but I forgot what it was :/
|
|
|
jonnyawsom3
|
|
AccessViolation_
I was just thinking about how you would actually set this up in space engine since everything is realistically scaled and space objects aren't usually conveniently close
|
|
2024-12-22 03:08:58
|
Planet foreground and sun background with the other stars in the space between would work as a proof of concept. This is more so about light values than the details of objects, until it's scaled to a much higher resolution once we know it works
|
|
2024-12-22 03:09:01
|
Given the game can only output 8-bit images (excluding BC6H with the Pro DLC), the gain map approach would likely be the least effort. Unless we have an easy way to blend such a wide range of exposures together, since I doubt normal exposure bracketing tools go above 16-bit
|
|
|
AccessViolation_
|
2024-12-22 03:10:43
|
Yeah, that sounds like the best approach for starters
|
|
|
CrushedAsian255
|
|
AccessViolation_
There are several approaches, one is similar to a gain map approach with layers, another one is to store just one image with f32 channels, give it a linear profile, and encode the real values directly into floats, but then you end up with not a lot of precision on values very far away from zero. So the ideal transfer function for equal precision at every magnitude of brightness is to have a nonlinear transform function that compensates for the loss of precision in floats as the values get bigger, effectively leaving you with linear precision across the float range
|
|
2024-12-22 03:12:55
|
The loss of precision shouldn't matter as human vision is non-linear anyways
|
|
2024-12-22 03:13:08
|
Or you could use a logarithmic transfer function
|
|
|
AccessViolation_
|
2024-12-22 03:15:08
|
The idea was not for humans to directly look at these images as a whole (although you could), the idea was more that you had sufficient detail in every brightness of the image so that when editing it you could choose to properly expose the surface of the sun, or the dark side of the earth, or both, and have them look good to human eyes after that edit
|
|
|
CrushedAsian255
|
2024-12-22 03:15:51
|
If doing that then the lack of precision on higher float values should still not matter as exposure is also exponential
|
|
|
jonnyawsom3
|
2024-12-22 03:15:58
|
Oh
> You can export 32-bit depth floating point skyboxes
Apparently that's with the PRO DLC's DDS export, I'd guess
<https://steamcommunity.com/app/314650/discussions/0/2263564102376823320/>
|
|
|
CrushedAsian255
|
2024-12-22 03:20:46
|
Float gives you approx +/- 126 Exposure steps
|
|
|
AccessViolation_
|
2024-12-22 03:21:04
|
That looks promising, though I wonder if that's still over the whole exposure range the game can produce or if it just picks a range around your target exposure and uses the float range for that. Worth exploring, but I don't have the PRO version, and also whether a given version of the game works on my steam deck is a bit of a gamble lol, it's officially "unsupported" but sometimes it works and sometimes it doesn't
|
|
|
CrushedAsian255
|
2024-12-22 03:23:47
|
You get maybe 10-20 more EV in the negative direction but that is for subnormals and you DO start losing precision
|
|
|
jonnyawsom3
|
|
AccessViolation_
That looks promising, though I wonder if that's still over the whole exposure range the game can produce or if it just picks a range around your target exposure and uses the float range for that. Worth exploring, but I don't have the PRO version, and also whether a given version of the game works on my steam deck is a bit of a gamble lol, it's officially "unsupported" but sometimes it works and sometimes it doesn't
|
|
2024-12-22 03:24:43
|
DDS only supports half float, so I'm not sure where he got 32-bit from...
|
|
|
CrushedAsian255
|
2024-12-22 03:25:32
|
If it can fit in half float it can easily fit in a full float
|
|
|
_wb_
|
2024-12-22 06:06:14
|
Float has the same amount of mantissa precision for every exponent so I think it's pretty good for this use case.
|
|
|
AccessViolation_
|
2024-12-22 06:29:07
|
That's true, but as the exponent gets larger, the amount added when 'incrementing' the mantissa increases. At a value of about 13 million, ticking up the mantissa adds 1 to the value and you can no longer really store decimals. So you are wasting a lot on the 0.00000001 and 0.00000002 you can distinguish. I guess "as the magnitude gets larger, precision will be less important" was the idea for IEEE 754 and that's probably fine in most cases but here the idea is that you can resolve equal detail at very high or low absolute brightness, so that principle should explicitly not apply
|
|
2024-12-22 06:31:55
|
Here's a neat tool for seeing how values are represented as floats and the error of the requested vs represented value
https://www.h-schmidt.net/FloatConverter/IEEE754.html
|
|
2024-12-22 06:33:25
|
But I mean, given the still quite large amount of integers you can store precisely in f32, that's always an option if my concerns are valid (I know less about cameras, light and pictures than I do about floating point so maybe these concerns don't mean anything 😅 )
|
|
2024-12-22 06:34:13
|
And of course there's log profiles and what not that could compensate for it
|
|
|
spider-mario
|
2024-12-22 06:59:27
|
I’m not sure I follow – isn’t it mainly relative differences that matter at mid to high levels?
|
|
2024-12-22 07:01:01
|
|
|
|
jonnyawsom3
|
2024-12-22 07:03:41
|
Saying "No artifacts visible" is very subjective, especially when we'd be zooming in on distant stars and then sliding the exposure of the image after to make them bright
|
|
|
AccessViolation_
But I mean, given the still quite large amount of integers you can store precisely in f32, that's always an option if my concerns are valid (I know less about cameras, light and pictures than I do about floating point so maybe these concerns don't mean anything 😅 )
|
|
2024-12-22 07:05:54
|
Though, considering real camera sensor noise, and the in-game rendering not being perfectly accurate itself, it would probably be good enough
|
|
|
spider-mario
|
2024-12-22 07:23:34
|
but brightening them will involve multiplying anyway, not adding, right?
|
|
|
jonnyawsom3
|
2024-12-22 08:03:43
|
My brain hurts xD
|
|
|
_wb_
|
2024-12-22 09:19:07
|
If you consider brightness differences multiplicative (like when talking about stops) and store them in floats then you roughly have the same precision at every point, whether you are talking deep space nanonits (or whatever it is) or supernova petanits (or whatever it is)
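(To put rough numbers on that, using the standard float32 layout: the relative spacing between adjacent normal floats is about 2^-23 at every exponent, and the normal exponent range spans roughly 254 doublings:)
```
\text{relative step} \approx 2^{-23} \approx 1.2\times 10^{-7}
\;(\approx 1.7\times 10^{-7}\ \text{stops}),
\qquad
\log_2\!\left(\frac{\approx 2^{128}}{2^{-126}}\right) \approx 254\ \text{stops of range}
```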
|
|
|
AccessViolation_
|
2024-12-22 11:32:27
|
All this reading about predictors and context modeling makes me want to apply that concept to other data. One thing that comes to mind is Minecraft voxel world map data.
Oak log blocks will almost always continue in the direction they face (as part of trees or wooden beams in buildings). Grass will almost always have air on top, as many other things on top will change the grass to dirt. Stairs will usually have more stairs in the same direction one up and ahead, to continue the staircase. A water block with air above it will have a very high chance of repeating horizontally in all directions as part of the water surface, and if the biome is ocean the water probably extends a long way down too.
I know Minecraft internally uses a palette approach where each chunk creates a palette of all the blocks in it, and also a palette for some properties like whether blocks are waterlogged, and then the actual blocks are stored as indices into those. It uses the same amount of bits for every entry: the lowest amount you can get away with given the amount of blocks you have. And then it's deflate or lz4 compressed. I wonder how easy that is to beat with predictors and context modeling
|
|
2024-12-22 11:33:45
|
I mean, the best 'predictor' is just the chunk generation algorithm because world gen is procedural, but that's *very* CPU intensive which is probably why chunks are stored as a whole instead of only storing the changes to them
|
|
|
jonnyawsom3
|
2024-12-22 11:57:06
|
Maybe a cut down version of chunk gen for the 'higher level' properties like biome, to know what the most common blocks are
|
|
|
CrushedAsian255
|
2024-12-23 02:03:28
|
distance 4000 is amazing
<@207980494892040194>
|
|
|
|
embed
|
|
CrushedAsian255
distance 4000 is amazing
<@207980494892040194>
|
|
2024-12-23 02:03:34
|
https://embed.moe/https://cdn.discordapp.com/attachments/794206087879852106/1320572456607547443/out.jxl?ex=676a1670&is=6768c4f0&hm=0e0379c82a3a5ea692f19dde0ec1cb03fb8431a07e25883da4c86d00938d9b69&
|
|
|
CrushedAsian255
|
2024-12-23 02:05:19
|
```--intensity_target 0.1 -m 1 -d 40```
|
|
|
|
embed
|
|
CrushedAsian255
```--intensity_target 0.1 -m 1 -d 40```
|
|
2024-12-23 02:05:23
|
https://embed.moe/https://cdn.discordapp.com/attachments/794206087879852106/1320572919084093491/out.jxl?ex=676a16de&is=6768c55e&hm=a305f3c6b0d4889a67ceee04874e6d9d753ac0e77b03b72c138a3dd1ac9c095d&
|
|