JPEG XL


benchmarks

HCrikki
2024-02-28 03:19:32
is everyone running against their own sample images or is there a specific source dataset commonly benched ?
_wb_
2024-02-28 03:20:27
It might be a good idea to change the default effort setting to 6. For lossy, e6 is performing about as well as e7 while it is twice as fast, and for lossless e6 does perform worse than e7 but it still is much better than png/webp in all ways and has a more reasonable speed than e7 (even though e7 has become a lot more reasonable than it was, already)
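The effort levels being compared here map to cjxl's `-e` flag; a hypothetical invocation with placeholder filenames (lossy distance and input name are assumptions, not from the chat):

```shell
# Placeholder filenames; -e sets encode effort (1-9), -d sets Butteraugli
# distance (0 = lossless). Comparing e6 vs e7 as discussed above:
cjxl -d 1.0 -e 6 input.png lossy_e6.jxl
cjxl -d 1.0 -e 7 input.png lossy_e7.jxl
cjxl -d 0 -e 6 input.png lossless_e6.jxl
cjxl -d 0 -e 7 input.png lossless_e7.jxl
```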
HCrikki is everyone running against their own sample images or is there a specific source dataset commonly benched ?
2024-02-28 03:21:30
there are some commonly used sets, but it's always interesting to see how things are for the images you care about - things can be quite different depending on the kind of image...
Quackdoc
HCrikki is everyone running against their own sample images or is there a specific source dataset commonly benched ?
2024-02-28 03:35:05
I don't use datasets since I find they poorly represent real world usecase, I will just crawl the internet and download a wack load of images i can get the source for, and rip a good amount of images from gallery sites
afed
_wb_ It might be a good idea to change the default effort setting to 6. For lossy, e6 is performing about as well as e7 while it is twice as fast, and for lossless e6 does perform worse than e7 but it still is much better than png/webp in all ways and has a more reasonable speed than e7 (even though e7 has become a lot more reasonable than it was, already)
2024-02-28 03:43:36
for lossy e6 with streaming by default and e7 without or now it's for the same efforts as for lossless? `-e 6` by default is good and bad, bad because a lot of people are comparing default settings without taking speed into account and e7 is still fast enough, with streaming also for lossless
2024-02-28 03:52:43
or now lossless e7 is worse than e6 on most images? because patches are disabled?
_wb_
2024-02-28 04:04:28
no, for lossless, e7 compresses better than e6, but it's a bit slow still (even though it's much faster now than it was).
2024-02-28 04:05:28
for lossy, it's not super clear to me how much better the compression of e7 is compared to e6, but I think the answer is "not much".
MSLP
Traneptora -October being the gcc optimization level of "ctober"
2024-02-28 04:26:55
doesn't work for me, but it's expected since it's February 🤪
Oleksii Matiash
HCrikki is everyone running against their own sample images or is there a specific source dataset commonly benched ?
2024-02-28 06:12:33
I use my own photos because of their large size and the fact that it is my main jxl usage
afed
_wb_ no, for lossless, e7 compresses better than e6, but it's a bit slow still (even though it's much faster now than it was).
2024-02-28 06:29:38
ah, I just read it wrong then maybe, though patches can sometimes significantly improve compression, don't know how well it works for lossy, also maybe the latest optimizations for complex images can perform worse on fast efforts
fab
2024-02-28 09:12:05
Jon How JPEG XL can upscale good in the Rance higher than SPMT19730M colours
2024-02-28 09:12:17
Is fast if i load a video
2024-02-28 09:12:43
Basically what is it the Changelogs?
2024-02-28 09:13:23
I rewrote full gemini because it was really terrible 72% of the times
2024-02-28 09:13:34
Now it speaks spanish
2024-02-28 09:13:52
I asked que es el codec AV1?
2024-02-28 09:15:11
Like for example images of cats in rgb and yrcbr color space, or even videos what is the bpp gains?
_wb_
2024-02-28 09:23:45
I see words but they are put in a sequence that is beyond my ability to understand.
spider-mario
2024-02-28 11:12:02
fab, whenever you write something, could you please check which channel you are about to post it to and whether that is the appropriate one?
2024-02-28 11:12:13
if none matches, <#806898911091753051> would be the best fit
Orum
2024-02-28 11:43:39
e 7 is really the sweet spot for lossless, at least on the images I've tested it on, though I do think streaming_input and streaming_output should be on by default
2024-02-28 11:45:29
and it would be a little more convenient if streaming_input didn't require ppm/pgm input, though I can see the challenges associated with supporting other formats
2024-02-28 11:48:16
if you needed a faster lossless preset I'd go all the way to e 2
_wb_
2024-02-29 07:50:44
streaming_input should be possible with PNG input as long as they're not Adam7 interlaced (but most aren't). But this will only affect cjxl; other applications using libjxl have to make changes on their end to do input streaming...
CrushedAsian255
Traneptora -October being the gcc optimization level of "ctober"
2024-02-29 10:47:33
Optimises for ghosts running the software?
sklwmp
2024-03-01 04:51:03
Has anyone tested if compiling with -O3 is faster than -O2 for libjxl? I understand that sometimes, higher optimization levels are not always faster, that sometimes even -Osize works better.
Traneptora
sklwmp Has anyone tested if compiling with -O3 is faster than -O2 for libjxl? I understand that sometimes, higher optimization levels are not always faster, that sometimes even -Osize works better.
2024-03-01 05:58:41
"higher optimization is not always faster" is something that gentoo people have been saying for years, but generally speaking O3 will be faster than O2
2024-03-01 05:59:46
O3 enables -funroll-loops, which can increase compiled code size in a way that only actually hurts performance on systems that are very slow at loading dynamic libraries from disk
2024-03-01 06:00:12
it won't particularly matter for libjxl though as most of the performance-critical code uses highway for simd and thus optimization flags won't affect much
2024-03-01 06:03:30
do keep in mind that people often will argue that "-O3 breaks code" but this is not actually true, -O3 breaks *incorrect* code that relies on undefined behavior
2024-03-01 06:03:47
also the biggest source of breakage is `-fstrict-aliasing` which is actually enabled with -O2
2024-03-01 06:04:06
`-fstrict-aliasing` tells the compiler to assume that code does not break the strict aliasing rule
2024-03-01 06:04:44
the strict aliasing rule being "it's illegal to re-interpret-cast pointers except to/from `unsigned char *`, `char *` and `void *`"
2024-03-01 06:05:40
so, doing something like
```c
float f = 5.0;
uint32_t x = *(uint32_t *)&f;
```
is UB in C; it's called the "strict-aliasing" rule
2024-03-01 06:05:54
if you actually want to do this, you need to do it with a union
2024-03-01 06:06:26
```c
union { float f; uint32_t i; } z;
z.f = 5.0;
uint32_t x = z.i;
```
this is legal in C. (though not in C++)
CrushedAsian255
2024-03-01 06:34:50
In c++ what should you do?
2024-03-01 06:35:02
Bitcasts?
_wb_
2024-03-01 06:45:12
memcpy works
Traneptora
2024-03-01 06:47:37
memcpy is one way, another way IIRC is to use `reinterpret_cast<uint32_t *>` explicitly
2024-03-01 06:47:55
reason it's illegal in C++ is they're more strict about unions
2024-03-01 06:48:24
in C++, exactly one member of a union is considered "active" at any given point, which is by definition the one that was most recently assigned
2024-03-01 06:48:41
and it's illegal C++ to read from a different member than the one that was assigned more recently
2024-03-01 06:48:43
C permits this, though
veluca
Traneptora memcpy is one way, another way IIRC is to use `reinterpret_cast<uint32_t *>` explicitly
2024-03-01 08:20:10
that's definitely not OK
yoochan
2024-03-01 08:38:56
xkcd made one just for you ! https://xkcd.com/2899/
CrushedAsian255
Traneptora in C++, exactly one member of a union is considered "active" at any given point, which is by definition the one that was most recently assigned
2024-03-01 08:51:27
What's the point of using unions then
yurume
2024-03-01 08:59:58
for memory optimization?
2024-03-01 09:00:02
like, `std::variant`
spider-mario
CrushedAsian255 What's the point of using unions then
2024-03-01 09:10:16
same as tagged unions but without the tag
_wb_
2024-03-01 09:54:49
Here is an example to demonstrate something most of us already know but still remains useful to point out: PSNR is not a good perceptual metric. These four images have the same PSNR of 35.0, but if you ask ssimulacra2 (or butteraugli, for that matter), the webp and avif images are worse.
2024-03-01 09:55:06
ugh it strips the animation
2024-03-01 09:55:32
but I guess the distortions are strong enough to see them without reference to the original
2024-03-01 09:57:06
animated version here: https://res.cloudinary.com/jon/psnr35.png
lonjil
2024-03-01 09:57:12
Yes, those are quite severe
_wb_
2024-03-01 09:59:49
this is lower quality than what I consider useful even for the web, but not outrageously low quality where all metrics break down
2024-03-01 10:00:48
whenever someone shows you PSNR BD rate results showing how great something is, point them to this little reminder that PSNR doesn't mean much 🙂
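For reference, the metric under discussion is purely a per-pixel error energy, which is why identical PSNR values can look so different:

```latex
\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(x_i - y_i\right)^2,
\qquad
\mathrm{PSNR} = 10 \log_{10}\!\frac{\mathrm{MAX}_I^{2}}{\mathrm{MSE}}
```

where $\mathrm{MAX}_I$ is the peak signal value (255 for 8-bit images). Any distortion with the same total squared error gives the same PSNR, regardless of how visible it is.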
yoochan
2024-03-01 10:01:54
avif only believe in MS-SSIM, is this metric better than PSNR ?
_wb_
2024-03-01 10:05:52
slightly, but not much
2024-03-01 10:15:14
This shows the correlation between PSNR and human opinions. Horizontal axis is human opinion, where 30 is low quality, 90 is very high quality. Vertical axis is psnr. The color indicates the number of images, where gray is 1 image, blue is 2, green is 20, orange is 100, red is 500 (so it's a density heatmap). As you can see for PSNR things are all over the place, if the PSNR is 50 it will be a very good image but if it is 35 or 40, the real quality can be pretty much anything.
2024-03-01 10:16:41
the solid black line shows the mean human score for a given psnr score, the dashed line shows p25 - p75 and the dotted line shows p5 - p95
2024-03-01 10:18:22
For MS-SSIM the plot looks like this. Better, but still kind of all over the place.
2024-03-01 10:21:27
For SSIMULACRA2 the plot looks like this. Of course still not perfect correlation (that would look like a perfect diagonal line), but significantly better than PSNR or MS-SSIM.
2024-03-01 10:25:15
The numbers at the top of the plot are the Kendall Rank Correlation Coefficient (KRCC), the Spearman Rank Correlation Coefficient (SRCC), and the Pearson Correlation Coefficient (PCC), which are ways to summarize the overall correlation.
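The three coefficients have standard definitions (here $d_i$ is the difference between an image's ranks under the metric and under human scores, and $n_c$/$n_d$ count concordant/discordant pairs):

```latex
\mathrm{PCC} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \, \sigma_Y},
\qquad
\mathrm{SRCC} = 1 - \frac{6 \sum_i d_i^2}{n(n^2 - 1)} \;\text{(no ties)},
\qquad
\mathrm{KRCC} = \frac{n_c - n_d}{n(n-1)/2}
```

SRCC is simply the Pearson correlation computed on ranks, so the two rank coefficients measure monotonic agreement while PCC measures linear agreement.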
2024-03-01 10:30:37
Of course you'll never get the correlation to a perfect 1 because 1) the ground truth of human opinions is not perfect (there's sampling noise in it, humans are not perfectly consistent, etc) ,and 2) humans can have non-transitive preferences occasionally (where they say things like A > B > C > A) which you can never capture with a numerical metric that constrains the concept of quality to a total order relation.
yoochan
2024-03-01 03:21:25
thank you for the plots, they are very interesting! (and the explanations) I hope we could convince the guy who answered me here: https://github.com/webmproject/codec-compare/issues/3 😅
2024-03-01 03:21:53
The illustrations remind me of a blog post I read a long time ago... written by you I suppose
Traneptora
CrushedAsian255 What's the point of using unions then
2024-03-01 04:39:15
one reason might be, say how I use them in hydrium
2024-03-01 04:39:42
I store the quantized DCT coeffs in the same variable as the dequant ones
2024-03-01 04:40:03
quantized DCT coeffs are integers and the original are floats
2024-03-01 04:40:24
once I quantize them I don't need the float so I just assign it to the integer
2024-03-01 04:40:43
prevents me from allocating two buffers
monad
2024-03-02 11:01:51
all mistakes attributable to me. jon will be angry about this
CrushedAsian255
2024-03-02 12:04:59
What?
Orum
2024-03-02 01:03:59
0.9.0 <:PepeHands:808829977608323112>
fab
2024-03-02 02:14:27
In the benchmarks ive done ive seen 20,2% reductions on libjxl 0.10.0 with vp9 e8
2024-03-02 02:14:52
With the metric Ssimulacra 1
2024-03-02 03:53:32
16;50pm
2024-03-02 03:53:52
Toggleton sent this
jonnyawsom3
2024-03-02 03:54:42
That isn't a JXL
fab
2024-03-02 03:54:56
Gif?
sklwmp
Traneptora "higher optimization is not always faster" is something that gentoo people have been saying for years, but generally speaking O3 will be faster than O2
2024-03-03 11:50:33
Well, apparently Arch says it too
Nyao-chan
2024-03-03 12:13:47
I did try PGO with Polly and BOLT and it was useless. I assume `-O 3` would be the same
2024-03-03 12:14:25
maybe 1.5% faster, but could be just error
veluca
2024-03-03 12:18:03
I would be extremely surprised if compiler optimizations made significant differences for vardct (beyond -O2 or perhaps -O1)
2024-03-03 12:18:09
for modular mode, perhaps
Nyao-chan
2024-03-03 12:18:50
Oh, I was only testing modular, I should specify
2024-03-03 12:19:28
I think Release instead of None did make it a little faster though
lonjil
2024-03-03 12:20:21
-O3 doesn't break things (the code is already broken) but the measured improvements from using it are usually under the margin of error.
afed
veluca I would be extremely surprised if compiler optimizations made significant differences for vardct (beyond -O2 of perhaps -O1)
2024-03-03 12:22:38
btw, the git version for windows is still much slower for fast lossless modes than compiled with clang
2024-03-03 12:26:41
it wouldn't be that bad if people didn't make benchmarks with these much slower binaries (which already happens)
veluca
lonjil -O3 doesn't break things (the code is already broken) but the measured improvements from using it are usually under the margin of error.
2024-03-03 12:30:01
It turns on autovec though, which while not so great, can be quite helpful in *some* cases... Of course, most of those programs use intrinsics anyway
afed btw, the git version for windows is still much slower for fast lossless modes than compiled with clang
2024-03-03 12:30:25
You mean it's slower with msvc than with clang?
2024-03-03 12:30:54
Ah, yes, I believe I understand why... I guess we could make release binaries with clang-cl, or figure out how to do dynamic dispatching with msvc
afed
2024-03-03 12:31:57
probably at least what is used for windows binaries in git releases
veluca You mean it's slower with msvc than with clang?
2024-03-03 12:46:40
https://canary.discord.com/channels/794206087879852103/848189884614705192/1213829871189626980
veluca
afed https://canary.discord.com/channels/794206087879852103/848189884614705192/1213829871189626980
2024-03-03 12:47:32
I see, what did you compile the git version with?
afed
2024-03-03 12:48:26
-O2 x86-64-v2
veluca
2024-03-03 01:05:00
... compiler? 😛
afed
2024-03-03 01:06:30
latest clang in msys2, can't remember exactly, but most likely the latest release version
2024-03-03 01:07:36
`Clang 17.0.6`
veluca
2024-03-03 01:23:39
I see
2024-03-03 01:23:46
I wonder what I'd get with clang-cl
2024-03-03 01:24:08
is that something you could try out?
afed
2024-03-03 01:26:49
it would be difficult because msvc is big and there are a lot of extra deps
veluca
2024-03-03 02:55:16
I see, I'll investigate during the week then perhaps
afed
2024-03-04 10:10:20
the new git build with clang is not much different, maybe it's an older clang version or something else, lto? <:Thonk:805904896879493180>
veluca
2024-03-04 10:12:22
can you give me the output of something at `-e 1 -d 0`?
2024-03-04 10:12:35
(I assume you mean the win build from my latest PR)
afed
2024-03-04 10:15:45
git msvc `2480 x 3508, geomean: 71.379 MP/s [57.86, 74.33], 100 reps, 8 threads.` git clang `2480 x 3508, geomean: 71.170 MP/s [62.47, 73.80], 100 reps, 8 threads.`
veluca
2024-03-04 10:17:31
yeah no
2024-03-04 10:17:34
that didn't do it
afed
2024-03-04 10:20:05
are these builds for x86-64-v1? it's strange that mostly e1 is slower, even though it should be more optimized
veluca
2024-03-04 10:20:23
oh, I can tell you why
2024-03-04 10:21:14
https://github.com/libjxl/libjxl/blob/main/lib/jxl/enc_fast_lossless.cc#L49
2024-03-04 10:22:01
I suspect clang-cl still "smells" like msvc for that check
afed
2024-03-05 10:06:12
it might be useful to have some extra build at least for windows x64 with march avx2 (like -march=haswell, for just avx I don't think it makes much sense) and avx512 support (disabled for generic builds, as far as I know) for benchmarks on modern systems because for windows it's less likely that people will compile their own binaries something like `jxl-x64-avx2-windows-static.zip`, if it's not too hard for extra maintenance and extra compilation time
fab
2024-03-05 10:37:38
Yt is becoming smart, it's learning how to build a custom encoder based on what I gave on Brave
jonnyawsom3
afed it might be useful to have some extra build at least for windows x64 with march avx2 (like -march=haswell, for just avx I don't think it makes much sense) and avx512 support (disabled for generic builds, as far as I know) for benchmarks on modern systems because for windows it's less likely that people will compile their own binaries something like `jxl-x64-avx2-windows-static.zip`, if it's not too hard for extra maintenance and extra compilation time
2024-03-05 03:28:36
The package names are already slightly confusing for the average user, might just make it worse
afed
2024-03-05 04:05:43
don't think one more binary is much more confusing, maybe with a slightly different name. still, avx2 binaries (or something like x86-64-v2, x86-64-v3) are pretty common if that makes sense for speed: avx2 gives some gain, enabling avx512 also gives about 50% more, and the binary size will not be as bloated as for generic builds, so it's good for everyone. though first clang compilation needs to at least work properly
veluca
2024-03-05 04:07:21
but libjxl uses dynamic dispatch anyway
afed
2024-03-05 04:08:32
yeah, but still march avx2 gives some gains over generic
veluca
2024-03-05 04:08:43
in modular?
afed
2024-03-05 04:15:57
yeah, some, but still, for e1 mostly (and for slower efforts as well, though I haven't done any comparisons recently, but it was pretty consistently) and it's smaller binary size and the option to use avx512, which is disabled by default (windows users rarely compile binaries and avx512 support is not that uncommon)
Oleksii Matiash
2024-03-05 04:49:06
Just curious, is binary size really an issue? I mean does enabling avx512 increase binary size like by x2?
veluca
afed yeah, some, but still, for e1 mostly (and for slower efforts as well, though I haven't done any comparisons recently, but it was pretty consistently) and it's smaller binary size and the option to use avx512, which is disabled by default (windows users rarely compile binaries and avx512 support is not that uncommon)
2024-03-05 04:54:37
not that uncommon *now* 😛
2024-03-05 04:54:42
enabling avx512 might make sense
afed
2024-03-05 04:54:44
no, not much, but also compression for e1 with avx512 is a bit worse, maybe that is also one of the reasons
2024-03-05 04:55:24
but for some separate version I think it's worth it
2024-03-05 04:58:05
<:FeelsReadingMan:808827102278451241>
Oleksii Matiash
veluca not that uncommon *now* ๐Ÿ˜›
2024-03-05 05:19:51
Yes 🙂
jonnyawsom3
2024-03-05 06:00:41
I know a while ago some extensions stopped showing in cjxl due to upstream changes https://discord.com/channels/794206087879852103/804324493420920833/1210293891664977930
eddie.zato
2024-03-06 05:22:02
How to compile cjxl without avx support? I tried `-march=x86-64-v2 -mtune=x86-64-v2 -mno-avx -mno-avx2`, but it doesn't seem to work as cjxl still says `cjxl v0.10.1 5f67ebc [AVX2,SSE4,SSE2]`
_wb_
2024-03-06 06:25:47
There's a HWY define for it
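Presumably this refers to Highway's `HWY_DISABLED_TARGETS` mechanism; a sketch (the macro and target names are assumptions taken from Highway's `hwy/detect_targets.h`, worth double-checking against the version in use):

```shell
# Assumption: Highway's HWY_DISABLED_TARGETS define, with target macros
# from hwy/detect_targets.h. Disables the AVX2 and AVX-512 codepaths so
# dynamic dispatch never selects them.
cmake -B build -DCMAKE_CXX_FLAGS="-DHWY_DISABLED_TARGETS=(HWY_AVX2|HWY_AVX3)"
cmake --build build --target cjxl
```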
DZgas Ж
afed <:FeelsReadingMan:808827102278451241>
2024-03-06 09:30:32
😬 I don't have AVX
190n
2024-03-06 09:32:10
really, what cpu?
DZgas Ж
190n really, what cpu?
2024-03-06 10:01:38
AMD Athlon II X4 640
Orum
2024-03-06 10:03:26
holy...
2024-03-06 10:04:03
you want an Ivy Bridge PC?
DZgas Ж
2024-03-06 10:04:10
https://github.com/ggerganov/whisper.cpp https://github.com/Const-me/Whisper An excellent test for stupid developer -- the x64 AND x86 variant is compiled by AVX-only --- geniuses devs
Orum you want an Ivy Bridge PC?
2024-03-06 10:05:18
What for? my computer seems to be working...
Orum
2024-03-06 10:05:32
for AVX, obviously...
DZgas Ж
Orum for AVX, obviously...
2024-03-06 10:06:05
Oh yes..... and why do I need AVX? 🙂
2024-03-06 10:08:49
Well, fans of poke <@548366727663321098> anything for no reason, are you an AVX seller?
Orum
2024-03-06 10:09:14
I can't imagine how painful it must be to not have AVX these days
2024-03-06 10:09:34
or just using 14 year old HW
afed
DZgas Ж 😬 I don't have AVX
2024-03-06 10:10:35
it would still be just an extra binary, but not for avx, for avx2 and with enabled avx512, so modern systems will get some gains
2024-03-06 10:12:34
also <:KekDog:805390049033191445> https://www.phoronix.com/news/LLVM-Clang-18.1-Released
DZgas Ж
Orum I can't imagine how painful it must be to not have AVX these days
2024-03-06 10:12:36
👉 useless thing - go to shop and buy HP laptop with N3050 (no have avx) in 2024
HCrikki
2024-03-06 10:12:42
a private build would imo be the best compromise. upstream needs to move forward
190n
DZgas Ж 👉 useless thing - go to shop and buy HP laptop with N3050 (no have avx) in 2024
2024-03-06 10:14:11
intel moment <:YEP:808828808127971399>
DZgas Ж
2024-03-06 10:14:40
just think, the AVX developed in 2008 by Intel itself is not used in their 2015 Braswell architecture -- what is the reason hmm
afed
afed also <:KekDog:805390049033191445> https://www.phoronix.com/news/LLVM-Clang-18.1-Released
2024-03-06 10:15:14
but basically avx10.2 is what will be the mainstream version, and 10.1 is sort of transitional
Orum
2024-03-06 10:15:43
braswell?
DZgas Ж
Orum braswell?
2024-03-06 10:16:13
Orum
2024-03-06 10:16:53
those things without AVX are all atom/low power CPUs
2024-03-06 10:17:12
they're not meant to be crunching stuff with SIMD 😆
2024-03-06 10:18:42
anyway hopefully RISC-V will take over and their vector instructions will finally kill the SIMD treadmill
afed
HCrikki a private build would imo be the best compromise. upstream needs to move forward
2024-03-06 10:34:01
there is no need for private builds, just a normal generic version for any cpus and avx2+ (with enabled avx512) for modern systems, when I tested this gives up to 5% for modular modes compared to generic build and avx512 gives about 50% more for e1 (and for other modes too, but less)
2024-03-06 10:39:21
but when the builds are fixed, because right now it's like this https://canary.discord.com/channels/794206087879852103/848189884614705192/1213829871189626980
Traneptora
Orum anyway hopefully RISC-V will take over and their vector instructions will finally kill the SIMD treadmill
2024-03-06 10:49:42
2024 year of the risc-v desktop?
Quackdoc
Traneptora 2024 year of the risc-v desktop?
2024-03-06 10:52:06
maybe, the new upcomming milk-v oasis looks insanely good for the alleged price
Traneptora
2024-03-06 10:52:31
just want to point out that risc-v is like ten years old and we still don't really have hardware for it
2024-03-06 10:52:34
it's all qemu instances
Quackdoc
Traneptora just want to point out that risc-v is like ten years old and we still don't really have hardware for it
2024-03-06 10:54:56
risc-v spec has only formally been a stable spec for about 4 years now meaning any hardware produced before that should be considered as possibly incompatible, in that time we have had 3 major socs released. 2 socs are SBC level socs and one soc "corporate" level.
2024-03-06 10:55:42
the antminer x3 is a good example of corporate use, the sipeed licheepi4a is a good example of a midtier soc, and milk-v mars for lower end
2024-03-06 10:56:16
contrary to it being slow, risc-v adoption has been absurdly fast, probably the fastest adoption of any architecture since i386
Traneptora
2024-03-06 10:56:48
what exactly makes risc-v so much better than x86
Quackdoc
2024-03-06 10:58:02
well it's mostly just arm without a lot of the mistakes. Cheap low power devices that perform decently. it also helps that since its an open spec, you can literally design and get someone to fab you chips at an actually decent price
Traneptora
2024-03-06 11:00:02
I see, doesn't strike me as likely to replace x86 for non-embedded computing then
Orum
Traneptora just want to point out that risc-v is like ten years old and we still don't really have hardware for it
2024-03-06 11:00:08
just want to point out that's still very young for an arch
Traneptora
2024-03-06 11:00:37
sure, but I feel like people are hailing it as the next great thing but I'm not really sure what it really gives a non-emedded user over x86
Orum
2024-03-06 11:00:41
ARM dates back to 1985 and we still don't really have desktop ARM machines (except Apple, if you count them)
Quackdoc
Traneptora I see, doesn't strike me as likely to replace x86 for non-embedded computing then
2024-03-06 11:00:45
people said that about laptops too, but apple was able to do it anyways
Traneptora
Orum ARM dates back to 1985 and we still don't really have desktop ARM machines (except Apple, if you count them)
2024-03-06 11:00:52
that's because there's no reason for that
Quackdoc
Orum ARM dates back to 1985 and we still don't really have desktop ARM machines (except Apple, if you count them)
2024-03-06 11:01:01
apples has kinda lol
Traneptora
2024-03-06 11:01:01
arm doesn't have any benefits over x86 outside of battery life
2024-03-06 11:01:22
apple has a particularly power-efficient implementation of arm in the apple silicon laptops
2024-03-06 11:01:34
but that's a property of that chip, not a property of arm
Orum
Quackdoc apples has kinda lol
2024-03-06 11:01:35
yeah, "kinda" is right
Traneptora sure, but I feel like people are hailing it as the next great thing but I'm not really sure what it really gives a non-emedded user over x86
2024-03-06 11:02:10
single biggest thing is vector instructions
Traneptora
2024-03-06 11:02:26
x86 has vector instructions
Quackdoc people said that about laptops too, but apple was abke to do it anyways
2024-03-06 11:02:43
you say "able" to do that. but apple has a vertically integrated ecosystem, so they can make any changes they want to exactly that ecosystem
Orum
Traneptora x86 has vector instructions
2024-03-06 11:02:47
no, it doesn't <:CatBlobPolice:805388337862279198>
Quackdoc
Traneptora but that's a property of that chip, not a property of arm
2024-03-06 11:02:53
no one has tried in the first place,
Traneptora
Orum no, it doesn't <:CatBlobPolice:805388337862279198>
2024-03-06 11:02:57
what do you think simd is
Orum
2024-03-06 11:03:03
SIMD != vector
Traneptora
2024-03-06 11:03:03
stuff like sse etc.
2024-03-06 11:03:26
what makes SIMD not vector instructions
Orum
2024-03-06 11:03:33
they both process data in parallel, but how they go about it is *completely* different
Quackdoc
2024-03-06 11:03:42
risc-v does really unique stuff for their vector acceleration
Traneptora
Orum they both process data in parallel, but how they go about it is *completely* different
2024-03-06 11:03:55
so it's an implementation-specific thing?
Orum
2024-03-06 11:05:10
this article has a good overview of the differences: https://webcache.googleusercontent.com/search?q=cache:https://medium.com/swlh/risc-v-vector-instructions-vs-arm-and-x86-simd-8c9b17963a31
Quackdoc
2024-03-06 11:05:19
>webcache
Orum
2024-03-06 11:05:26
because paywalled <:FeelsSadMan:808221433243107338>
Traneptora
2024-03-06 11:05:26
it's medium
Quackdoc
Orum because paywalled <:FeelsSadMan:808221433243107338>
2024-03-06 11:05:43
did they break 12ft.io?
2024-03-06 11:06:57
the answer is yes
Orum
2024-03-06 11:07:15
> In the code examples Patterson and Waterman are using they remark that the SIMD programs require 10 to 20 times more instructions to be executed compared to the RISC-V version using vector instructions.

that alone makes vector worth it, but the benefits don't stop there
Traneptora
2024-03-06 11:07:26
I don't see how
2024-03-06 11:07:41
why is the number of instructions that exist an important metric
Orum
2024-03-06 11:08:14
those instructions have to be loaded from memory, and that takes time, valuable cache space, and most importantly: power
2024-03-06 11:08:29
> The max vector length can be queried at runtime, so one does not need to hardcode 64 element long batch sizes.

this is the *real* killer of SIMD
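The runtime-queried vector length enables "strip-mining" loops; a scalar C++ simulation of the pattern (here `vlmax` stands in for the hardware-reported maximum vector length, and the `min()` step is what RVV's `vsetvl` instruction does in one operation):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar simulation of RISC-V V "strip-mining": each iteration asks how
// many elements can be processed at once (vsetvl), processes that many,
// and advances. The same loop adapts to any vector register width, so
// the binary never needs recompiling for wider hardware.
std::vector<float> add_arrays(const std::vector<float>& a,
                              const std::vector<float>& b,
                              std::size_t vlmax /* hardware VLMAX */) {
    assert(a.size() == b.size());
    std::vector<float> out(a.size());
    std::size_t i = 0;
    while (i < a.size()) {
        // vsetvl: vl = min(remaining elements, VLMAX)
        std::size_t vl = std::min(a.size() - i, vlmax);
        for (std::size_t j = 0; j < vl; ++j)  // one "vector" add
            out[i + j] = a[i + j] + b[i + j];
        i += vl;  // advance by however many elements were handled
    }
    return out;
}
```

With fixed-width SIMD, by contrast, the batch size is baked into the instruction encoding, which is why each new extension (SSE, AVX, AVX-512) needs rewritten kernels.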
Traneptora
2024-03-06 11:09:03
as a desktop user I don't care about power
2024-03-06 11:09:14
why would I care about that more than all existing software continuing to work
Orum
2024-03-06 11:09:23
how much SIMD do we have in x86? MMX, SSE, SSE2, SSSE3, SSE4, AVX, AVX2, AVX512?
Traneptora
2024-03-06 11:09:29
nobody uses MMX
2024-03-06 11:09:30
but yes
2024-03-06 11:09:40
fwiw, all the SSE are all 128-bit
2024-03-06 11:09:45
and all the AVX are 256-bit
2024-03-06 11:09:47
AVX512 is 512-bit
Orum
2024-03-06 11:09:50
every time a new SIMD extension comes out, you need to rewrite and/or recompile code for it
Traneptora
2024-03-06 11:10:00
sure, but the existing binary code *still works*
Orum
2024-03-06 11:10:01
with RISC-V vector you don't
Quackdoc
Traneptora as a desktop user I don't care about power
2024-03-06 11:10:23
it depends on where you live, my cousins for instance live on solar + battery / generator, so they want to shave off as much energy as possible
Orum
2024-03-06 11:10:27
you can use whatever new horsepower they put on your new silicon without being stuck with something written for ancient SIMD
Traneptora
2024-03-06 11:10:30
having to recompile performance-critical code once every few years doesn't seem worse than breaking all x86 code cause risc-v hype
Orum
2024-03-06 11:10:48
it's not just recompile, it's *rewriting*
2024-03-06 11:11:09
you don't magically put in MMX code to a compiler and instantly get AVX512
Traneptora
2024-03-06 11:11:23
sure, but again, why is this worth breaking all software that has come out since the 1980s
Quackdoc
2024-03-06 11:11:26
hand written asm is panic
Traneptora
2024-03-06 11:11:35
you can always use intrinsics instead of hand-written asm
2024-03-06 11:11:36
but yes
Orum
2024-03-06 11:11:37
that's the whole point--none of this *is* breaking in RISC-V
Quackdoc
Traneptora sure, but again, why is this worth breaking all software that has come out since the 1980s
2024-03-06 11:11:38
for some it is, for some it isnt.
Traneptora
Orum that's the whole point--none of this *is* breaking in RISC-V
2024-03-06 11:11:50
can I run x86 software on risc-v hardware?
Quackdoc
2024-03-06 11:11:53
but risc-v allows this forwards compatibility
Traneptora
2024-03-06 11:11:57
if the answer to that is "no" you are breaking all software
Orum
2024-03-06 11:12:13
you can, they're called translation layers or emulators
Traneptora
2024-03-06 11:12:24
I see, so the answer is "no"
Orum
2024-03-06 11:12:30
no, it's yes
Traneptora
2024-03-06 11:12:30
if you have to set up a qemu instance, the answer is no
Quackdoc
2024-03-06 11:12:39
well, box86
Orum
2024-03-06 11:12:46
how do you think Apple runs x86 code without having a x86 license?
Traneptora
2024-03-06 11:12:53
apple has an x86 license
Orum
2024-03-06 11:12:58
no, they don't
Traneptora
2024-03-06 11:13:03
mac computers have been using intel CPUs for years
2024-03-06 11:13:06
I have no idea what you're talking about
Orum
2024-03-06 11:13:17
Intel holds the license, not Apple 😆
2024-03-06 11:13:22
they bought chips from Intel
Traneptora
2024-03-06 11:13:29
are you being pedantic for the purpose of being pedantic
Orum
2024-03-06 11:13:41
no, this is *extremely* important in the CPU manufacturing space
Traneptora
2024-03-06 11:13:41
macintosh computers have had intel CPUs in them for years
Quackdoc
2024-03-06 11:13:44
talking about rosetta
Traneptora
2024-03-06 11:13:54
you ask "how can apple run x86 code", the answer is cause they have intel CPUs in them
Orum
2024-03-06 11:14:10
I'm talking about *modern* ARM Apples, not ancient ones
Traneptora
2024-03-06 11:14:21
x86 macintosh computers are not ancient
2024-03-06 11:14:24
powerpc ones are ancient
Quackdoc
2024-03-06 11:14:26
the m1 devices don't, their emulation on the other hand is extremely efficient, not 100% granted, but still quite good
Traneptora
2024-03-06 11:14:47
so it often doesn't work
2024-03-06 11:14:49
got it
Orum
2024-03-06 11:14:52
well yes PPC are even older but their x86 stuff is quite old at this point
Quackdoc
Traneptora so it often doesn't work
2024-03-06 11:15:28
I've never had an issue with it myself, granted I haven't needed to use it often
2024-03-06 11:15:44
i've not seen anyone do a large scale test with it however
Traneptora
Orum well yes PPC are even older but their x86 stuff is quite old at this point
2024-03-06 11:15:48
if 2023 is ancient I can't wait to hear what you think about 2022
Quackdoc
2024-03-06 11:16:34
wasn't the last intel macbook 2020? not what I would call ancient, but they aren't exactly new either
Orum
2024-03-06 11:16:35
that was literally their last x86 laptop 🤷‍♂️
Traneptora
2024-03-06 11:16:36
the apple silicon version of Mac Pro was released literally less than a year ago
2024-03-06 11:16:49
june 2023
Orum
2024-03-06 11:16:53
but x86 was dead to Apple long before that
Traneptora
2024-03-06 11:17:04
ah yes dead to apple despite being explicitly supported less than 12 months ago
2024-03-06 11:17:05
got it
2024-03-06 11:17:24
it wasn't even *announced* that this was going to happen until june 2020
2024-03-06 11:17:27
this isn't ancient history
Quackdoc
2024-03-06 11:17:39
ah their last macbook was apparently 2021
Orum
2024-03-06 11:17:43
yeah, 2020 is ancient in the computing sphere
Traneptora
2024-03-06 11:17:46
no, it's not
Quackdoc
2024-03-06 11:18:06
but well, either way, the point was their x86 emulation is good, and indeed it is
Traneptora
2024-03-06 11:18:07
we're talking less than four years for a transition to an entirely different architecture across their ecosystem
2024-03-06 11:18:14
that's definitely not ancient history
Orum just want to point out that's still very young for an arch
2024-03-06 11:18:52
so risc-v being ten years old is "very young" but four years is ancient history
2024-03-06 11:18:52
got it
Quackdoc
2024-03-06 11:18:56
I haven't seen anyone submit any extensions dedicated to acceleration of things like x86 to the riscv spec, but I don't see why they couldn't be submitted
Orum
2024-03-06 11:19:10
I disagree; the moment they announced they were dropping x86 and moving to ARM, there was little reason to buy any x86 Apple HW
Traneptora so risc-v being ten years old is "very young" but four years is ancient history
2024-03-06 11:19:18
different things entirely
Traneptora
2024-03-06 11:19:28
you said it was ancient "in the computing sphere"
2024-03-06 11:19:36
is instruction sets not in the computing sphere
Orum
2024-03-06 11:19:42
yes, in the consumer product computing sphere
Traneptora
2024-03-06 11:19:54
not really
2024-03-06 11:20:00
people don't replace their computers every 4 years
2024-03-06 11:20:03
people just don't do that
Orum
2024-03-06 11:20:09
sure, but no one is buying 4-year old computers either
Traneptora
2024-03-06 11:20:31
you have no reason to do that because it's not cheaper than 1-year-old hardware
2024-03-06 11:20:43
if it was cheaper, people would totally do that
Orum
2024-03-06 11:20:45
which is why all the reviewers are at AMD's throat for copying Intel's move of selling old silicon under a new name
afed
Traneptora AVX512 is 512-bit
2024-03-06 11:20:50
even AVX512 didn't really take off for desktops because of the small cores, and now intel is trying to replace it with AVX10 and its 256-bit split
Traneptora
2024-03-06 11:21:24
yea, it's true that the more and more you try to extend you get diminishing returns
Quackdoc
2024-03-06 11:21:35
oh, risc-v was ratified about 4 3/4 years ago, my bad :D
Traneptora
2024-03-06 11:21:53
AVX512's gain over 256-bit AVX is smaller than 256-bit AVX's gain over SSE's 128-bit
2024-03-06 11:21:54
etc.
lonjil
Orum anyway hopefully RISC-V will take over and their vector instructions will finally kill the SIMD treadmill
2024-03-06 11:22:24
the SIMD treadmill was already dead
Orum
2024-03-06 11:22:42
in any case, from everyone's perspective *except* the manufacturer's, having a democratized ISA is a *good* thing
lonjil
Quackdoc well it's mostly just arm without a lot of the mistakes. Cheap low power devices that perform decently. it also helps that since its an open spec, you can literally design and get someone to fab you chips at an actually decent price
2024-03-06 11:22:53
There is already Arm without the mistakes, it's called Aarch64
Quackdoc
lonjil There is already Arm without the mistakes, it's called Aarch64
2024-03-06 11:23:05
[av1_kekw](https://cdn.discordapp.com/emojis/758892021191934033.webp?size=48&quality=lossless&name=av1_kekw)
Orum
2024-03-06 11:23:09
instead of the oligopoly we've been forced to swallow for decades
lonjil
Orum ARM dates back to 1985 and we still don't really have desktop ARM machines (except Apple, if you count them)
2024-03-06 11:23:13
why wouldn't you count apple?
Orum
2024-03-06 11:23:27
because their products are absurdly poor value
Traneptora
lonjil why wouldn't you count apple?
2024-03-06 11:23:44
apple is vertically integrated which gives them the prerogative to change things and force changes on users that other hardware manufacturers don't have
2024-03-06 11:24:05
for example, intel can't sell a new CPU that microsoft windows doesn't work on
Quackdoc
Orum instead of the oligopoly we've been forced to swallow for decades
2024-03-06 11:24:06
indeed, this is one of the major benefits. it's pretty nice to see how varying the risc-v development is now.
lonjil
Traneptora what do you think simd is
2024-03-06 11:24:31
most of the industry considers "vector processing" to be a kind of SIMD, but the "vector processing" industry tries to sell itself by claiming to be something entirely different from SIMD. It is very silly.
Orum
2024-03-06 11:24:49
because it is totally different
Traneptora
2024-03-06 11:24:54
it's just variable-length simd
2024-03-06 11:24:58
it's not totally different
Orum
2024-03-06 11:25:03
"just" ๐Ÿ˜†
Traneptora
2024-03-06 11:25:07
I mean
Quackdoc
lonjil most of the industry considers "vector processing" to be a kind of SIMD, but the "vector processing" industry tries to sell itself by claiming to be something entirely different from SIMD. It is very silly.
2024-03-06 11:25:08
technically speaking, it's massively different when it comes to working with it
Traneptora
2024-03-06 11:25:08
it's still single-instruction-multiple-data
2024-03-06 11:25:16
the core concept is the same
Quackdoc
2024-03-06 11:25:17
yeah but so are gpus
Traneptora
2024-03-06 11:25:23
gpus are glorified simd, yes
Orum
2024-03-06 11:25:32
GPUs are SIMT, not SIMD
lonjil
Orum with RISC-V vector you don't
2024-03-06 11:25:44
you do if you want new features. Most of those instruction sets you listed were new features, not a difference in width (which is the only thing "vectors" solves)
Traneptora apple has an x86 license
2024-03-06 11:26:30
no they don't, buying chips doesn't get you a license
Quackdoc
lonjil you do if you want new features. Most of those instruction sets you listed were new features, not a difference in width (which is the only thing "vectors" solves)
2024-03-06 11:26:39
that is the case for new *features* but when it comes to extending vector that work doesn't need to get put in
Traneptora
lonjil no they don't, buying chips doesn't get you a license
2024-03-06 11:26:42
then I didn't understand the point of the question
Orum
lonjil you do if you want new features. Most of those instruction sets you listed were new features, not a difference in width (which is the only thing "vectors" solves)
2024-03-06 11:27:01
Difference in width is *massive*, especially from a developer's perspective
2024-03-06 11:28:27
instead of having to write, compile, test, and verify code for 4 different widths and countless different extensions, you only need to write one
lonjil
Orum in any case, from everyone's perspective *except* the manufacturer's, having a democratized ISA is a *good* thing
2024-03-06 11:28:46
yes. RISC-V is good at two things: 1. small microcontrollers. 2. open spec so people can do whatever they want. not very good for "big" chips tho. Some design flaws. And a few flaws that would be super easy to fix (just need a few new instructions) that they refuse to do because it isn't "RISC" enough. Just dogma.
Traneptora
2024-03-06 11:28:48
you still run into issues where you need new extensions to use features like f16c or fma
Orum
2024-03-06 11:29:09
FMA is not 'new' by any means
lonjil yes. RISC-V is good at two things: 1. small microcontrollers. 2. open spec so people can do whatever they want. not very good for "big" chips tho. Some design flaws. And a few flaws that would be super easy to fix (just need a few new instructions) that they refuse to do because it isn't "RISC" enough. Just dogma.
2024-03-06 11:29:21
why is it bad for big chips?
Traneptora
2024-03-06 11:29:21
I mean, it's something that was added after SSE
2024-03-06 11:29:29
it's not simply a width change, when it was released it was a new feature
Quackdoc
lonjil yes. RISC-V is good at two things: 1. small microcontrollers. 2. open spec so people can do whatever they want. not very good for "big" chips tho. Some design flaws. And a few flaws that would be super easy to fix (just need a few new instructions) that they refuse to do because it isn't "RISC" enough. Just dogma.
2024-03-06 11:29:30
what are "big" chips? risc-v is being adopted in everything from SBCs to cryptominers
lonjil
Orum because their products are absurdly poor value
2024-03-06 11:29:50
have you compared their laptops to other similar laptops? MacBooks are surprisingly great value, especially if you like low power consumption and long battery life.
Traneptora
2024-03-06 11:29:57
variable-width simd gets you the width upgrade automatically but anything like f16c or fma3 that wasn't in the previous version won't automatically get added
190n
Quackdoc what are "big" chips? risc-v is being adopted in everything from SBCs to cryptominers
2024-03-06 11:30:12
and the SBCs are still slower than arm or x86 ones
Quackdoc
lonjil have you compared their laptops to other similar laptops? MacBooks are surprisingly great value, especially if you like low power consumption and long battery life.
2024-03-06 11:30:13
for laptops yes, but the box thingies? I would disagree on
Traneptora
2024-03-06 11:30:20
do they even sell those?
Quackdoc
190n and the SBCs are still slower than arm or x86 ones
2024-03-06 11:30:28
not really?
Traneptora
2024-03-06 11:30:38
as far as I understand the primary reason you'd purchase an apple laptop is that apple has made their apple silicon chips very power-efficient
Quackdoc
2024-03-06 11:30:39
the milk-v mars is faster than the rpi3b from what I can see
2024-03-06 11:30:54
and the pi4a sits somewhere between the pi4 and pi5 in perf
Traneptora
2024-03-06 11:31:01
if you don't like macOS you can always run asahi linux on them too
Orum
lonjil have you compared their laptops to other similar laptops? MacBooks are surprisingly great value, especially if you like low power consumption and long battery life.
2024-03-06 11:31:06
Me personally? No, but others have: https://www.youtube.com/watch?v=u1dxOI_kYG8
190n
Quackdoc the milk-v mars is faster than the rpi3b from what I can see
2024-03-06 11:31:26
but rpi3b is 2 generations old
lonjil
Traneptora for example, intel can't sell a new CPU that microsoft windows doesn't work on
2024-03-06 11:31:44
I don't really see the relevance? Pretty much all software for macOS continued to work, and Microsoft is totally willing to port Windows to new architectures (back in the day they ported Windows to Itanium, and today Windows on Arm is finally getting somewhere.)
Quackdoc
190n but rpi3b is 2 generations old
2024-03-06 11:31:57
the rpi3b is also architecturally very different from the 4/5, and is more power efficient than them by quite the margin
190n
2024-03-06 11:31:59
wow it's faster than an in-order arm cpu designed in 2012
Quackdoc
2024-03-06 11:32:12
there is a significant reason why people still buy the rpi3b
Traneptora
lonjil I don't really see the relevance? Pretty much all software for macOS continued to work, and Microsoft is totally willing to port Windows to new architectures (back in the day they ported Windows to Itanium, and today Windows on Arm is finally getting somewhere.)
2024-03-06 11:32:13
no, but it's easier to force a transition along when it's vertically integrated
190n
Quackdoc the rpi3b is also architecturally very different from the 4/5, and is more power efficient than them by quite the margin
2024-03-06 11:32:16
is it really more power efficient or does it just use less power
Quackdoc
2024-03-06 11:32:25
> Raspberry Pi 3 Model B will remain in production until at least January 2028
190n is it really more power efficient or does it just use less power
2024-03-06 11:32:37
power efficient
2024-03-06 11:33:11
it doesn't make sense to compare a product designed to compete with rpi3b to an rpi4, they are just different segments
lonjil
Quackdoc that is the case for new *features* but when it comes to extending vector that work doesn't need to get put in
2024-03-06 11:33:28
but you chose to list out every little update and minor version even with the same width, so I think my critique of your critique is fair.
Traneptora
Orum Me personally? No, but others have: https://www.youtube.com/watch?v=u1dxOI_kYG8
2024-03-06 11:33:33
does this video mention power consumption at all
2024-03-06 11:33:39
I don't really want to spend 8 minutes watching it
Quackdoc
lonjil but you chose to list out every little update and minor version even with the same width, so I think my critique of your critique is fair.
2024-03-06 11:33:49
sure, but for a lot of people, width is what matters
2024-03-06 11:34:00
width *is* the key part here afterall
190n
Quackdoc it doesn't make sense to compare a product designed to compete with rpi3b to an rpi4, they are just different segments
2024-03-06 11:34:24
sure but i would not call a pi 3b competitor a "big chip"
Traneptora
2024-03-06 11:34:26
if width is what you care about, width has been updated like 3 times in 20 years
2024-03-06 11:34:30
which is not that much
Orum
Traneptora does this video mention power consumption at all
2024-03-06 11:34:34
IDK, but I have no doubt apple will have better efficiency as they don't have to deal with x86 hell and they're willing to pay for bleeding-edge processes
Quackdoc
190n sure but i would not call a pi 3b competitor a "big chip"
2024-03-06 11:34:37
that's why I asked what is a "big" chip
lonjil
Traneptora then I didn't understand the point of the question
2024-03-06 11:34:50
if you go back up the convo, the original point was that Apple ensured that x86-64 software works just fine on Arm laptops. Then you replied with stuff about license and x86 chips, which was not relevant (however, I replied before realizing that you had missed that point)
Orum
2024-03-06 11:34:56
though actually it has worse efficiency if you go over the 8GB limit
Quackdoc
2024-03-06 11:35:09
for instance the antminer x3 is a crypto bro machine running iirc 3x SG2042s
2024-03-06 11:35:20
I'm not sure if that would be considered "big" or not
Traneptora
lonjil if you go back up the convo, the original point was that Apple ensured that x86-64 software works just fine on Arm laptops. Then you replied with stuff about license and x86 chips, which was not relevant (however, I replied before realizing that you had missed that point)
2024-03-06 11:35:25
whether or not apple has paid intel for a license doesn't seem to matter though if x86 code works on apple silicon via rosetta
2024-03-06 11:35:44
like it works or it doesn't, whether they paid intel for it seems largely irrelevant imo
190n
Quackdoc that's why I asked what is a "big" chip
2024-03-06 11:35:47
i would classify it as something you could reasonably put in a smartphone or low-end PC, or very high-end SBC
Quackdoc Im not sure if that would be considered "big" or not
2024-03-06 11:35:53
yeah it would
lonjil
Quackdoc what are "big" chips? risc-v is being adopted in everything from SBCs to cryptominers
2024-03-06 11:35:57
you know, desktops and servers
Orum
Traneptora like it works or it doesn't, whether they paid intel for it seems largely irrelevant imo
2024-03-06 11:36:07
the point is there's no reason you can't do the same with RISC-V
Quackdoc
190n i would classify it as something you could reasonably put in a smartphone or low-end PC, or very high-end SBC
2024-03-06 11:36:22
in that case you have the lichee pi4a, the lichee pi4a is a bit on the pricy side granted, but it's quite the decent chip when excusing that (early adopter fee and everything)
2024-03-06 11:36:49
you also have the upcoming SG2380 which should be quite the promising piece of work
Traneptora
Orum the point is there's no reason you can't do the same with RISC-V
2024-03-06 11:37:23
except it doesn't fully work. what I foresee is someone running an x86 binary on risc-v, something doesn't work, they report a bug, the developer goes "recompile it for risc-v", the user gets angry and refuses, the developer gets angry, and nobody wins
2024-03-06 11:37:44
this kind of cycle happens all the time in software dev
Orum
2024-03-06 11:37:52
well if it's OSS there really isn't a reason *not* to compile it for RISC-V then
Traneptora
2024-03-06 11:38:02
sure but this happens
Quackdoc
Quackdoc in that case you have the lichee pi4a, the lichee pi4a is a bit on the pricy side granted, but it's quite the decent chip when excusing that (early adopter fee and everything)
2024-03-06 11:38:39
considering that the lichee pi4a manages to compete with the rockchip devices in perf/watt it's pretty amazing since it was the first design released after the spec was formally ratified IIRC
Orum
2024-03-06 11:38:45
sure, but if it's closed source then the developer either releases RISC-V binaries, or (if it's paid closed source) loses a sale
Traneptora
2024-03-06 11:39:00
well for ubiquitous software they won't lose a sale
2024-03-06 11:39:22
if adobe doesn't release risc-v binaries for acrobat and my PDF doesn't render in acroread because of some bug in the translation layer
2024-03-06 11:39:23
nobody wins
lonjil
Orum why is it bad for big chips?
2024-03-06 11:39:57
one example is the lack of complex addressing modes. needing to do those calculations with intermediate registers increases the pressure on the register rename engine, one of the most power hungry parts of the core that is always running. by having more complex instructions with fewer intermediate registers needed, you can reduce the size of the rename engine by something like 20%. Saves a lot of power and die area inside the core. Alibaba has some cores that implement custom extensions for that and some other stuff. They claim something like 30% better perf due to this. But the RISC-V foundation is against it on ideological grounds.
Quackdoc
2024-03-06 11:39:58
while true, IMO it's not really that big of an issue for early adopters who go in expecting that
Orum
2024-03-06 11:39:59
if it's ubiquitous then you can just use something else to read PDFs
lonjil
lonjil one example is the lack of complex addressing modes. needing to do those calculations with intermediate registers increases the pressure on the register rename engine, one of the most power hungry parts of the core that is always running. by having more complex instructions with fewer intermediate registers needed, you can reduce the size of the rename engine by something like 20%. Saves a lot of power and die area inside the core. Alibaba has some cores that implement custom extensions for that and some other stuff. They claim something like 30% better perf due to this. But the RISC-V foundation is against it on ideological grounds.
2024-03-06 11:40:30
(on the other hand, something like a microcontroller doesn't have register renaming, so this doesn't really matter at all in that space)
190n
lonjil one example is the lack of complex addressing modes. needing to do those calculations with intermediate registers increases the pressure on the register rename engine, one of the most power hungry parts of the core that is always running. by having more complex instructions with fewer intermediate registers needed, you can reduce the size of the rename engine by something like 20%. Saves a lot of power and die area inside the core. Alibaba has some cores that implement custom extensions for that and some other stuff. They claim something like 30% better perf due to this. But the RISC-V foundation is against it on ideological grounds.
2024-03-06 11:41:26
Zba extension adds some instructions for address generation, or are you talking about even more complex stuff?
2024-03-06 11:41:49
you can shift left by (1, 2, 3) and add in one operation
lonjil
Quackdoc for laptops yes, but the box thingies? I would disagree on
2024-03-06 11:41:59
which box thingies? Mac Mini can be decent value depending on your needs, especially if you, say, depend on solar power and need to use as little as possible, as was mentioned previously 😄 Though if you mean like the Mac Pro then yeah lmao terrible value.
Traneptora
2024-03-06 11:42:33
oh they sell actual ATX-sized mac desktops?
2024-03-06 11:42:41
oh, that, I don't see the point in that
2024-03-06 11:42:52
the primary reason you'd purchase an apple computer is that it's power-efficient
lonjil
2024-03-06 11:42:52
they even sell rack-mount mac desktops
Traneptora
2024-03-06 11:42:55
wew
190n
2024-03-06 11:43:22
they even sell wheeled mac desktops
Orum
2024-03-06 11:44:38
they even sold boat anchors
lonjil
190n Zba extension adds some instructions for address generation, or are you talking about even more complex stuff?
2024-03-06 11:45:14
these generate addresses, which is nice, but they still require you to stuff that address into a register before using it. I meant having complex addressing in your load and store instructions. On regular consumer and server class chips, this is very cheap to implement, basically free performance for nothing.
190n
2024-03-06 11:46:40
ah, like the `[reg + constant * reg]` you can do in x86?
lonjil
Traneptora whether or not apple has paid intel for a license doesn't seem to matter though if x86 code works on apple silicon via rosetta
2024-03-06 11:47:16
yes. I am just pedantic. Though Apple probably did not pay anything. Fun fact if you didn't know: Apple made a Linux version of Rosetta, so your x86-64 docker containers can run on apple silicon macs 😄
190n
2024-03-06 11:47:41
omg there's a conditional operations extension now
Traneptora
lonjil yes. I am just pedantic. Though Apple probably did not pay anything. Fun fact if you didn't know: Apple made a Linux version of Rosetta, so your x86-64 docker containers can run on apple silicon macs 😄
2024-03-06 11:47:56
>docker plz
lonjil
190n ah, like the `[reg + constant * reg]` you can do in x86?
2024-03-06 11:48:06
yeah. Though I don't recall exactly which addressing modes Alibaba engineers added as an extension.
190n
lonjil yes. I am just pedantic. Though Apple probably did not pay anything. Fun fact if you didn't know: Apple made a Linux version of Rosetta, so your x86-64 docker containers can run on apple silicon macs 😄
2024-03-06 11:48:10
and so you can run x86 binaries in your linux vm on apple silicon
2024-03-06 11:48:17
well i guess that's what docker does anyway...
lonjil
2024-03-06 11:49:18
finally, I've caught up to the conversation
2024-03-06 11:49:39
only took me 30 minutes of writing replies
190n well i guess that's what docker does anyway...
2024-03-06 11:50:13
yeah. Docker on Windows and macOS uses VMs
2024-03-06 11:50:45
On FreeBSD I believe Podman (docker but cooler) can use FreeBSD's Linux compat layer, no VM needed.
Traneptora >docker plz
2024-03-06 11:53:03
how do you orchestrate stuff? Kubernetes? Ansible? Shell scripts? Just running stuff inside `tmux` by hand? 😄 maybe you're a normal person who doesn't orchestrate anything...
Traneptora
lonjil how do you orchestrate stuff? Kubernetes? Ansible? Shell scripts? Just running stuff inside `tmux` by hand? 😄 maybe you're a normal person who doesn't orchestrate anything...
2024-03-06 11:54:19
I distribute software in source only <:YEP:808828808127971399>
2024-03-06 11:54:58
I've had bad experiences getting docker to work and actually segment off my system
lonjil
Traneptora the primary reason you'd purchase an apple computer is that it's power-efficient
2024-03-06 11:55:20
actually funny thing, there is one area where apple's silly big computers are money efficient. They have way more video ram than any GPU that isn't 10x more expensive. So if you're doing really vram heavy stuff, they're good value.
Traneptora
2024-03-06 11:55:46
huh. what GPU is inside them?
2024-03-06 11:55:50
I assume apple doesn't make gpus
lonjil
2024-03-06 11:55:56
Same GPU as on the iPhone
2024-03-06 11:56:15
used to be PowerVR, but Apple forked their architecture and took it in-house some years ago.
Traneptora
2024-03-06 11:56:27
ah so it's actually in-house
Quackdoc
lonjil which box thingies? Mac Mini can be decent value depending on your needs, especially if you, say, depend on solar power and need to use as little as possible, as was mentioned previously 😄 Though if you mean like the Mac Pro then yeah lmao terrible value.
2024-03-06 11:56:46
I don't really know, I don't follow the names much anymore
lonjil
Traneptora ah so it's actually in-house
2024-03-06 11:57:32
M3 Max is 92 billion transistors, most of which goes to the GPU. And M3 Ultra (whenever it comes out) will double that.
2024-03-06 11:58:04
for reference, Nvidia 4090 has 76 billion transistors
2024-03-06 11:59:46
M3 Max can have up to 128 GB of RAM (M3 Ultra should double that), while the 4090 has 24.
190n
lonjil yeah. Docker on Windows and macOS uses VMs
2024-03-07 12:00:04
https://macoscontainers.org/ tho if it goes anywhere
lonjil
2024-03-07 12:01:32
neat
Quackdoc I don't really know, I don't follow the names much anymore
2024-03-07 12:04:57
they've been using those names for like 20 years :p
Quackdoc
2024-03-07 12:06:16
it's the new stupid names like studio or mini or something
Traneptora
lonjil M3 Max can have up to 128 GB of RAM (M3 Ultra should double that), while the 4090 has 24.
2024-03-07 12:06:40
... why?
lonjil
2024-03-07 12:06:41
mini is since 2005. studio is a new name I think yeah.
Traneptora ... why?
2024-03-07 12:08:53
CPU and GPU is one the same chip, it's all unified RAM like any laptop chip or phone. So, same reason you'd have 128 GB in any computer, except now the GPU can use it too. (and you can share buffers between the CPU and GPU, to avoid copying)
Traneptora
2024-03-07 12:09:03
ah, that makes sense
190n
2024-03-07 12:09:59
would be cool if they'd let you install extra slower ram in the mac pro mostly for the cpu to use, as was rumored for a bit
2024-03-07 12:10:03
like normal ddr5
2024-03-07 12:10:45
nvidia is doing something kinda similar with the grace hopper chip that has stacked HBM for the gpu and a larger pool of LPDDR5 for the cpu
lonjil
2024-03-07 12:11:24
Eventually they'll do a 4x chip, with like 512 GB of LPDDR
190n
2024-03-07 12:12:58
still 3x less than the intel mac pro
lonjil
2024-03-07 12:13:24
psh, unified memory means you don't need as much 😉
2024-03-07 12:14:06
just like how swap means you only need 8 gb of ram, not 16
190n
2024-03-07 12:14:18
hmm how much vram could you get on the cheese grater
2024-03-07 12:16:17
128?
lonjil
2024-03-07 12:16:47
seems like it. With 4 W6800X GPUs bridged together
190n
2024-03-07 12:18:34
apple explaining why it is impossible to port amd/nvidia drivers to arm (amd and nvidia have already done it)
lonjil
lonjil seems like it. With 4 W6800X GPUs bridged together
2024-03-07 12:18:58
much cheaper than any 128GB GPU AMD sells today <:kekw:808717074305122316>
190n
2024-03-07 12:19:15
they realized AI startups have money?
Quackdoc
lonjil psh, unified memory means you don't need as much ๐Ÿ˜‰
2024-03-07 12:20:14
the memory optimization stuff apple does do is pretty neat granted, but it's only majorly optimized for their users' average use case
lonjil
190n apple explaining why it is impossible to port amd/nvidia drivers to arm (amd and nvidia have already done it)
2024-03-07 12:20:19
it has to do with PCIe features that many Arm systems lack. You have two options. Either add the missing features in hardware, or simply trap memory accesses and emulate them in software (super slow!) that latter option is unfortunately common
Quackdoc the memory optimization stuff apple does do is pretty neat granted, but it's only majorly optimized for their users' average use case
2024-03-07 12:21:47
you should've seen my girlfriend playing modded Kerbal Space Program on her 8GB macbook air. On the one hand, slow and laggy because it needed to swap. On the other hand, still faster than my 16 GB laptop that was only a few years older 🙃 (and I was playing unmodded)
Quackdoc
2024-03-07 12:23:00
yeah, the M1 chips iirc have dedicated memory compression acceleration which is nice, and ofc they always can do DMA based swapping since they are guaranteed to be able to swap over nvme now, so it's like, yeah, you do have a lot of optimization stuff, but still, gibe more
lonjil
2024-03-07 12:24:06
If I have money in the future, I might buy a macbook for that sweet lower power use. But honestly I don't want to until they move to Armv9, because I want to play with SVE2.
Quackdoc
2024-03-07 12:25:59
im just waiting for the risc-v stuff, I don't really need that much perf anymore since I just really do web browsing and testing applications anyways. and the new lichee pad4a if they can price it reasonably would be just a decent price point for me anyways. and since I use linux 100% of time now aside from VMs, it works out well for me
Traneptora
lonjil you should've seen my girlfriend playing modded Kerbal Space Program on her 8GB macbook air. On the one hand, slow and laggy because it needed to swap. On the other hand, still faster than my 16 GB laptop that was only a few years older 🙃 (and I was playing unmodded)
2024-03-07 12:26:26
I find swap on desktops pointless
lonjil
Quackdoc im just waiting for the risc-v stuff, I don't really need that much perf anymore since I just really do web browsing and testing applications anyways. and the new lichee pad4a if they can price it reasonably would be just a decent price point for me anyways. and since I use linux 100% of time now aside from VMs, it works out well for me
2024-03-07 12:26:34
unfortunately I need big and powerful computers (for reasons)
Quackdoc
2024-03-07 12:26:35
also want one to start porting waydroid to it if I get the time :D
lonjil unfortunately I need big and powerful computers (for reasons)
2024-03-07 12:26:51
remote desktop I assume is off the table? lol
Traneptora
2024-03-07 12:27:14
the only use case where you want to have swap enabled is when you have things loaded into memory for extended periods of time that aren't used, so the OS can swap them out, and use the plentiful ram for better disk caching
lonjil
2024-03-07 12:27:34
i have 15000 tabs open, most of them are in swap I guess
Traneptora
2024-03-07 12:27:41
well that's your fault
2024-03-07 12:28:04
people who have 15k tabs open and complain that they ran out of ram should either download more ram or not have 15k tabs open <:kek:857018203640561677>
Quackdoc
2024-03-07 12:28:06
I find it nice, since linux is trash and can't handle low ram properly [av1_omegalul](https://cdn.discordapp.com/emojis/885026577618980904.webp?size=48&quality=lossless&name=av1_omegalul) so even on 16gib, just compiling something when im using the desktop can hurt, though lately zram helps a good chunk
2024-03-07 12:28:21
zram helps quite a bit
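(For context: zram creates a compressed swap device backed by RAM, which is what makes low-memory machines like those mentioned above survive compiles. A minimal setup sketch, assuming the `zram` kernel module is available; the 4G size, `zstd` algorithm, and priority are illustrative, not recommendations:)

```shell
# Load the zram module and allocate a compressed block device.
sudo modprobe zram
sudo zramctl --find --size 4G --algorithm zstd   # prints the device it grabbed, e.g. /dev/zram0
# Format it as swap and enable it with a higher priority than any
# disk-backed swap, so the kernel prefers the compressed RAM device.
sudo mkswap /dev/zram0
sudo swapon --priority 100 /dev/zram0
swapon --show   # verify the new swap device and its priority
```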
lonjil
2024-03-07 12:28:53
my window manager/compositor (sway) has 700MiB in swap for some reason
Traneptora
2024-03-07 12:29:01
contrary to popular belief, swap is *not* designed for if you run out of memory. what happens if you have swap enabled and run out of memory is you get into a state of thrashing that makes your system unresponsive and inevitably you will power cycle it
lonjil
2024-03-07 12:29:01
only 29MiB in RAM
Quackdoc
lonjil my window manager/compositor (sway) has 700MiB in swap for some reason
2024-03-07 12:29:19
hello fellow sway user :D
Traneptora
2024-03-07 12:29:40
the primary purpose of swap is to give the OS more ram to work with to allow it to more aggressively cache the filesystem stuff, cause fsync is evil
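(The trade-off being described — swapping out idle anonymous memory to leave more RAM for the page cache — is tunable via `vm.swappiness`. A sketch; the value 10 is just an example of biasing toward keeping application memory resident:)

```shell
# vm.swappiness biases reclaim between dropping page cache (low values)
# and swapping out anonymous application memory (high values).
# The kernel default is typically 60.
sysctl vm.swappiness
# Example: prefer evicting page cache over swapping applications out.
sudo sysctl vm.swappiness=10
```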
lonjil
Traneptora contrary to popular belief, swap is *not* designed for if you run out of memory. what happens if you have swap enabled and run out of memory is you get into a state of thrashing that makes your system unresponsive and inevitably you will power cycle it
2024-03-07 12:30:00
I believe macos does something like, keeping track of which application is in focus, and swapping it back in if it got swapped out when it wasn't in use. As opposed to Linux very naive handling.
Quackdoc
Traneptora contrary to popular belief, swap is *not* designed for if you run out of memory. what happens if you have swap enabled and run out of memory is you get into a state of thrashing that makes your system unresponsive and inevitably you will power cycle it
2024-03-07 12:30:21
well regardless of what it's designed for, sadly, linux handles low memory situations abysmally, and swap can often prevent hard crashing
Traneptora
2024-03-07 12:30:43
well no, what happens when you run out of ram and have swap is you start thrashing and your system becomes unresponsive
lonjil
2024-03-07 12:31:01
earlier today linux decided to move literally all my applications to swap in favor of filling my ram with, uh, something
Traneptora
2024-03-07 12:31:04
I've used linux on a desktop for a long time and it has never once happened that the system became responsive again after it started thrashing
Quackdoc
2024-03-07 12:31:12
at least the system can still call oomkiller.
Traneptora
2024-03-07 12:31:21
except it won't
lonjil
2024-03-07 12:31:26
I'm literally using 26GiB of swap right now and my system is responsive
Traneptora
2024-03-07 12:31:34
yea but lon you're not out of memory
Quackdoc
2024-03-07 12:31:40
it does? and even if it doesn't for some reason you can manually call it using sysrq
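(The manual route being referred to is the magic SysRq key: SysRq `f` invokes the kernel OOM killer on demand, which is handy when the system is thrashing too hard to launch anything. From a terminal it looks like this; note it is destructive, since the kernel will kill whichever task it scores worst:)

```shell
# Enable all SysRq functions (many distros restrict them by default),
# then trigger the OOM killer. Equivalent to pressing Alt+SysRq+F.
echo 1 | sudo tee /proc/sys/kernel/sysrq
echo f | sudo tee /proc/sysrq-trigger
# The kernel logs which task was killed:
dmesg | tail
```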
Traneptora
2024-03-07 12:31:52
it won't call the OOMkiller if you run out of ram with swap
2024-03-07 12:31:57
it will just start thrashing
lonjil
2024-03-07 12:32:10
fun fact: the oom killer always kills discord first for some reason
Quackdoc
2024-03-07 12:32:30
that's not what I have experienced at all, swap is pretty much a necessity on my laptop and tablets, 2gib and 4gib respectively, since linux will just hard crash on them.
2024-03-07 12:32:43
when you enable swap it often will recover just fine
Traneptora
2024-03-07 12:32:52
I'm saying it has never been the case that I have run out of memory with swap enabled and had the system ever become responsive again after it started thrashing
2024-03-07 12:33:32
I've been using linux as a daily driver for quite some time, almost 15 years, and I've never experienced anything other than what I described
Quackdoc
2024-03-07 12:33:34
yeah, can't say I share the same experience, but then again, flash helps a lot in these cases anyways
lonjil
lonjil earlier today linux decided to move literally all my applications to swap in favor of filling my ram with, uh, something
2024-03-07 12:33:37
this caused massive thrashing but my system did become usable again eventually
2024-03-07 12:34:00
swap usage was at 70GiB and my mouse cursor wasn't moving
2024-03-07 12:34:10
eventually it became responsive again
Traneptora
2024-03-07 12:34:15
how long did you have to wait
2024-03-07 12:34:31
it may be the case that it responds eventually but I find that power cycling the system is faster than waiting
lonjil
2024-03-07 12:34:40
a minute or two?
Quackdoc
2024-03-07 12:34:44
on my laptops I'll sometimes have to wait like 30s, but that's better than hard crashing and losing a bunch of data
Traneptora
2024-03-07 12:35:00
ah, yea, I've never had it recover after only a minute
lonjil
2024-03-07 12:35:13
but I have no idea what caused it, linux's swapping heuristics are jank
Traneptora
2024-03-07 12:35:28
my life has been so much better when I disabled swap*
2024-03-07 12:35:40
*I have a swap partition but I don't swap it on unless I need to hibernate
2024-03-07 12:35:52
since hibernation is done to swap space
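(That workflow — swap off in normal use, on only for suspend-to-disk — works because hibernation writes the RAM image into the swap area, which therefore needs to be large enough and named on the kernel command line via `resume=`. A sketch; the UUID placeholder is obviously machine-specific:)

```shell
# Enable the swap partition only when hibernating.
# The kernel cmdline must already contain: resume=UUID=<swap-partition-uuid>
sudo swapon /dev/disk/by-uuid/<swap-partition-uuid>
sudo systemctl hibernate
# After resume, turn it back off to return to a swapless setup.
sudo swapoff /dev/disk/by-uuid/<swap-partition-uuid>
</imports>
```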
lonjil
2024-03-07 12:35:53
I might disable swap after I upgrade to a computer with 256 GiB of RAM.
Quackdoc
2024-03-07 12:36:03
how linux handles memory in general is jank lol
Traneptora
2024-03-07 12:36:10
I have 32 GiB and I don't really run out
Quackdoc
2024-03-07 12:36:19
lately i've been using bustd as my oomkiller and it works pretty well
Traneptora
2024-03-07 12:36:26
I didn't really run out at 16 either unless I was doing some kind of heavy encoding stuffs
CrushedAsian255
2024-03-07 12:36:58
i once got mathematica to use 62GB of swap + 22GB of ram
Quackdoc
2024-03-07 12:37:00
fat lto on rust projects with many sub stuff hurts T.T
Traneptora
CrushedAsian255 i once got mathematica to use 62GB of swap + 22GB of ram
2024-03-07 12:37:10
oo, that's impressive, how
CrushedAsian255
Traneptora oo, that's impressive, how
2024-03-07 12:37:22
`DeBruijnSequence[2,1000000000]`
Traneptora
2024-03-07 12:37:28
<:holyfuck:941472932654362676>
2024-03-07 12:37:30
yea that might do it