JPEG XL


benchmarks

HCrikki
2024-02-28 03:19:32
is everyone running against their own sample images or is there a specific source dataset commonly benched ?
_wb_
2024-02-28 03:20:27
It might be a good idea to change the default effort setting to 6. For lossy, e6 is performing about as well as e7 while it is twice as fast, and for lossless e6 does perform worse than e7 but it still is much better than png/webp in all ways and has a more reasonable speed than e7 (even though e7 has become a lot more reasonable than it was, already)
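The effort levels being compared here map to cjxl's `-e` flag; a hypothetical invocation with placeholder filenames (lossy distance and input name are assumptions, not from the chat):

```shell
# Placeholder filenames; -e sets encode effort (1-9), -d sets Butteraugli
# distance (0 = lossless). Comparing e6 vs e7 as discussed above:
cjxl -d 1.0 -e 6 input.png lossy_e6.jxl
cjxl -d 1.0 -e 7 input.png lossy_e7.jxl
cjxl -d 0 -e 6 input.png lossless_e6.jxl
cjxl -d 0 -e 7 input.png lossless_e7.jxl
```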
HCrikki is everyone running against their own sample images or is there a specific source dataset commonly benched ?
2024-02-28 03:21:30
there are some commonly used sets, but it's always interesting to see how things are for the images you care about - things can be quite different depending on the kind of image...
Quackdoc
HCrikki is everyone running against their own sample images or is there a specific source dataset commonly benched ?
2024-02-28 03:35:05
I don't use datasets since I find they poorly represent real world usecase, I will just crawl the internet and download a wack load of images i can get the source for, and rip a good amount of images from gallery sites
afed
_wb_ It might be a good idea to change the default effort setting to 6. For lossy, e6 is performing about as well as e7 while it is twice as fast, and for lossless e6 does perform worse than e7 but it still is much better than png/webp in all ways and has a more reasonable speed than e7 (even though e7 has become a lot more reasonable than it was, already)
2024-02-28 03:43:36
for lossy e6 with streaming by default and e7 without or now it's for the same efforts as for lossless? `-e 6` by default is good and bad, bad because a lot of people are comparing default settings without taking speed into account and e7 is still fast enough, with streaming also for lossless
2024-02-28 03:52:43
or now lossless e7 is worse than e6 on most images? because patches are disabled?
_wb_
2024-02-28 04:04:28
no, for lossless, e7 compresses better than e6, but it's a bit slow still (even though it's much faster now than it was).
2024-02-28 04:05:28
for lossy, it's not super clear to me how much better the compression of e7 is compared to e6, but I think the answer is "not much".
MSLP
Traneptora -October being the gcc optimization level of "ctober"
2024-02-28 04:26:55
doesn't work for me, but it's expected since it's February 🤪
Oleksii Matiash
HCrikki is everyone running against their own sample images or is there a specific source dataset commonly benched ?
2024-02-28 06:12:33
I use my own photos because of their large size and the fact that it is my main jxl usage
afed
_wb_ no, for lossless, e7 compresses better than e6, but it's a bit slow still (even though it's much faster now than it was).
2024-02-28 06:29:38
ah, I just read it wrong then maybe, though patches can sometimes significantly improve compression, don't know how well it works for lossy, also maybe the latest optimizations for complex images can perform worse on fast efforts
fab
2024-02-28 09:12:05
Jon How JPEG XL can upscale good in the Rance higher than SPMT19730M colours
2024-02-28 09:12:17
Is fast if i load a video
2024-02-28 09:12:43
Basically what is it the Changelogs?
2024-02-28 09:13:23
I rewrote full gemini because it was really terrible 72% of the times
2024-02-28 09:13:34
Now it speaks spanish
2024-02-28 09:13:52
I asked que es el codec AV1?
2024-02-28 09:15:11
Like for example images of cats in rgb and yrcbr color space, or even videos what is the bpp gains?
_wb_
2024-02-28 09:23:45
I see words but they are put in a sequence that is beyond my ability to understand.
spider-mario
2024-02-28 11:12:02
fab, whenever you write something, could you please check which channel you are about to post it to and whether that is the appropriate one?
2024-02-28 11:12:13
if none matches, <#806898911091753051> would be the best fit
Orum
2024-02-28 11:43:39
e 7 is really the sweet spot for lossless, at least on the images I've tested it on, though I do think streaming_input and streaming_output should be on by default
2024-02-28 11:45:29
and it would be a little more convenient if streaming_input didn't require ppm/pgm input, though I can see the challenges associated with supporting other formats
2024-02-28 11:48:16
if you needed a faster lossless preset I'd go all the way to e 2
_wb_
2024-02-29 07:50:44
streaming_input should be possible with PNG input as long as they're not Adam7 interlaced (but most aren't). But this will only affect cjxl; other applications using libjxl have to make changes on their end to do input streaming...
CrushedAsian255
Traneptora -October being the gcc optimization level of "ctober"
2024-02-29 10:47:33
Optimises for ghosts running the software?
sklwmp
2024-03-01 04:51:03
Has anyone tested if compiling with -O3 is faster than -O2 for libjxl? I understand that sometimes, higher optimization levels are not always faster, that sometimes even -Osize works better.
Traneptora
sklwmp Has anyone tested if compiling with -O3 is faster than -O2 for libjxl? I understand that sometimes, higher optimization levels are not always faster, that sometimes even -Osize works better.
2024-03-01 05:58:41
"higher optimization is not always faster" is something that gentoo people have been saying for years, but generally speaking O3 will be faster than O2
2024-03-01 05:59:46
O3 enables -funroll-loops, which can increase compiled code size in a way that only actually hurts performance on systems that are very slow at loading dynamic libraries from disk
2024-03-01 06:00:12
it won't particularly matter for libjxl though as most of the performance-critical code uses highway for simd and thus optimization flags won't affect much
2024-03-01 06:03:30
do keep in mind that people often will argue that "-O3 breaks code" but this is not actually true, -O3 breaks *incorrect* code that relies on undefined behavior
2024-03-01 06:03:47
also the biggest source of breakage is `-fstrict-aliasing` which is actually enabled with -O2
2024-03-01 06:04:06
`-fstrict-aliasing` tells the compiler to assume that code does not break the strict aliasing rule
2024-03-01 06:04:44
the strict aliasing rule being "it's illegal to re-interpret-cast pointers except to/from `unsigned char *`, `char *` and `void *`"
2024-03-01 06:05:40
so, doing something like
```c
float f = 5.0;
uint32_t x = *(uint32_t *)&f;
```
is UB in C; it's called the "strict-aliasing" rule
2024-03-01 06:05:54
if you actually want to do this, you need to do it with a union
2024-03-01 06:06:26
```c
union { float f; uint32_t i; } z;
z.f = 5.0;
uint32_t x = z.i;
```
this is legal in C. (though not in C++)
CrushedAsian255
2024-03-01 06:34:50
In c++ what should you do?
2024-03-01 06:35:02
Bitcasts?
_wb_
2024-03-01 06:45:12
memcpy works
Traneptora
2024-03-01 06:47:37
memcpy is one way, another way IIRC is to use `reinterpret_cast<uint32_t *>` explicitly
2024-03-01 06:47:55
reason it's illegal in C++ is they're more strict about unions
2024-03-01 06:48:24
in C++, exactly one member of a union is considered "active" at any given point, which is by definition the one that was most recently assigned
2024-03-01 06:48:41
and it's illegal C++ to read from a different member than the one that was assigned more recently
2024-03-01 06:48:43
C permits this, though
veluca
Traneptora memcpy is one way, another way IIRC is to use `reinterpret_cast<uint32_t *>` explicitly
2024-03-01 08:20:10
that's definitely not OK
yoochan
2024-03-01 08:38:56
xkcd made one just for you ! https://xkcd.com/2899/
CrushedAsian255
Traneptora in C++, exactly one member of a union is considered "active" at any given point, which is by definition the one that was most recently assigned
2024-03-01 08:51:27
What's the point of using unions then
yurume
2024-03-01 08:59:58
for memory optimization?
2024-03-01 09:00:02
like, `std::variant`
spider-mario
CrushedAsian255 What's the point of using unions then
2024-03-01 09:10:16
same as tagged unions but without the tag
_wb_
2024-03-01 09:54:49
Here is an example to demonstrate something most of us already know but still remains useful to point out: PSNR is not a good perceptual metric. These four images have the same PSNR of 35.0, but if you ask ssimulacra2 (or butteraugli, for that matter), the webp and avif images are worse.
2024-03-01 09:55:06
ugh it strips the animation
2024-03-01 09:55:32
but I guess the distortions are strong enough to see them without reference to the original
2024-03-01 09:57:06
animated version here: https://res.cloudinary.com/jon/psnr35.png
lonjil
2024-03-01 09:57:12
Yes, those are quite severe
_wb_
2024-03-01 09:59:49
this is lower quality than what I consider useful even for the web, but not outrageously low quality where all metrics break down
2024-03-01 10:00:48
whenever someone shows you PSNR BD rate results showing how great something is, point them to this little reminder that PSNR doesn't mean much 🙂
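For reference, the metric under discussion is purely a per-pixel error energy, which is why identical PSNR values can look so different:

```latex
\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(x_i - y_i\right)^2,
\qquad
\mathrm{PSNR} = 10 \log_{10}\!\frac{\mathrm{MAX}_I^{2}}{\mathrm{MSE}}
```

where $\mathrm{MAX}_I$ is the peak signal value (255 for 8-bit images). Any distortion with the same total squared error gives the same PSNR, regardless of how visible it is.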
yoochan
2024-03-01 10:01:54
avif only believe in MS-SSIM, is this metric better than PSNR ?
_wb_
2024-03-01 10:05:52
slightly, but not much
2024-03-01 10:15:14
This shows the correlation between PSNR and human opinions. Horizontal axis is human opinion, where 30 is low quality, 90 is very high quality. Vertical axis is psnr. The color indicates the number of images, where gray is 1 image, blue is 2, green is 20, orange is 100, red is 500 (so it's a density heatmap). As you can see for PSNR things are all over the place, if the PSNR is 50 it will be a very good image but if it is 35 or 40, the real quality can be pretty much anything.
2024-03-01 10:16:41
the solid black line shows the mean human score for a given psnr score, the dashed line shows p25 - p75 and the dotted line shows p5 - p95
2024-03-01 10:18:22
For MS-SSIM the plot looks like this. Better, but still kind of all over the place.
2024-03-01 10:21:27
For SSIMULACRA2 the plot looks like this. Of course still not perfect correlation (that would look like a perfect diagonal line), but significantly better than PSNR or MS-SSIM.
2024-03-01 10:25:15
The numbers at the top of the plot are the Kendall Rank Correlation Coefficient (KRCC), the Spearman Rank Correlation Coefficient (SRCC), and the Pearson Correlation Coefficient (PCC), which are ways to summarize the overall correlation.
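The three coefficients have standard definitions (here $d_i$ is the difference between an image's ranks under the metric and under human scores, and $n_c$/$n_d$ count concordant/discordant pairs):

```latex
\mathrm{PCC} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \, \sigma_Y},
\qquad
\mathrm{SRCC} = 1 - \frac{6 \sum_i d_i^2}{n(n^2 - 1)} \;\text{(no ties)},
\qquad
\mathrm{KRCC} = \frac{n_c - n_d}{n(n-1)/2}
```

SRCC is simply the Pearson correlation computed on ranks, so the two rank coefficients measure monotonic agreement while PCC measures linear agreement.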
2024-03-01 10:30:37
Of course you'll never get the correlation to a perfect 1 because 1) the ground truth of human opinions is not perfect (there's sampling noise in it, humans are not perfectly consistent, etc) ,and 2) humans can have non-transitive preferences occasionally (where they say things like A > B > C > A) which you can never capture with a numerical metric that constrains the concept of quality to a total order relation.
yoochan
2024-03-01 03:21:25
thank you for the plots, they are very interesting! (and the explanations) I hope we could convince the guy who answered me here: https://github.com/webmproject/codec-compare/issues/3 😅
2024-03-01 03:21:53
The illustrations remind me of a blog post I read a long time ago... written by you I suppose
Traneptora
CrushedAsian255 What's the point of using unions then
2024-03-01 04:39:15
one reason might be, say how I use them in hydrium
2024-03-01 04:39:42
I store the quantized DCT coeffs in the same variable as the dequant ones
2024-03-01 04:40:03
quantized DCT coeffs are integers and the original are floats
2024-03-01 04:40:24
once I quantize them I don't need the float so I just assign it to the integer
2024-03-01 04:40:43
prevents me from allocating two buffers
monad
2024-03-02 11:01:51
all mistakes attributable to me. jon will be angry about this
CrushedAsian255
2024-03-02 12:04:59
What?
Orum
2024-03-02 01:03:59
0.9.0 <:PepeHands:808829977608323112>
fab
2024-03-02 02:14:27
In the benchmarks ive done ive seen 20,2% reductions on libjxl 0.10.0 with vp9 e8
2024-03-02 02:14:52
With the metric Ssimulacra 1
2024-03-02 03:53:32
16;50pm
2024-03-02 03:53:52
Toggleton sent this
jonnyawsom3
2024-03-02 03:54:42
That isn't a JXL
fab
2024-03-02 03:54:56
Gif?
sklwmp
Traneptora "higher optimization is not always faster" is something that gentoo people have been saying for years, but generally speaking O3 will be faster than O2
2024-03-03 11:50:33
Well, apparently Arch says it too
Nyao-chan
2024-03-03 12:13:47
I did try PGO with Polly and BOLT and it was useless. I assume `-O 3` would be the same
2024-03-03 12:14:25
maybe 1.5% faster, but could be just error
veluca
2024-03-03 12:18:03
I would be extremely surprised if compiler optimizations made significant differences for vardct (beyond -O2 or perhaps -O1)
2024-03-03 12:18:09
for modular mode, perhaps
Nyao-chan
2024-03-03 12:18:50
Oh, I was only testing modular, I should specify
2024-03-03 12:19:28
I think Release instead of None did make it a little faster though
lonjil
2024-03-03 12:20:21
-O3 doesn't break things (the code is already broken) but the measured improvements from using it are usually under the margin of error.
afed
veluca I would be extremely surprised if compiler optimizations made significant differences for vardct (beyond -O2 of perhaps -O1)
2024-03-03 12:22:38
btw, the git version for windows is still much slower for fast lossless modes than compiled with clang
2024-03-03 12:26:41
it wouldn't be that bad if people didn't make benchmarks with these much slower binaries (which already happens)
veluca
lonjil -O3 doesn't break things (the code is already broken) but the measured improvements from using it are usually under the margin of error.
2024-03-03 12:30:01
It turns on autovec though, which while not so great, can be quite helpful in *some* cases... Of course, most of those programs use intrinsics anyway
afed btw, the git version for windows is still much slower for fast lossless modes than compiled with clang
2024-03-03 12:30:25
You mean it's slower with msvc than with clang?
2024-03-03 12:30:54
Ah, yes, I believe I understand why... I guess we could make release binaries with clang-cl, or figure out how to do dynamic dispatching with msvc
afed
2024-03-03 12:31:57
probably at least what is used for windows binaries in git releases
veluca You mean it's slower with msvc than with clang?
2024-03-03 12:46:40
https://canary.discord.com/channels/794206087879852103/848189884614705192/1213829871189626980
veluca
afed https://canary.discord.com/channels/794206087879852103/848189884614705192/1213829871189626980
2024-03-03 12:47:32
I see, what did you compile the git version with?
afed
2024-03-03 12:48:26
-O2 x86-64-v2
veluca
2024-03-03 01:05:00
... compiler? 😛
afed
2024-03-03 01:06:30
latest clang in msys2, can't remember exactly, but most likely the latest release version
2024-03-03 01:07:36
`Clang 17.0.6`
veluca
2024-03-03 01:23:39
I see
2024-03-03 01:23:46
I wonder what I'd get with clang-cl
2024-03-03 01:24:08
is that something you could try out?
afed
2024-03-03 01:26:49
it would be difficult because msvc is big and there are a lot of extra deps
veluca
2024-03-03 02:55:16
I see, I'll investigate during the week then perhaps
afed
2024-03-04 10:10:20
the new git build with clang is not much different, maybe it's an older clang version or something else, lto? <:Thonk:805904896879493180>
veluca
2024-03-04 10:12:22
can you give me the output of something at `-e 1 -d 0`?
2024-03-04 10:12:35
(I assume you mean the win build from my latest PR)
afed
2024-03-04 10:15:45
git msvc `2480 x 3508, geomean: 71.379 MP/s [57.86, 74.33], 100 reps, 8 threads.` git clang `2480 x 3508, geomean: 71.170 MP/s [62.47, 73.80], 100 reps, 8 threads.`
veluca
2024-03-04 10:17:31
yeah no
2024-03-04 10:17:34
that didn't do it
afed
2024-03-04 10:20:05
are these builds for x86-64-v1? it's strange that mostly e1 is slower, even though it should be more optimized
veluca
2024-03-04 10:20:23
oh, I can tell you why
2024-03-04 10:21:14
https://github.com/libjxl/libjxl/blob/main/lib/jxl/enc_fast_lossless.cc#L49
2024-03-04 10:22:01
I suspect clang-cl still "smells" like msvc for that check
afed
2024-03-05 10:06:12
it might be useful to have some extra build at least for windows x64 with march avx2 (like -march=haswell, for just avx I don't think it makes much sense) and avx512 support (disabled for generic builds, as far as I know) for benchmarks on modern systems because for windows it's less likely that people will compile their own binaries something like `jxl-x64-avx2-windows-static.zip`, if it's not too hard for extra maintenance and extra compilation time
fab
2024-03-05 10:37:38
Yt is becoming smart, it's learning how to build a custom encoder based on what I gave on Brave
jonnyawsom3
afed it might be useful to have some extra build at least for windows x64 with march avx2 (like -march=haswell, for just avx I don't think it makes much sense) and avx512 support (disabled for generic builds, as far as I know) for benchmarks on modern systems because for windows it's less likely that people will compile their own binaries something like `jxl-x64-avx2-windows-static.zip`, if it's not too hard for extra maintenance and extra compilation time
2024-03-05 03:28:36
The package names are already slightly confusing for the average user, might just make it worse
afed
2024-03-05 04:05:43
don't think one more binary is much more confusing, maybe with a slightly different name. still, avx2 binaries (or something like x86-64-v2, x86-64-v3) are pretty common if that makes sense for speed: avx2 gives some gain, enabling avx512 also gives about 50% more, and the binary size will not be as bloated as for generic builds, so it's good for everyone. though first clang compilation needs to at least work properly
veluca
2024-03-05 04:07:21
but libjxl uses dynamic dispatch anyway
afed
2024-03-05 04:08:32
yeah, but still march avx2 gives some gains over generic
veluca
2024-03-05 04:08:43
in modular?
afed
2024-03-05 04:15:57
yeah, some, but still, for e1 mostly (and for slower efforts as well, though I haven't done any comparisons recently, but it was pretty consistently) and it's smaller binary size and the option to use avx512, which is disabled by default (windows users rarely compile binaries and avx512 support is not that uncommon)
Oleksii Matiash
2024-03-05 04:49:06
Just curious, is binary size really an issue? I mean does enabling avx512 increase binary size like by x2?
veluca
afed yeah, some, but still, for e1 mostly (and for slower efforts as well, though I haven't done any comparisons recently, but it was pretty consistently) and it's smaller binary size and the option to use avx512, which is disabled by default (windows users rarely compile binaries and avx512 support is not that uncommon)
2024-03-05 04:54:37
not that uncommon *now* 😛
2024-03-05 04:54:42
enabling avx512 might make sense
afed
2024-03-05 04:54:44
no, not much, but also compression for e1 with avx512 is a bit worse, maybe that is also one of the reasons
2024-03-05 04:55:24
but for some separate version I think it's worth it
2024-03-05 04:58:05
<:FeelsReadingMan:808827102278451241>
Oleksii Matiash
veluca not that uncommon *now* ๐Ÿ˜›
2024-03-05 05:19:51
Yes 🙂
jonnyawsom3
2024-03-05 06:00:41
I know a while ago some extensions stopped showing in cjxl due to upstream changes https://discord.com/channels/794206087879852103/804324493420920833/1210293891664977930
eddie.zato
2024-03-06 05:22:02
How to compile cjxl without avx support? I tried `-march=x86-64-v2 -mtune=x86-64-v2 -mno-avx -mno-avx2`, but it doesn't seem to work as cjxl still says `cjxl v0.10.1 5f67ebc [AVX2,SSE4,SSE2]`
_wb_
2024-03-06 06:25:47
There's a HWY define for it
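Presumably this refers to Highway's `HWY_DISABLED_TARGETS` mechanism; a sketch (the macro and target names are assumptions taken from Highway's `hwy/detect_targets.h`, worth double-checking against the version in use):

```shell
# Assumption: Highway's HWY_DISABLED_TARGETS define, with target macros
# from hwy/detect_targets.h. Disables the AVX2 and AVX-512 codepaths so
# dynamic dispatch never selects them.
cmake -B build -DCMAKE_CXX_FLAGS="-DHWY_DISABLED_TARGETS=(HWY_AVX2|HWY_AVX3)"
cmake --build build --target cjxl
```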
DZgas Ж
afed <:FeelsReadingMan:808827102278451241>
2024-03-06 09:30:32
😬 I don't have AVX
190n
2024-03-06 09:32:10
really, what cpu?
DZgas Ж
190n really, what cpu?
2024-03-06 10:01:38
AMD Athlon II X4 640
Orum
2024-03-06 10:03:26
holy...
2024-03-06 10:04:03
you want an Ivy Bridge PC?
DZgas Ж
2024-03-06 10:04:10
https://github.com/ggerganov/whisper.cpp https://github.com/Const-me/Whisper An excellent test for stupid developer -- the x64 AND x86 variant is compiled by AVX-only --- geniuses devs
Orum you want an Ivy Bridge PC?
2024-03-06 10:05:18
What for? my computer seems to be working...
Orum
2024-03-06 10:05:32
for AVX, obviously...
DZgas Ж
Orum for AVX, obviously...
2024-03-06 10:06:05
Oh yes..... and why do I need AVX? 🙂
2024-03-06 10:08:49
Well, fans of poke <@548366727663321098> anything for no reason, are you an AVX seller?
Orum
2024-03-06 10:09:14
I can't imagine how painful it must be to not have AVX these days
2024-03-06 10:09:34
or just using 14 year old HW
afed
DZgas Ж 😬 I don't have AVX
2024-03-06 10:10:35
it would still be just an extra binary, but not for avx, for avx2 and with enabled avx512, so modern systems will get some gains
2024-03-06 10:12:34
also <:KekDog:805390049033191445> https://www.phoronix.com/news/LLVM-Clang-18.1-Released
DZgas Ж
Orum I can't imagine how painful it must be to not have AVX these days
2024-03-06 10:12:36
👉 useless thing - go to shop and buy HP laptop with N3050 (no have avx) in 2024
HCrikki
2024-03-06 10:12:42
a private build would imo be the best compromise. upstream needs to move forward
190n
DZgas Ж 👉 useless thing - go to shop and buy HP laptop with N3050 (no have avx) in 2024
2024-03-06 10:14:11
intel moment <:YEP:808828808127971399>
DZgas Ж
2024-03-06 10:14:40
just think, the AVX developed in 2008 by Intel itself is not used in their 2015 Braswell architecture -- what is the reason hmm
afed
afed also <:KekDog:805390049033191445> https://www.phoronix.com/news/LLVM-Clang-18.1-Released
2024-03-06 10:15:14
but basically avx10.2 is what will be the mainstream version, and 10.1 is sort of transitional
Orum
2024-03-06 10:15:43
braswell?
DZgas Ж
Orum braswell?
2024-03-06 10:16:13
Orum
2024-03-06 10:16:53
those things without AVX are all atom/low power CPUs
2024-03-06 10:17:12
they're not meant to be crunching stuff with SIMD 😆
2024-03-06 10:18:42
anyway hopefully RISC-V will take over and their vector instructions will finally kill the SIMD treadmill
afed
HCrikki a private build would imo be the best compromise. upstream needs to move forward
2024-03-06 10:34:01
there is no need for private builds, just a normal generic version for any cpus and avx2+ (with enabled avx512) for modern systems, when I tested this gives up to 5% for modular modes compared to generic build and avx512 gives about 50% more for e1 (and for other modes too, but less)
2024-03-06 10:39:21
but when the builds are fixed, because right now it's like this https://canary.discord.com/channels/794206087879852103/848189884614705192/1213829871189626980
Traneptora
Orum anyway hopefully RISC-V will take over and their vector instructions will finally kill the SIMD treadmill
2024-03-06 10:49:42
2024 year of the risc-v desktop?
Quackdoc
Traneptora 2024 year of the risc-v desktop?
2024-03-06 10:52:06
maybe, the new upcomming milk-v oasis looks insanely good for the alleged price
Traneptora
2024-03-06 10:52:31
just want to point out that risc-v is like ten years old and we still don't really have hardware for it
2024-03-06 10:52:34
it's all qemu instances
Quackdoc
Traneptora just want to point out that risc-v is like ten years old and we still don't really have hardware for it
2024-03-06 10:54:56
risc-v spec has only formally been a stable spec for about 4 years now meaning any hardware produced before that should be considered as possibly incompatible, in that time we have had 3 major socs released. 2 socs are SBC level socs and one soc "corporate" level.
2024-03-06 10:55:42
the antminer x3 is a good example of corporate use, the sipeed licheepi4a is a good example of a midtier soc, and milk-v mars for lower end
2024-03-06 10:56:16
contrary to it being slow, risc-v adoption has been absurdly fast, probably the fastest adoption of any architecture since i386
Traneptora
2024-03-06 10:56:48
what exactly makes risc-v so much better than x86
Quackdoc
2024-03-06 10:58:02
well it's mostly just arm without a lot of the mistakes. Cheap low power devices that perform decently. it also helps that since its an open spec, you can literally design and get someone to fab you chips at an actually decent price
Traneptora
2024-03-06 11:00:02
I see, doesn't strike me as likely to replace x86 for non-embedded computing then
Orum
Traneptora just want to point out that risc-v is like ten years old and we still don't really have hardware for it
2024-03-06 11:00:08
just want to point out that's still very young for an arch
Traneptora
2024-03-06 11:00:37
sure, but I feel like people are hailing it as the next great thing but I'm not really sure what it really gives a non-emedded user over x86
Orum
2024-03-06 11:00:41
ARM dates back to 1985 and we still don't really have desktop ARM machines (except Apple, if you count them)
Quackdoc
Traneptora I see, doesn't strike me as likely to replace x86 for non-embedded computing then
2024-03-06 11:00:45
people said that about laptops too, but apple was able to do it anyways
Traneptora
Orum ARM dates back to 1985 and we still don't really have desktop ARM machines (except Apple, if you count them)
2024-03-06 11:00:52
that's because there's no reason for that
Quackdoc
Orum ARM dates back to 1985 and we still don't really have desktop ARM machines (except Apple, if you count them)
2024-03-06 11:01:01
apples has kinda lol
Traneptora
2024-03-06 11:01:01
arm doesn't have any benefits over x86 outside of battery life
2024-03-06 11:01:22
apple has a particularly power-efficient implementation of arm in the apple silicon laptops
2024-03-06 11:01:34
but that's a property of that chip, not a property of arm
Orum
Quackdoc apples has kinda lol
2024-03-06 11:01:35
yeah, "kinda" is right
Traneptora sure, but I feel like people are hailing it as the next great thing but I'm not really sure what it really gives a non-emedded user over x86
2024-03-06 11:02:10
single biggest thing is vector instructions
Traneptora
2024-03-06 11:02:26
x86 has vector instructions
Quackdoc people said that about laptops too, but apple was abke to do it anyways
2024-03-06 11:02:43
you say "able" to do that. but apple has a vertically integrated ecosystem, so they can make any changes they want to exactly that ecosystem
Orum
Traneptora x86 has vector instructions
2024-03-06 11:02:47
no, it doesn't <:CatBlobPolice:805388337862279198>
Quackdoc
Traneptora but that's a property of that chip, not a property of arm
2024-03-06 11:02:53
no one has tried in the first place,
Traneptora
Orum no, it doesn't <:CatBlobPolice:805388337862279198>
2024-03-06 11:02:57
what do you think simd is
Orum
2024-03-06 11:03:03
SIMD != vector
Traneptora
2024-03-06 11:03:03
stuff like sse etc.
2024-03-06 11:03:26
what makes SIMD not vector instructions
Orum
2024-03-06 11:03:33
they both process data in parallel, but how they go about it is *completely* different
Quackdoc
2024-03-06 11:03:42
risc-v does really unique stuff for their vector acceleration
Traneptora
Orum they both process data in parallel, but how they go about it is *completely* different
2024-03-06 11:03:55
so it's an implementation-specific thing?
Orum
2024-03-06 11:05:10
this article has a good overview of the differences: https://webcache.googleusercontent.com/search?q=cache:https://medium.com/swlh/risc-v-vector-instructions-vs-arm-and-x86-simd-8c9b17963a31
Quackdoc
2024-03-06 11:05:19
>webcache
Orum
2024-03-06 11:05:26
because paywalled <:FeelsSadMan:808221433243107338>
Traneptora
2024-03-06 11:05:26
it's medium
Quackdoc
Orum because paywalled <:FeelsSadMan:808221433243107338>
2024-03-06 11:05:43
did they break 12ft.io?
2024-03-06 11:06:57
the answer is yes
Orum
2024-03-06 11:07:15
> In the code examples Patterson and Waterman are using they remark that the SIMD programs require 10 to 20 times more instructions to be executed compared to the RISC-V version using vector instructions.

that alone makes vector worth it, but the benefits don't stop there
Traneptora
2024-03-06 11:07:26
I don't see how
2024-03-06 11:07:41
why is the number of instructions that exist an important metric
Orum
2024-03-06 11:08:14
those instructions have to be loaded from memory, and that takes time, valuable cache space, and most importantly: power
2024-03-06 11:08:29
> The max vector length can be queried at runtime, so one does not need to hardcode 64 element long batch sizes.

this is the *real* killer of SIMD
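The runtime-queried vector length enables "strip-mining" loops; a scalar C++ simulation of the pattern (here `vlmax` stands in for the hardware-reported maximum vector length, and the `min()` step is what RVV's `vsetvl` instruction does in one operation):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar simulation of RISC-V V "strip-mining": each iteration asks how
// many elements can be processed at once (vsetvl), processes that many,
// and advances. The same loop adapts to any vector register width, so
// the binary never needs recompiling for wider hardware.
std::vector<float> add_arrays(const std::vector<float>& a,
                              const std::vector<float>& b,
                              std::size_t vlmax /* hardware VLMAX */) {
    assert(a.size() == b.size());
    std::vector<float> out(a.size());
    std::size_t i = 0;
    while (i < a.size()) {
        // vsetvl: vl = min(remaining elements, VLMAX)
        std::size_t vl = std::min(a.size() - i, vlmax);
        for (std::size_t j = 0; j < vl; ++j)  // one "vector" add
            out[i + j] = a[i + j] + b[i + j];
        i += vl;  // advance by however many elements were handled
    }
    return out;
}
```

With fixed-width SIMD, by contrast, the batch size is baked into the instruction encoding, which is why each new extension (SSE, AVX, AVX-512) needs rewritten kernels.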
Traneptora
2024-03-06 11:09:03
as a desktop user I don't care about power
2024-03-06 11:09:14
why would I care about that more than all existing software continuing to work
Orum
2024-03-06 11:09:23
how much SIMD do we have in x86? MMX, SSE, SSE2, SSSE3, SSE4, AVX, AVX2, AVX512?
Traneptora
2024-03-06 11:09:29
nobody uses MMX
2024-03-06 11:09:30
but yes
2024-03-06 11:09:40
fwiw, all the SSE are all 128-bit
2024-03-06 11:09:45
and all the AVX are 256-bit
2024-03-06 11:09:47
AVX512 is 512-bit
Orum
2024-03-06 11:09:50
every time a new SIMD extension comes out, you need to rewrite and/or recompile code for it
Traneptora
2024-03-06 11:10:00
sure, but the existing binary code *still works*
Orum
2024-03-06 11:10:01
with RISC-V vector you don't
Quackdoc
Traneptora as a desktop user I don't care about power
2024-03-06 11:10:23
it depends on where you live, my cousins for instance live on solar + battery / generator, so they want to shave off as much energy as possible
Orum
2024-03-06 11:10:27
you can use whatever new horsepower they put on your new silicon without being stuck with something written for ancient SIMD
Traneptora
2024-03-06 11:10:30
having to recompile performance-critical code once every few years doesn't seem worse than breaking all x86 code cause risc-v hype
Orum
2024-03-06 11:10:48
it's not just recompile, it's *rewriting*
2024-03-06 11:11:09
you don't magically put in MMX code to a compiler and instantly get AVX512
Traneptora
2024-03-06 11:11:23
sure, but again, why is this worth breaking all software that has come out since the 1980s
Quackdoc
2024-03-06 11:11:26
hand written asm is panic
Traneptora
2024-03-06 11:11:35
you can always use intrinsics instead of hand-written asm
2024-03-06 11:11:36
but yes
Orum
2024-03-06 11:11:37
that's the whole point--none of this *is* breaking in RISC-V
Quackdoc
Traneptora sure, but again, why is this worth breaking all software that has come out since the 1980s
2024-03-06 11:11:38
for some it is, for some it isnt.
Traneptora
Orum that's the whole point--none of this *is* breaking in RISC-V
2024-03-06 11:11:50
can I run x86 software on risc-v hardware?
Quackdoc
2024-03-06 11:11:53
but risc-v allows this forwards compatibility
Traneptora
2024-03-06 11:11:57
if the answer to that is "no" you are breaking all software
Orum
2024-03-06 11:12:13
you can, they're called translation layers or emulators
Traneptora
2024-03-06 11:12:24
I see, so the answer is "no"
Orum
2024-03-06 11:12:30
no, it's yes
Traneptora
2024-03-06 11:12:30
if you have to set up a qemu instance, the answer is no
Quackdoc
2024-03-06 11:12:39
well, box86
Orum
2024-03-06 11:12:46
how do you think Apple runs x86 code without having a x86 license?
Traneptora
2024-03-06 11:12:53
apple has an x86 license
Orum
2024-03-06 11:12:58
no, they don't
Traneptora
2024-03-06 11:13:03
mac computers have been using intel CPUs for years
2024-03-06 11:13:06
I have no idea what you're talking about
Orum
2024-03-06 11:13:17
Intel holds the license, not Apple 😆
2024-03-06 11:13:22
they bought chips from Intel
Traneptora
2024-03-06 11:13:29
are you being pedantic for the purpose of being pedantic
Orum
2024-03-06 11:13:41
no, this is *extremely* important in the CPU manufacturing space
Traneptora
2024-03-06 11:13:41
macintosh computers have had intel CPUs in them for years
Quackdoc
2024-03-06 11:13:44
talking about rosetta
Traneptora
2024-03-06 11:13:54
you ask "how can apple run x86 code", the answer is cause they have intel CPUs in them
Orum
2024-03-06 11:14:10
I'm talking about *modern* ARM Apples, not ancient ones
Traneptora
2024-03-06 11:14:21
x86 macintosh computers are not ancient
2024-03-06 11:14:24
powerpc ones are ancient
Quackdoc
2024-03-06 11:14:26
the m1 devices don't, their emulation on the other hand is extremely efficient, not 100% granted, but still quite good
Traneptora
2024-03-06 11:14:47
so it often doesn't work
2024-03-06 11:14:49
got it
Orum
2024-03-06 11:14:52
well yes PPC are even older but their x86 stuff is quite old at this point
Quackdoc
Traneptora so it often doesn't work
2024-03-06 11:15:28
I've never had an issue with it myself, granted I haven't needed to use it often
2024-03-06 11:15:44
i've not seen anyone do a large scale test with it however
Traneptora
Orum well yes PPC are even older but their x86 stuff is quite old at this point
2024-03-06 11:15:48
if 2023 is ancient I can't wait to hear what you think about 2022
Quackdoc
2024-03-06 11:16:34
wasn't the last intel macbook 2020? not what I would call ancient, but they aren't exactly new either
Orum
2024-03-06 11:16:35
that was literally their last x86 laptop 🤷‍♂️
Traneptora
2024-03-06 11:16:36
the apple silicon version of Mac Pro was released literally less than a year ago
2024-03-06 11:16:49
june 2023
Orum
2024-03-06 11:16:53
but x86 was dead to Apple long before that
Traneptora
2024-03-06 11:17:04
ah yes dead to apple despite being explicitly supported less than 12 months ago
2024-03-06 11:17:05
got it
2024-03-06 11:17:24
it wasn't even *announced* that this was going to happen until june 2020
2024-03-06 11:17:27
this isn't ancient history
Quackdoc
2024-03-06 11:17:39
ah their last macbook was apparently 2021
Orum
2024-03-06 11:17:43
yeah, 2020 is ancient in the computing sphere
Traneptora
2024-03-06 11:17:46
no, it's not
Quackdoc
2024-03-06 11:18:06
but well, either way, the point was their x86 emulation is good, and indeed it is
Traneptora
2024-03-06 11:18:07
we're talking less than four years for a transition to an entirely different architecture across their ecosystem
2024-03-06 11:18:14
that's definitely not ancient history
Orum just want to point out that's still very young for an arch
2024-03-06 11:18:52
so risc-v being ten years old is "very young" but four years is ancient history
2024-03-06 11:18:52
got it
Quackdoc
2024-03-06 11:18:56
I haven't seen anyone submit any extensions dedicated to acceleration of things like x86 to the riscv spec, but I don't see why they couldn't be submitted
Orum
2024-03-06 11:19:10
I disagree; the moment they announced they were dropping x86 and moving to ARM, there was little reason to buy any x86 Apple HW
Traneptora so risc-v being ten years old is "very young" but four years is ancient history
2024-03-06 11:19:18
different things entirely
Traneptora
2024-03-06 11:19:28
you said it was ancient "in the computing sphere"
2024-03-06 11:19:36
is instruction sets not in the computing sphere
Orum
2024-03-06 11:19:42
yes, in the consumer product computing sphere
Traneptora
2024-03-06 11:19:54
not really
2024-03-06 11:20:00
people don't replace their computers every 4 years
2024-03-06 11:20:03
people just don't do that
Orum
2024-03-06 11:20:09
sure, but no one is buying 4-year old computers either
Traneptora
2024-03-06 11:20:31
you have no reason to do that because it's not cheaper than 1-year-old hardware
2024-03-06 11:20:43
if it was cheaper, people would totally do that
Orum
2024-03-06 11:20:45
which is why all the reviewers are at AMD's throat for copying Intel's move of selling old silicon under a new name
afed
Traneptora AVX512 is 512-bit
2024-03-06 11:20:50
even AVX512 didn't really take off for desktops because of the small cores, and now intel is trying to replace it with AVX10 and its 256-bit split
Traneptora
2024-03-06 11:21:24
yea, it's true that the more and more you try to extend you get diminishing returns
Quackdoc
2024-03-06 11:21:35
oh, risc-v was ratified about 4 3/4 years ago, my bad :D
Traneptora
2024-03-06 11:21:53
AVX512's gain over 256-bit AVX is smaller than 256-bit AVX's gain over SSE's 128-bit
2024-03-06 11:21:54
etc.
lonjil
Orum anyway hopefully RISC-V will take over and their vector instructions will finally kill the SIMD treadmill
2024-03-06 11:22:24
the SIMD treadmill was already dead
Orum
2024-03-06 11:22:42
in any case, from everyone's perspective *except* the manufacturer's, having a democratized ISA is a *good* thing
lonjil
Quackdoc well it's mostly just arm without a lot of the mistakes. Cheap low power devices that perform decently. it also helps that since its an open spec, you can literally design and get someone to fab you chips at an actually decent price
2024-03-06 11:22:53
There is already Arm without the mistakes, it's called Aarch64
Quackdoc
lonjil There is already Arm without the mistakes, it's called Aarch64
2024-03-06 11:23:05
[av1_kekw](https://cdn.discordapp.com/emojis/758892021191934033.webp?size=48&quality=lossless&name=av1_kekw)
Orum
2024-03-06 11:23:09
instead of the oligopoly we've been forced to swallow for decades
lonjil
Orum ARM dates back to 1985 and we still don't really have desktop ARM machines (except Apple, if you count them)
2024-03-06 11:23:13
why wouldn't you count apple?
Orum
2024-03-06 11:23:27
because their products are absurdly poor value
Traneptora
lonjil why wouldn't you count apple?
2024-03-06 11:23:44
apple is vertically integrated which gives them the prerogative to change things and force changes on users that other hardware manufacturers don't have
2024-03-06 11:24:05
for example, intel can't sell a new CPU that microsoft windows doesn't work on
Quackdoc
Orum instead of the oligopoly we've been forced to swallow for decades
2024-03-06 11:24:06
indeed, this is one of the major benefits. it's pretty nice to see how varying the risc-v development is now.
lonjil
Traneptora what do you think simd is
2024-03-06 11:24:31
most of the industry considers "vector processing" to be a kind of SIMD, but the "vector processing" industry tries to sell itself by claiming to be something entirely different from SIMD. It is very silly.
Orum
2024-03-06 11:24:49
because it is totally different
Traneptora
2024-03-06 11:24:54
it's just variable-length simd
2024-03-06 11:24:58
it's not totally different
Orum
2024-03-06 11:25:03
"just" ๐Ÿ˜†
Traneptora
2024-03-06 11:25:07
I mean
Quackdoc
lonjil most of the industry considers "vector processing" to be a kind of SIMD, but the "vector processing" industry tries to sell itself by claiming to be something entirely different from SIMD. It is very silly.
2024-03-06 11:25:08
technically speaking, it's massively different when it comes to working with it
Traneptora
2024-03-06 11:25:08
it's still single-instruction-multiple-data
2024-03-06 11:25:16
the core concept is the same
Quackdoc
2024-03-06 11:25:17
yeah but so are gpus
Traneptora
2024-03-06 11:25:23
gpus are glorified simd, yes
Orum
2024-03-06 11:25:32
GPUs are SIMT, not SIMD
lonjil
Orum with RISC-V vector you don't
2024-03-06 11:25:44
you do if you want new features. Most of those instruction sets you listed were new features, not a difference in width (which is the only thing "vectors" solves)
Traneptora apple has an x86 license
2024-03-06 11:26:30
no they don't, buying chips doesn't get you a license
Quackdoc
lonjil you do if you want new features. Most of those instruction sets you listed were new features, not a difference in width (which is the only thing "vectors" solves)
2024-03-06 11:26:39
that is the case for new *features* but when it comes to extending vector that work doesn't need to get put in
Traneptora
lonjil no they don't, buying chips doesn't get you a license
2024-03-06 11:26:42
then I didn't understand the point of the question
Orum
lonjil you do if you want new features. Most of those instruction sets you listed were new features, not a difference in width (which is the only thing "vectors" solves)
2024-03-06 11:27:01
Difference in width is *massive*, especially from a developer's perspective
2024-03-06 11:28:27
instead of having to write, compile, test, and verify code for 4 different widths and countless different extensions, you only need to write one
lonjil
Orum in any case, from everyone's perspective *except* the manufacturer's, having a democratized ISA is a *good* thing
2024-03-06 11:28:46
yes. RISC-V is good at two things: 1. small microcontrollers. 2. open spec so people can do whatever they want. not very good for "big" chips tho. Some design flaws. And a few flaws that would be super easy to fix (just need a few new instructions) that they refuse to do because it isn't "RISC" enough. Just dogma.
Traneptora
2024-03-06 11:28:48
you still run into issues where you need new extensions to use features like f16c or fma
Orum
2024-03-06 11:29:09
FMA is not 'new' by any means
lonjil yes. RISC-V is good at two things: 1. small microcontrollers. 2. open spec so people can do whatever they want. not very good for "big" chips tho. Some design flaws. And a few flaws that would be super easy to fix (just need a few new instructions) that they refuse to do because it isn't "RISC" enough. Just dogma.
2024-03-06 11:29:21
why is it bad for big chips?
Traneptora
2024-03-06 11:29:21
I mean, it's something that was added after SSE
2024-03-06 11:29:29
it's not simply a width change, when it was released it was a new feature
Quackdoc
lonjil yes. RISC-V is good at two things: 1. small microcontrollers. 2. open spec so people can do whatever they want. not very good for "big" chips tho. Some design flaws. And a few flaws that would be super easy to fix (just need a few new instructions) that they refuse to do because it isn't "RISC" enough. Just dogma.
2024-03-06 11:29:30
what are "big" chips? risc-v is being adopted in everything from SBCs to cryptominers
lonjil
Orum because their products are absurdly poor value
2024-03-06 11:29:50
have you compared their laptops to other similar laptops? MacBooks are surprisingly great value, especially if you like low power consumption and long battery life.
Traneptora
2024-03-06 11:29:57
variable-width simd gets you the width upgrade automatically but anything like f16c or fma3 that wasn't in the previous version won't automatically get added
190n
Quackdoc what are "big" chips? risc-v is being adopted in everything from SBCs to cryptominers
2024-03-06 11:30:12
and the SBCs are still slower than arm or x86 ones
Quackdoc
lonjil have you compared their laptops to other similar laptops? MacBooks are surprisingly great value, especially if you like low power consumption and long battery life.
2024-03-06 11:30:13
for laptops yes, but the box thingies? I would disagree on
Traneptora
2024-03-06 11:30:20
do they even sell those?
Quackdoc
190n and the SBCs are still slower than arm or x86 ones
2024-03-06 11:30:28
not really?
Traneptora
2024-03-06 11:30:38
as far as I understand the primary reason you'd purchase an apple laptop is that apple has made their apple silicon chips very power-efficient
Quackdoc
2024-03-06 11:30:39
the milk-v mars is faster than the rpi3b from what I can see
2024-03-06 11:30:54
and the pi4a sits somewhere between the pi4 and pi5 in perf
Traneptora
2024-03-06 11:31:01
if you don't like macOS you can always run asahi linux on them too
Orum
lonjil have you compared their laptops to other similar laptops? MacBooks are surprisingly great value, especially if you like low power consumption and long battery life.
2024-03-06 11:31:06
Me personally? No, but others have: https://www.youtube.com/watch?v=u1dxOI_kYG8
190n
Quackdoc the milk-v mars is faster than the rpi3b from what I can see
2024-03-06 11:31:26
but rpi3b is 2 generations old
lonjil
Traneptora for example, intel can't sell a new CPU that microsoft windows doesn't work on
2024-03-06 11:31:44
I don't really see the relevance? Pretty much all software for macOS continued to work, and Microsoft is totally willing to port Windows to new architectures (back in the day they ported Windows to Itanium, and today Windows on Arm is finally getting somewhere.)
Quackdoc
190n but rpi3b is 2 generations old
2024-03-06 11:31:57
the rpi3b is also architecturally very different from the 4/5, and is more power efficient than them by quite the margin
190n
2024-03-06 11:31:59
wow it's faster than an in-order arm cpu designed in 2012
Quackdoc
2024-03-06 11:32:12
there is a significant reason why people still buy the rpi3b
Traneptora
lonjil I don't really see the relevance? Pretty much all software for macOS continued to work, and Microsoft is totally willing to port Windows to new architectures (back in the day they ported Windows to Itanium, and today Windows on Arm is finally getting somewhere.)
2024-03-06 11:32:13
no, but it's easier to force a transition along when it's vertically integrated
190n
Quackdoc the rpi3b is also architecturally very different from the 4/5, and is more power efficient than them by quite the margin
2024-03-06 11:32:16
is it really more power efficient or does it just use less power
Quackdoc
2024-03-06 11:32:25
> Raspberry Pi 3 Model B will remain in production until at least January 2028
190n is it really more power efficient or does it just use less power
2024-03-06 11:32:37
power efficient
2024-03-06 11:33:11
it doesn't make sense to compare a product designed to compete with rpi3b to an rpi4, they are just different segments
lonjil
Quackdoc that is the case for new *features* but when it comes to extending vector that work doesn't need to get put in
2024-03-06 11:33:28
but you chose to list out every little update and minor version even with the same width, so I think my critique of your critique is fair.
Traneptora
Orum Me personally? No, but others have: https://www.youtube.com/watch?v=u1dxOI_kYG8
2024-03-06 11:33:33
does this video mention power consumption at all
2024-03-06 11:33:39
I don't really want to spend 8 minutes watching it
Quackdoc
lonjil but you chose to list out every little update and minor version even with the same width, so I think my critique of your critique is fair.
2024-03-06 11:33:49
sure, but for a lot of people, width is what matters
2024-03-06 11:34:00
width *is* the key part here afterall
190n
Quackdoc it doesn't make sense to compare a product designed to compete with rpi3b to an rpi4, they are just different segments
2024-03-06 11:34:24
sure but i would not call a pi 3b competitor a "big chip"
Traneptora
2024-03-06 11:34:26
if width is what you care about, width has been updated like 3 times in 20 years
2024-03-06 11:34:30
which is not that much
Orum
Traneptora does this video mention power consumption at all
2024-03-06 11:34:34
IDK, but I have no doubt apple will have better efficiency as they don't have to deal with x86 hell and they're willing to pay for bleeding-edge processes
Quackdoc
190n sure but i would not call a pi 3b competitor a "big chip"
2024-03-06 11:34:37
that's why I asked what is a "big" chip
lonjil
Traneptora then I didn't understand the point of the question
2024-03-06 11:34:50
if you go back up the convo, the original point was that Apple ensured that x86-64 software works just fine on Arm laptops. Then you replied with stuff about license and x86 chips, which was not relevant (however, I replied before realizing that you had missed that point)
Orum
2024-03-06 11:34:56
though actually it has worse efficiency if you go over the 8GB limit
Quackdoc
2024-03-06 11:35:09
for instance the antminer x3 is a crypto bro machine running iirc 3x SG2042s
2024-03-06 11:35:20
I'm not sure if that would be considered "big" or not
Traneptora
lonjil if you go back up the convo, the original point was that Apple ensured that x86-64 software works just fine on Arm laptops. Then you replied with stuff about license and x86 chips, which was not relevant (however, I replied before realizing that you had missed that point)
2024-03-06 11:35:25
whether or not apple has paid intel for a license doesn't seem to matter though if x86 code works on apple silicon via rosetta
2024-03-06 11:35:44
like it works or it doesn't, whether they paid intel for it seems largely irrelevant imo
190n
Quackdoc that's why I asked what is a "big" chip
2024-03-06 11:35:47
i would classify it as something you could reasonably put in a smartphone or low-end PC, or very high-end SBC
Quackdoc Im not sure if that would be considered "big" or not
2024-03-06 11:35:53
yeah it would
lonjil
Quackdoc what are "big" chips? risc-v is being adopted in everything from SBCs to cryptominers
2024-03-06 11:35:57
you know, desktops and servers
Orum
Traneptora like it works or it doesn't, whether they paid intel for it seems largely irrelevant imo
2024-03-06 11:36:07
the point is there's no reason you can't do the same with RISC-V
Quackdoc
190n i would classify it as something you could reasonably put in a smartphone or low-end PC, or very high-end SBC
2024-03-06 11:36:22
in that case you have the lichee pi4a, the lichee pi4a is a bit on the pricy side granted, but it's quite the decent chip when excusing that (early adopter fee and everything)
2024-03-06 11:36:49
you also have the upcoming SG2380 which should be quite the promising piece of work
Traneptora
Orum the point is there's no reason you can't do the same with RISC-V
2024-03-06 11:37:23
except it doesn't fully work. what I foresee is someone running an x86 binary on risc-v, something doesn't work, they report a bug, the developer goes "recompile it for risc-v", the user gets angry and refuses, the developer gets angry, and nobody wins
2024-03-06 11:37:44
this kind of cycle happens all the time in software dev
Orum
2024-03-06 11:37:52
well if it's OSS there really isn't a reason *not* to compile it for RISC-V then
Traneptora
2024-03-06 11:38:02
sure but this happens
Quackdoc
Quackdoc in that case you have the lichee pi4a, the lichee pi4a is a bit on the pricy side granted, but it's quite the decent chip when excusing that (early adopter fee and everything)
2024-03-06 11:38:39
considering that the lichee pi4a manages to compete with the rockchip devices in perf/watt it's pretty amazing since it was the first design released after the spec was formally ratified IIRC
Orum
2024-03-06 11:38:45
sure, but if it's closed source then the developer either releases RISC-V binaries, or (if it's paid closed source) loses a sale
Traneptora
2024-03-06 11:39:00
well for ubiquitous software they won't lose a sale
2024-03-06 11:39:22
if adobe doesn't release risc-v binaries for acrobat and my PDF doesn't render in acroread because of some bug in the translation layer
2024-03-06 11:39:23
nobody wins
lonjil
Orum why is it bad for big chips?
2024-03-06 11:39:57
one example is the lack of complex addressing modes. needing to do those calculations with intermediate registers increases the pressure on the register rename engine, one of the most power hungry parts of the core that is always running. by having more complex instructions with fewer intermediate registers needed, you can reduce the size of the rename engine by something like 20%. Saves a lot of power and die area inside the core. Alibaba has some cores that implement custom extensions for that and some other stuff. They claim something like 30% better perf due to this. But the RISC-V foundation is against it on ideological grounds.
Quackdoc
2024-03-06 11:39:58
while true, IMO it's not really that big of an issue for early adopters who go in expecting that
Orum
2024-03-06 11:39:59
if it's ubiquitous then you can just use something else to read PDFs
lonjil
lonjil one example is the lack of complex addressing modes. needing to do those calculations with intermediate registers increases the pressure on the register rename engine, one of the most power hungry parts of the core that is always running. by having more complex instructions with fewer intermediate registers needed, you can reduce the size of the rename engine by something like 20%. Saves a lot of power and die area inside the core. Alibaba has some cores that implement custom extensions for that and some other stuff. They claim something like 30% better perf due to this. But the RISC-V foundation is against it on ideological grounds.
2024-03-06 11:40:30
(on the other hand, something like a microcontroller doesn't have register renaming, so this doesn't really matter at all in that space)
190n
lonjil one example is the lack of complex addressing modes. needing to do those calculations with intermediate registers increases the pressure on the register rename engine, one of the most power hungry parts of the core that is always running. by having more complex instructions with fewer intermediate registers needed, you can reduce the size of the rename engine by something like 20%. Saves a lot of power and die area inside the core. Alibaba has some cores that implement custom extensions for that and some other stuff. They claim something like 30% better perf due to this. But the RISC-V foundation is against it on ideological grounds.
2024-03-06 11:41:26
Zba extension adds some instructions for address generation, or are you talking about even more complex stuff?
2024-03-06 11:41:49
you can shift left by (1, 2, 3) and add in one operation
lonjil
Quackdoc for laptops yes, but the box thingies? I would disagree on
2024-03-06 11:41:59
which box thingies? Mac Mini can be decent value depending on your needs, especially if you, say, depend on solar power and need to use as little as possible, as was mentioned previously 😄 Though if you mean like the Mac Pro then yeah lmao terrible value.
Traneptora
2024-03-06 11:42:33
oh they sell actual ATX-sized mac desktops?
2024-03-06 11:42:41
oh, that, I don't see the point in that
2024-03-06 11:42:52
the primary reason you'd purchase an apple computer is that it's power-efficient
lonjil
2024-03-06 11:42:52
they even sell rack-mount mac desktops
Traneptora
2024-03-06 11:42:55
wew
190n
2024-03-06 11:43:22
they even sell wheeled mac desktops
Orum
2024-03-06 11:44:38
they even sold boat anchors
lonjil
190n Zba extension adds some instructions for address generation, or are you talking about even more complex stuff?
2024-03-06 11:45:14
these generate addresses, which is nice, but they still require you to stuff that address into a register before using it. I meant having complex addressing in your load and store instructions. On regular consumer and server class chips, this is very cheap to implement, basically free performance for nothing.
190n
2024-03-06 11:46:40
ah, like the `[reg + constant * reg]` you can do in x86?
lonjil
Traneptora whether or not apple has paid intel for a license doesn't seem to matter though if x86 code works on apple silicon via rosetta
2024-03-06 11:47:16
yes. I am just pedantic. Though Apple probably did not pay anything. Fun fact if you didn't know: Apple made a Linux version of Rosetta, so your x86-64 docker containers can run on apple silicon macs 😄
190n
2024-03-06 11:47:41
omg there's a conditional operations extension now
Traneptora
lonjil yes. I am just pedantic. Though Apple probably did not pay anything. Fun fact if you didn't know: Apple made a Linux version of Rosetta, so your x86-64 docker containers can run on apple silicon macs 😄
2024-03-06 11:47:56
>docker plz
lonjil
190n ah, like the `[reg + constant * reg]` you can do in x86?
2024-03-06 11:48:06
yeah. Though I don't recall exactly which addressing modes Alibaba engineers added as an extension.
190n
lonjil yes. I am just pedantic. Though Apple probably did not pay anything. Fun fact if you didn't know: Apple made a Linux version of Rosetta, so your x86-64 docker containers can run on apple silicon macs 😄
2024-03-06 11:48:10
and so you can run x86 binaries in your linux vm on apple silicon
2024-03-06 11:48:17
well i guess that's what docker does anyway...
lonjil
2024-03-06 11:49:18
finally, I've caught up to the conversation
2024-03-06 11:49:39
only took me 30 minutes of writing replies
190n well i guess that's what docker does anyway...
2024-03-06 11:50:13
yeah. Docker on Windows and macOS uses VMs
2024-03-06 11:50:45
On FreeBSD I believe Podman (docker but cooler) can use FreeBSD's Linux compat layer, no VM needed.
Traneptora >docker plz
2024-03-06 11:53:03
how do you orchestrate stuff? Kubernetes? Ansible? Shell scripts? Just running stuff inside `tmux` by hand? 😄 maybe you're a normal person who doesn't orchestrate anything...
Traneptora
lonjil how do you orchestrate stuff? Kubernetes? Ansible? Shell scripts? Just running stuff inside `tmux` by hand? 😄 maybe you're a normal person who doesn't orchestrate anything...
2024-03-06 11:54:19
I distribute software in source only <:YEP:808828808127971399>
2024-03-06 11:54:58
I've had bad experiences getting docker to work and actually segment off my system
lonjil
Traneptora the primary reason you'd purchase an apple computer is that it's power-efficient
2024-03-06 11:55:20
actually funny thing, there is one area where apple's silly big computers are money efficient. They have way more video ram than any GPU that isn't 10x more expensive. So if you're doing really vram heavy stuff, they're good value.
Traneptora
2024-03-06 11:55:46
huh. what GPU is inside them?
2024-03-06 11:55:50
I assume apple doesn't make gpus
lonjil
2024-03-06 11:55:56
Same GPU as on the iPhone
2024-03-06 11:56:15
used to be PowerVR, but Apple forked their architecture and took it in-house some years ago.
Traneptora
2024-03-06 11:56:27
ah so it's actually in-house
Quackdoc
lonjil which box thingies? Mac Mini can be decent value depending on your needs, especially if you, say, depend on solar power and need to use as little as possible, as was mentioned previously 😄 Though if you mean like the Mac Pro then yeah lmao terrible value.
2024-03-06 11:56:46
I don't really know, I don't follow the names much anymore
lonjil
Traneptora ah so it's actually in-house
2024-03-06 11:57:32
M3 Max is 92 billion transistors, most of which goes to the GPU. And M3 Ultra (whenever it comes out) will double that.
2024-03-06 11:58:04
for reference, Nvidia 4090 has 76 billion transistors
2024-03-06 11:59:46
M3 Max can have up to 128 GB of RAM (M3 Ultra should double that), while the 4090 has 24.
190n
lonjil yeah. Docker on Windows and macOS uses VMs
2024-03-07 12:00:04
https://macoscontainers.org/ tho if it goes anywhere
lonjil
2024-03-07 12:01:32
neat
Quackdoc I don't really know, I don't follow the names much anymore
2024-03-07 12:04:57
they've been using those names for like 20 years :p
Quackdoc
2024-03-07 12:06:16
it's the new stupid names like studio or mini or something
Traneptora
lonjil M3 Max can have up to 128 GB of RAM (M3 Ultra should double that), while the 4090 has 24.
2024-03-07 12:06:40
... why?
lonjil
2024-03-07 12:06:41
mini is since 2005. studio is a new name I think yeah.
Traneptora ... why?
2024-03-07 12:08:53
CPU and GPU is one the same chip, it's all unified RAM like any laptop chip or phone. So, same reason you'd have 128 GB in any computer, except now the GPU can use it too. (and you can share buffers between the CPU and GPU, to avoid copying)
Traneptora
2024-03-07 12:09:03
ah, that makes sense
190n
2024-03-07 12:09:59
would be cool if they'd let you install extra slower ram in the mac pro mostly for the cpu to use, as was rumored for a bit
2024-03-07 12:10:03
like normal ddr5
2024-03-07 12:10:45
nvidia is doing something kinda similar with the grace hopper chip that has stacked HBM for the gpu and a larger pool of LPDDR5 for the cpu
lonjil
2024-03-07 12:11:24
Eventually they'll do a 4x chip, with like 512 GB of LPDDR
190n
2024-03-07 12:12:58
still 3x less than the intel mac pro
lonjil
2024-03-07 12:13:24
psh, unified memory means you don't need as much 😉
2024-03-07 12:14:06
just like how swap means you only need 8 gb of ram, not 16
190n
2024-03-07 12:14:18
hmm how much vram could you get on the cheese grater
2024-03-07 12:16:17
128?
lonjil
2024-03-07 12:16:47
seems like it. With 4 W6800X GPUs bridged together
190n
2024-03-07 12:18:34
apple explaining why it is impossible to port amd/nvidia drivers to arm (amd and nvidia have already done it)
lonjil
lonjil seems like it. With 4 W6800X GPUs bridged together
2024-03-07 12:18:58
much cheaper than any 128GB GPU AMD sells today <:kekw:808717074305122316>
190n
2024-03-07 12:19:15
they realized AI startups have money?
Quackdoc
lonjil psh, unified memory means you don't need as much ๐Ÿ˜‰
2024-03-07 12:20:14
the memory optimization stuff apple does do is pretty neat granted, but it's only majorly optimized for their users' average use case
lonjil
190n apple explaining why it is impossible to port amd/nvidia drivers to arm (amd and nvidia have already done it)
2024-03-07 12:20:19
it has to do with PCIe features that many Arm systems lack. You have two options. Either add the missing features in hardware, or simply trap memory accesses and emulate them in software (super slow!) that latter option is unfortunately common
Quackdoc the memory optimization stuff apple does do is pretty neat granted, but it's only majorly optimized for their users' average use case
2024-03-07 12:21:47
you should've seen my girlfriend playing modded Kerbal Space Program on her 8GB macbook air. On the one hand, slow and laggy because it needed to swap. On the other hand, still faster than my 16 GB laptop that was only a few years older 🙃 (and I was playing unmodded)
Quackdoc
2024-03-07 12:23:00
yeah, the M1 chips iirc have dedicated memory compression acceleration which is nice, and ofc they always can do DMA based swapping since they are guaranteed to be able to swap over nvme now, so it's like, yeah, you do have a lot of optimization stuff, but still, gibe more
lonjil
2024-03-07 12:24:06
If I have money in the future, I might buy a macbook for that sweet lower power use. But honestly I don't want to until they move to Armv9, because I want to play with SVE2.
Quackdoc
2024-03-07 12:25:59
im just waiting for the risc-v stuff, I don't really need that much perf anymore since I just really do web browsing and testing applications anyways. and the new lichee pad4a if they can price it reasonably would be just a decent price point for me anyways. and since I use linux 100% of time now aside from VMs, it works out well for me
Traneptora
lonjil you should've seen my girlfriend playing modded Kerbal Space Program on her 8GB macbook air. On the one hand, slow and laggy because it needed to swap. On the other hand, still faster than my 16 GB laptop that was only a few years older 🙃 (and I was playing unmodded)
2024-03-07 12:26:26
I find swap on desktops pointless
lonjil
Quackdoc im just waiting for the risc-v stuff, I don't really need that much perf anymore since I just really do web browsing and testing applications anyways. and the new lichee pad4a if they can price it reasonably would be just a decent price point for me anyways. and since I use linux 100% of time now aside from VMs, it works out well for me
2024-03-07 12:26:34
unfortunately I need big and powerful computers (for reasons)
Quackdoc
2024-03-07 12:26:35
also want one to start porting waydroid to it if I get the time :D
lonjil unfortunately I need big and powerful computers (for reasons)
2024-03-07 12:26:51
remote desktop I assume is off the table? lol
Traneptora
2024-03-07 12:27:14
the only use case where you want to have swap enabled is when you have things loaded into memory for extended periods of time that aren't used, so the OS can swap them out, and use the plentiful ram for better disk caching
lonjil
2024-03-07 12:27:34
i have 15000 tabs open, most of them are in swap I guess
Traneptora
2024-03-07 12:27:41
well that's your fault
2024-03-07 12:28:04
people who have 15k tabs open and complain that they ran out of ram should either download more ram or not have 15k tabs open <:kek:857018203640561677>
Quackdoc
2024-03-07 12:28:06
I find it nice, since linux is trash and can't handle low ram properly [av1_omegalul](https://cdn.discordapp.com/emojis/885026577618980904.webp?size=48&quality=lossless&name=av1_omegalul) so even on 16gib, just compiling something when im using the desktop can hurt, though lately zram helps a good chunk
2024-03-07 12:28:21
zram helps quite a bit
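(For context: zram creates a compressed swap device backed by RAM, which is what makes low-memory machines like those mentioned above survive compiles. A minimal setup sketch, assuming the `zram` kernel module is available; the 4G size, `zstd` algorithm, and priority are illustrative, not recommendations:)

```shell
# Load the zram module and allocate a compressed block device.
sudo modprobe zram
sudo zramctl --find --size 4G --algorithm zstd   # prints the device it grabbed, e.g. /dev/zram0
# Format it as swap and enable it with a higher priority than any
# disk-backed swap, so the kernel prefers the compressed RAM device.
sudo mkswap /dev/zram0
sudo swapon --priority 100 /dev/zram0
swapon --show   # verify the new swap device and its priority
```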
lonjil
2024-03-07 12:28:53
my window manager/compositor (sway) has 700MiB in swap for some reason
Traneptora
2024-03-07 12:29:01
contrary to popular belief, swap is *not* designed for if you run out of memory. what happens if you have swap enabled and run out of memory is you get into a state of thrashing that makes your system unresponsive and inevitably you will power cycle it
lonjil
2024-03-07 12:29:01
only 29MiB in RAM
Quackdoc
lonjil my window manager/compositor (sway) has 700MiB in swap for some reason
2024-03-07 12:29:19
hello fellow sway user :D
Traneptora
2024-03-07 12:29:40
the primary purpose of swap is to give the OS more ram to work with to allow it to more aggressively cache the filesystem stuff, cause fsync is evil
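(The trade-off being described — swapping out idle anonymous memory to leave more RAM for the page cache — is tunable via `vm.swappiness`. A sketch; the value 10 is just an example of biasing toward keeping application memory resident:)

```shell
# vm.swappiness biases reclaim between dropping page cache (low values)
# and swapping out anonymous application memory (high values).
# The kernel default is typically 60.
sysctl vm.swappiness
# Example: prefer evicting page cache over swapping applications out.
sudo sysctl vm.swappiness=10
```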
lonjil
Traneptora contrary to popular belief, swap is *not* designed for if you run out of memory. what happens if you have swap enabled and run out of memory is you get into a state of thrashing that makes your system unresponsive and inevitably you will power cycle it
2024-03-07 12:30:00
I believe macos does something like, keeping track of which application is in focus, and swapping it back in if it got swapped out when it wasn't in use. As opposed to Linux very naive handling.
Quackdoc
Traneptora contrary to popular belief, swap is *not* designed for if you run out of memory. what happens if you have swap enabled and run out of memory is you get into a state of thrashing that makes your system unresponsive and inevitably you will power cycle it
2024-03-07 12:30:21
well regardless of what it's designed for, sadly, linux handles low memory situations abysmally, and swap can often prevent hard crashing
Traneptora
2024-03-07 12:30:43
well no, what happens when you run out of ram and have swap is you start thrashing and your system becomes unresponsive
lonjil
2024-03-07 12:31:01
earlier today linux decided to move literally all my applications to swap in favor of filling my ram with, uh, something
Traneptora
2024-03-07 12:31:04
I've used linux on a desktop for a long time and it has never once happened that the system became responsive again after it started thrashing
Quackdoc
2024-03-07 12:31:12
at least the system can still call oomkiller.
Traneptora
2024-03-07 12:31:21
except it won't
lonjil
2024-03-07 12:31:26
I'm literally using 26GiB of swap right now and my system is responsive
Traneptora
2024-03-07 12:31:34
yea but lon you're not out of memory
Quackdoc
2024-03-07 12:31:40
it does? and even if it doesn't for some reason you can manually call it using sysrq
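(The manual route being referred to is the magic SysRq key: SysRq `f` invokes the kernel OOM killer on demand, which is handy when the system is thrashing too hard to launch anything. From a terminal it looks like this; note it is destructive, since the kernel will kill whichever task it scores worst:)

```shell
# Enable all SysRq functions (many distros restrict them by default),
# then trigger the OOM killer. Equivalent to pressing Alt+SysRq+F.
echo 1 | sudo tee /proc/sys/kernel/sysrq
echo f | sudo tee /proc/sysrq-trigger
# The kernel logs which task was killed:
dmesg | tail
```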
Traneptora
2024-03-07 12:31:52
it won't call the OOMkiller if you run out of ram with swap
2024-03-07 12:31:57
it will just start thrashing
lonjil
2024-03-07 12:32:10
fun fact: the oom killer always kills discord first for some reason
Quackdoc
2024-03-07 12:32:30
that's not what I have experienced at all, swap is pretty much a necessity on my laptop and tablets, 2gib and 4gib respectively, since linux will just hard crash on them.
2024-03-07 12:32:43
when you enable swap it often will recover just fine
Traneptora
2024-03-07 12:32:52
I'm saying it has never been the case that I have run out of memory with swap enabled and had the system ever become responsive again after it started thrashing
2024-03-07 12:33:32
I've been using linux as a daily driver for quite some time, almost 15 years, and I've never experienced anything other than what I described
Quackdoc
2024-03-07 12:33:34
yeah, can't say I share the same experience, but then again, flash helps a lot in these cases anyways
lonjil
lonjil earlier today linux decided to move literally all my applications to swap in favor of filling my ram with, uh, something
2024-03-07 12:33:37
this caused massive thrashing but my system did become usable again eventually
2024-03-07 12:34:00
swap usage was at 70GiB and my mouse cursor wasn't moving
2024-03-07 12:34:10
eventually it became responsive again
Traneptora
2024-03-07 12:34:15
how long did you have to wait
2024-03-07 12:34:31
it may be the case that it responds eventually but I find that power cycling the system is faster than waiting
lonjil
2024-03-07 12:34:40
a minute or two?
Quackdoc
2024-03-07 12:34:44
on my laptops I'll sometimes have to wait like 30s, but that's better than hard crashing and losing a bunch of data
Traneptora
2024-03-07 12:35:00
ah, yea, I've never had it recover after only a minute
lonjil
2024-03-07 12:35:13
but I have no idea what caused it, linux's swapping heuristics are jank
Traneptora
2024-03-07 12:35:28
my life has been so much better when I disabled swap*
2024-03-07 12:35:40
*I have a swap partition but I don't swap it on unless I need to hibernate
2024-03-07 12:35:52
since hibernation is done to swap space
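(That workflow — swap off in normal use, on only for suspend-to-disk — works because hibernation writes the RAM image into the swap area, which therefore needs to be large enough and named on the kernel command line via `resume=`. A sketch; the UUID placeholder is obviously machine-specific:)

```shell
# Enable the swap partition only when hibernating.
# The kernel cmdline must already contain: resume=UUID=<swap-partition-uuid>
sudo swapon /dev/disk/by-uuid/<swap-partition-uuid>
sudo systemctl hibernate
# After resume, turn it back off to return to a swapless setup.
sudo swapoff /dev/disk/by-uuid/<swap-partition-uuid>
</imports>
```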
lonjil
2024-03-07 12:35:53
I might disable swap after I upgrade to a computer with 256 GiB of RAM.
Quackdoc
2024-03-07 12:36:03
how linux handles memory in general is jank lol
Traneptora
2024-03-07 12:36:10
I have 32 GiB and I don't really run out
Quackdoc
2024-03-07 12:36:19
lately i've been using bustd as my oomkiller and it works pretty well
Traneptora
2024-03-07 12:36:26
I didn't really run out at 16 either unless I was doing some kind of heavy encoding stuffs
CrushedAsian255
2024-03-07 12:36:58
i once got mathematica to use 62GB of swap + 22GB of ram
Quackdoc
2024-03-07 12:37:00
fat lto on rust projects with many sub stuff hurts T.T
Traneptora
CrushedAsian255 i once got mathematica to use 62GB of swap + 22GB of ram
2024-03-07 12:37:10
oo, that's impressive, how
CrushedAsian255
Traneptora oo, that's impressive, how
2024-03-07 12:37:22
`DeBruijnSequence[2,1000000000]`
Traneptora
2024-03-07 12:37:28
<:holyfuck:941472932654362676>
2024-03-07 12:37:30
yea that might do it