💬 laanwj commented on pull request "optimization: pack util::Xor into 64 bit chunks instead of doing it byte-by-byte":
(https://github.com/bitcoin/bitcoin/pull/31144#discussion_r1814734979)
i have a hard time believing this will make a large difference, especially with the two memcpys involved
On modern CPUs, ALU operations (especially bitwise ones) are so fast compared to any kind of memory access.
And this isn't some advanced crypto math, it's *one* Xor operation per word with a fixed key.
Could avoid the memcpys if the code takes memory alignment into account, but that makes it even more complex. Not sure the pros/cons work out here.
(https://github.com/bitcoin/bitcoin/pull/31144#discussion_r1814734979)
i have a hard time believing this will make a large difference, especially with the two memcpys involved
On modern CPUs, ALU operations (especially bitwise ones) are so fast compared to any kind of memory access.
And this isn't some advanced crypto math, it's *one* Xor operation per word with a fixed key.
Could avoid the memcpys if the code takes memory alignment into account, but that makes it even more complex. Not sure the pros/cons work out here.
💬 laanwj commented on pull request "optimization: pack util::Xor into 64 bit chunks instead of doing it byte-by-byte":
(https://github.com/bitcoin/bitcoin/pull/31144#discussion_r1814736038)
Please avoid % (especially with a dynamic value) in an inner loop. It's essentially a division operation and those are not cheap.
(https://github.com/bitcoin/bitcoin/pull/31144#discussion_r1814736038)
Please avoid % (especially with a dynamic value) in an inner loop. It's essentially a division operation and those are not cheap.
💬 l0rinc commented on pull request "optimization: pack util::Xor into 64 bit chunks instead of doing it byte-by-byte":
(https://github.com/bitcoin/bitcoin/pull/31144#discussion_r1814739738)
Thanks for the hint, I deliberately removed that optimization (please check the commit messages for details), since these are optimized away.
Also, this is just the leftover part, so for key of length 8 (the version used in most places) this will have 7 iterations at most.
Can you see any difference with any of the benchmarks?
(https://github.com/bitcoin/bitcoin/pull/31144#discussion_r1814739738)
Thanks for the hint, I deliberately removed that optimization (please check the commit messages for details), since these are optimized away.
Also, this is just the leftover part, so for key of length 8 (the version used in most places) this will have 7 iterations at most.
Can you see any difference with any of the benchmarks?
💬 l0rinc commented on pull request "optimization: pack util::Xor into 64 bit chunks instead of doing it byte-by-byte":
(https://github.com/bitcoin/bitcoin/pull/31144#discussion_r1814741753)
The speedup comes from the vectorized operations, i.e. doing 64 bit xor instead of byte-by-byte xor (memcpy seems to be eliminated on 64 bit architectures successfully), see https://godbolt.org/z/Koscjconz
(https://github.com/bitcoin/bitcoin/pull/31144#discussion_r1814741753)
The speedup comes from the vectorized operations, i.e. doing 64 bit xor instead of byte-by-byte xor (memcpy seems to be eliminated on 64 bit architectures successfully), see https://godbolt.org/z/Koscjconz
💬 fanquake commented on issue "How to compile the GUI on opensuse tumbleweed with cmake?":
(https://github.com/bitcoin-core/gui/issues/842#issuecomment-2434952208)
> -DWITH_QRENCODE=OFF
Is this known to be broken / also didn't work previously?
(https://github.com/bitcoin-core/gui/issues/842#issuecomment-2434952208)
> -DWITH_QRENCODE=OFF
Is this known to be broken / also didn't work previously?
💬 laanwj commented on pull request "optimization: pack util::Xor into 64 bit chunks instead of doing it byte-by-byte":
(https://github.com/bitcoin/bitcoin/pull/31144#discussion_r1814763442)
It's only up to 7 iterations (assuming the key size is 8), sure, youre're right.
But ok, yea, i'm a bit divided about relying on specific non-trivial things being optimized out, makes the output very dependent on specific compiler decisions (which may be fickle in some cases).
(https://github.com/bitcoin/bitcoin/pull/31144#discussion_r1814763442)
It's only up to 7 iterations (assuming the key size is 8), sure, youre're right.
But ok, yea, i'm a bit divided about relying on specific non-trivial things being optimized out, makes the output very dependent on specific compiler decisions (which may be fickle in some cases).
💬 l0rinc commented on pull request "optimization: pack util::Xor into 64 bit chunks instead of doing it byte-by-byte":
(https://github.com/bitcoin/bitcoin/pull/31144#discussion_r1814766636)
Often the simplest code gets optimized most, since it's more predictable.
Would you like me to extend the test or benchmark suite or try something else to make sure we're comfortable with the change?
(https://github.com/bitcoin/bitcoin/pull/31144#discussion_r1814766636)
Often the simplest code gets optimized most, since it's more predictable.
Would you like me to extend the test or benchmark suite or try something else to make sure we're comfortable with the change?
💬 laanwj commented on pull request "optimization: pack util::Xor into 64 bit chunks instead of doing it byte-by-byte":
(https://github.com/bitcoin/bitcoin/pull/31144#discussion_r1814766970)
Right, that's trivial for x86_64. Let's also check for architectures that do require alignment for 64-bit reads and writes, like RISC-V.
(https://github.com/bitcoin/bitcoin/pull/31144#discussion_r1814766970)
Right, that's trivial for x86_64. Let's also check for architectures that do require alignment for 64-bit reads and writes, like RISC-V.
💬 laanwj commented on pull request "optimization: pack util::Xor into 64 bit chunks instead of doing it byte-by-byte":
(https://github.com/bitcoin/bitcoin/pull/31144#discussion_r1814773275)
No, it's fine. It just gives me goosebumps seeing code like this, but if it doesn't affect the benchmark and no one else cares then it doesn't matter.
(https://github.com/bitcoin/bitcoin/pull/31144#discussion_r1814773275)
No, it's fine. It just gives me goosebumps seeing code like this, but if it doesn't affect the benchmark and no one else cares then it doesn't matter.
💬 maflcko commented on issue "How to compile the GUI on opensuse tumbleweed with cmake?":
(https://github.com/bitcoin-core/gui/issues/842#issuecomment-2434984915)
I never tried libqrencode with autotools.
(https://github.com/bitcoin-core/gui/issues/842#issuecomment-2434984915)
I never tried libqrencode with autotools.
💬 maflcko commented on issue "macOS 13.7 depends build can't find qt":
(https://github.com/bitcoin/bitcoin/issues/31050#issuecomment-2435026697)
Is `/Volumes/SSD/Dev/` and `/Users/sjors/dev/` the same path? I don't know why, but it seems the two are mixed up. My recommendation would be to put the source code and the depends dir on the same path (one of the two) and try again.
(https://github.com/bitcoin/bitcoin/issues/31050#issuecomment-2435026697)
Is `/Volumes/SSD/Dev/` and `/Users/sjors/dev/` the same path? I don't know why, but it seems the two are mixed up. My recommendation would be to put the source code and the depends dir on the same path (one of the two) and try again.
💬 maflcko commented on pull request "test: use context managers and add file existence checks in feature_fee_estimation.py":
(https://github.com/bitcoin/bitcoin/pull/31030#issuecomment-2435052022)
Are you still working on this?
(https://github.com/bitcoin/bitcoin/pull/31030#issuecomment-2435052022)
Are you still working on this?
💬 laanwj commented on pull request "rest: Support transaction broadcast in REST interface":
(https://github.com/bitcoin/bitcoin/pull/31065#issuecomment-2435060786)
> It allows for a binary response format, which is not possible with the (JSON-)RPC interface. For large queries, not having the JSON de/serialization round-trip can make a very meaningful performance difference.
Exactly. Univalue was designed to be a simple and easy to review JSON library, it wasn't designed for performance. The encoding/decoding overhead can matter.
> What is the use of REST interface that doesn't have enough endpoints?
That is a more existential question. There are
...
(https://github.com/bitcoin/bitcoin/pull/31065#issuecomment-2435060786)
> It allows for a binary response format, which is not possible with the (JSON-)RPC interface. For large queries, not having the JSON de/serialization round-trip can make a very meaningful performance difference.
Exactly. Univalue was designed to be a simple and easy to review JSON library, it wasn't designed for performance. The encoding/decoding overhead can matter.
> What is the use of REST interface that doesn't have enough endpoints?
That is a more existential question. There are
...
💬 laanwj commented on pull request "contrib: skip missing binaries in gen-manpages":
(https://github.com/bitcoin/bitcoin/pull/30986#issuecomment-2435085619)
i'm divided on this. There's a use-case, for sure, but this script is mainly used to generate the manpages for a release. And we've had exactly the opposite issue where conditional behavior resulted in incomplete manual pages.
(Prompting me to open #17506)
So not sure it should be the default. As an option, sure.
(https://github.com/bitcoin/bitcoin/pull/30986#issuecomment-2435085619)
i'm divided on this. There's a use-case, for sure, but this script is mainly used to generate the manpages for a release. And we've had exactly the opposite issue where conditional behavior resulted in incomplete manual pages.
(Prompting me to open #17506)
So not sure it should be the default. As an option, sure.
💬 l0rinc commented on pull request "optimization: pack util::Xor into 64 bit chunks instead of doing it byte-by-byte":
(https://github.com/bitcoin/bitcoin/pull/31144#discussion_r1814847538)
Added RISC-V compiler to https://godbolt.org/z/n5rMeYeas - where it seems to my untrained eyes that it uses two separate 32 bit xors to emulate the 64 bit operation:
```asm
xor a4,a4,s0
xor a5,a5,t2
srli t6,a4,8
srli t5,a4,16
srli t4,a4,24
srli a7,a5,8
```
(https://github.com/bitcoin/bitcoin/pull/31144#discussion_r1814847538)
Added RISC-V compiler to https://godbolt.org/z/n5rMeYeas - where it seems to my untrained eyes that it uses two separate 32 bit xors to emulate the 64 bit operation:
```asm
xor a4,a4,s0
xor a5,a5,t2
srli t6,a4,8
srli t5,a4,16
srli t4,a4,24
srli a7,a5,8
```
💬 sipa commented on issue "Remove BIP94 from regtest":
(https://github.com/bitcoin/bitcoin/issues/31137#issuecomment-2435106952)
For the first option, the BIP94 timewarp behavior could perhaps be turned into a (versionbits) deployment?
Then mainnet and testnet3 can have it off (by not having a deployment defined), always on on testnet4, and it would be implicitly configurable on regtest using `-vbparams`?
(https://github.com/bitcoin/bitcoin/issues/31137#issuecomment-2435106952)
For the first option, the BIP94 timewarp behavior could perhaps be turned into a (versionbits) deployment?
Then mainnet and testnet3 can have it off (by not having a deployment defined), always on on testnet4, and it would be implicitly configurable on regtest using `-vbparams`?
💬 l0rinc commented on pull request "optimization: pack util::Xor into 64 bit chunks instead of doing it byte-by-byte":
(https://github.com/bitcoin/bitcoin/pull/31144#discussion_r1814852528)
To get rid of the goosebumps I'm handling the remaining 4 bytes as a single 32 bit xor now, so the final loop (when keys are 8 bytes long, which is mostly the case for us, I think) does 3 iterations at most. So even if it's not optimized away, we should be fine doing 3 divisions by a nice round number like 8.
(https://github.com/bitcoin/bitcoin/pull/31144#discussion_r1814852528)
To get rid of the goosebumps I'm handling the remaining 4 bytes as a single 32 bit xor now, so the final loop (when keys are 8 bytes long, which is mostly the case for us, I think) does 3 iterations at most. So even if it's not optimized away, we should be fine doing 3 divisions by a nice round number like 8.
💬 maflcko commented on issue "Remove BIP94 from regtest":
(https://github.com/bitcoin/bitcoin/issues/31137#issuecomment-2435118435)
Anything is probably fine here. A pre-mined testnet4 chain can possibly be used for the test as well, instead.
(https://github.com/bitcoin/bitcoin/issues/31137#issuecomment-2435118435)
Anything is probably fine here. A pre-mined testnet4 chain can possibly be used for the test as well, instead.
💬 laanwj commented on issue "Support transaction broadcast in REST interface":
(https://github.com/bitcoin/bitcoin/issues/31017#issuecomment-2435123562)
We're kind of having the conceptual discussion over in #31065. Probably not a good idea to split it. But in retrospect it should have been done here, and sooner, before someone started working on it.
(https://github.com/bitcoin/bitcoin/issues/31017#issuecomment-2435123562)
We're kind of having the conceptual discussion over in #31065. Probably not a good idea to split it. But in retrospect it should have been done here, and sooner, before someone started working on it.
💬 maflcko commented on pull request "optimization: pack util::Xor into 64 bit chunks instead of doing it byte-by-byte":
(https://github.com/bitcoin/bitcoin/pull/31144#discussion_r1814863916)
> divisions by a nice round number like 8.
I don't think the compiler knows the number here, so can't use it to optimize the code based on it.
(https://github.com/bitcoin/bitcoin/pull/31144#discussion_r1814863916)
> divisions by a nice round number like 8.
I don't think the compiler knows the number here, so can't use it to optimize the code based on it.