👍 brunoerg approved a pull request: "[30.x] Finalise v30.1"
(https://github.com/bitcoin/bitcoin/pull/34092#pullrequestreview-3588777441)
code review ACK 2a21824b1190968ee8ba3120400c00e56be10591
(https://github.com/bitcoin/bitcoin/pull/34092#pullrequestreview-3588777441)
code review ACK 2a21824b1190968ee8ba3120400c00e56be10591
🚀 fanquake merged a pull request: "lint: Remove confusing, redundant, and brittle lint-spelling"
(https://github.com/bitcoin/bitcoin/pull/34053)
(https://github.com/bitcoin/bitcoin/pull/34053)
📝 fanquake converted_to_draft a pull request: "test: p2p: check that peer's announced starting height is remembered"
(https://github.com/bitcoin/bitcoin/pull/33990)
Note that the announced starting height of a peer is neither verified nor used in any other logic, so reporting a bogus value [1] as done in the test doesn't have any consequences on the connection -- it's only used for inspection via the `getpeerinfo` RPC and in some debug messages (see also e.g. https://github.com/bitcoin-dot-org/Bitcoin.org/issues/1387#issuecomment-252934859). Admittedly this is not terribly important, but I'd still think having test coverage for this simple reporting functio
...
(https://github.com/bitcoin/bitcoin/pull/33990)
Note that the announced starting height of a peer is neither verified nor used in any other logic, so reporting a bogus value [1] as done in the test doesn't have any consequences on the connection -- it's only used for inspection via the `getpeerinfo` RPC and in some debug messages (see also e.g. https://github.com/bitcoin-dot-org/Bitcoin.org/issues/1387#issuecomment-252934859). Admittedly this is not terribly important, but I'd still think having test coverage for this simple reporting functio
...
💬 l0rinc commented on pull request "Add initial vectorized chacha20 implementation for 2-3x speedup":
(https://github.com/bitcoin/bitcoin/pull/34083#discussion_r2627921902)
I had the same disappointment when implementing the obfuscation optimization. :)
@maflcko do we have a big-endian nightly that supports vectorized operations? Would it have caught this?
(https://github.com/bitcoin/bitcoin/pull/34083#discussion_r2627921902)
I had the same disappointment when implementing the obfuscation optimization. :)
@maflcko do we have a big-endian nightly that supports vectorized operations? Would it have caught this?
💬 theuni commented on pull request "Add initial vectorized chacha20 implementation for 2-3x speedup":
(https://github.com/bitcoin/bitcoin/pull/34083#discussion_r2627930434)
See updated title description where I touched on this.
I found (with lots of experimentation here) that different compilers use complicated and unpredictable heuristics to decide whether or not to unroll loops. Even if an unroll-able loop is detected, unrolling may be skipped because of the function size (as is the case here because it's huge).
So, I'd like to not take chances on unrolling. That means one of:
- Manual unrolling
- Macro-based (as-in `REPEAT10()`)
- A pragma
- Recursive
...
(https://github.com/bitcoin/bitcoin/pull/34083#discussion_r2627930434)
See updated title description where I touched on this.
I found (with lots of experimentation here) that different compilers use complicated and unpredictable heuristics to decide whether or not to unroll loops. Even if an unroll-able loop is detected, unrolling may be skipped because of the function size (as is the case here because it's huge).
So, I'd like to not take chances on unrolling. That means one of:
- Manual unrolling
- Macro-based (as-in `REPEAT10()`)
- A pragma
- Recursive
...
🤔 marcofleon reviewed a pull request: "fuzz: doc: remove any mention to `address_deserialize_v2`"
(https://github.com/bitcoin/bitcoin/pull/34091#pullrequestreview-3588807616)
ACK caf4843a59a9d2512d69f8fd88a9672112bd80ac
Those are the only three instances.
(https://github.com/bitcoin/bitcoin/pull/34091#pullrequestreview-3588807616)
ACK caf4843a59a9d2512d69f8fd88a9672112bd80ac
Those are the only three instances.
💬 vasild commented on pull request "Broadcast own transactions only via short-lived Tor or I2P connections":
(https://github.com/bitcoin/bitcoin/pull/29415#discussion_r2627936962)
<details>
<summary>[patch] cumulative patch to use optional</summary>
```diff
diff --git a/src/net_processing.cpp b/src/net_processing.cpp
index 512f99f0c5..4e6d77cfba 100644
--- a/src/net_processing.cpp
+++ b/src/net_processing.cpp
@@ -1086,13 +1086,13 @@ private:
void AddAddressKnown(Peer& peer, const CAddress& addr) EXCLUSIVE_LOCKS_REQUIRED(g_msgproc_mutex);
void PushAddress(Peer& peer, const CAddress& addr) EXCLUSIVE_LOCKS_REQUIRED(g_msgproc_mutex);
void LogBloc
...
(https://github.com/bitcoin/bitcoin/pull/29415#discussion_r2627936962)
<details>
<summary>[patch] cumulative patch to use optional</summary>
```diff
diff --git a/src/net_processing.cpp b/src/net_processing.cpp
index 512f99f0c5..4e6d77cfba 100644
--- a/src/net_processing.cpp
+++ b/src/net_processing.cpp
@@ -1086,13 +1086,13 @@ private:
void AddAddressKnown(Peer& peer, const CAddress& addr) EXCLUSIVE_LOCKS_REQUIRED(g_msgproc_mutex);
void PushAddress(Peer& peer, const CAddress& addr) EXCLUSIVE_LOCKS_REQUIRED(g_msgproc_mutex);
void LogBloc
...
💬 l0rinc commented on pull request "Add initial vectorized chacha20 implementation for 2-3x speedup":
(https://github.com/bitcoin/bitcoin/pull/34083#discussion_r2627942778)
Is there any way we could help that would make you reconsider? I don't mind running the benchmarks on a few platforms, as long as we can have simpler code. The current template magic is not something I would like to see more of - modern C++20 should be able to handle the situations we have here. I don't mind experimenting with this if you don't want to.
(https://github.com/bitcoin/bitcoin/pull/34083#discussion_r2627942778)
Is there any way we could help that would make you reconsider? I don't mind running the benchmarks on a few platforms, as long as we can have simpler code. The current template magic is not something I would like to see more of - modern C++20 should be able to handle the situations we have here. I don't mind experimenting with this if you don't want to.
💬 maflcko commented on pull request "log: Use `__func__` for -logsourcelocations":
(https://github.com/bitcoin/bitcoin/pull/34088#discussion_r2627975576)
> > This seems like it could silently make some places use the wrong function name.
>
> Not with `function_name_short()`, though? I think `SourceLocation::function_name()` behaving identical to `std::source_location::function_name()` is desirable, letting users explicitly choose between the stdlib behaviour or our custom behaviour?
I am pretty sure that reverting a function argument incorrectly to `const std::source_location&` and incorrectly using `funciton_name()` will compile fine. Mayb
...
(https://github.com/bitcoin/bitcoin/pull/34088#discussion_r2627975576)
> > This seems like it could silently make some places use the wrong function name.
>
> Not with `function_name_short()`, though? I think `SourceLocation::function_name()` behaving identical to `std::source_location::function_name()` is desirable, letting users explicitly choose between the stdlib behaviour or our custom behaviour?
I am pretty sure that reverting a function argument incorrectly to `const std::source_location&` and incorrectly using `funciton_name()` will compile fine. Mayb
...
💬 theuni commented on pull request "Add initial vectorized chacha20 implementation for 2-3x speedup":
(https://github.com/bitcoin/bitcoin/pull/34083#discussion_r2627980132)
I avoided lambdas as I've historically observed lots of optims being skipped (with clang, at least) when using them. But since you/@ajtowns/@l0rinc have all made the same comment, I'll play around and see if these indeed compile down to nothing as one would hope.
(https://github.com/bitcoin/bitcoin/pull/34083#discussion_r2627980132)
I avoided lambdas as I've historically observed lots of optims being skipped (with clang, at least) when using them. But since you/@ajtowns/@l0rinc have all made the same comment, I'll play around and see if these indeed compile down to nothing as one would hope.
💬 sipa commented on pull request "Add initial vectorized chacha20 implementation for 2-3x speedup":
(https://github.com/bitcoin/bitcoin/pull/34083#discussion_r2627980586)
@theuni See my `std::make_index_sequence` based I demonstrated above. It's template based, but doesn't need recursion, and doesn't rely on compiler unrolling. It simply expands to an expression `(a[0] = v, a[1] = v, a[2] = v, ...)` e.g.
(https://github.com/bitcoin/bitcoin/pull/34083#discussion_r2627980586)
@theuni See my `std::make_index_sequence` based I demonstrated above. It's template based, but doesn't need recursion, and doesn't rely on compiler unrolling. It simply expands to an expression `(a[0] = v, a[1] = v, a[2] = v, ...)` e.g.
💬 sipa commented on pull request "Add initial vectorized chacha20 implementation for 2-3x speedup":
(https://github.com/bitcoin/bitcoin/pull/34083#discussion_r2627980657)
@theuni See my `std::make_index_sequence` based I demonstrated above. It's template based, but doesn't need recursion, and doesn't rely on compiler unrolling. It simply expands to an expression `(a[0] = v, a[1] = v, a[2] = v, ...)` e.g.
(https://github.com/bitcoin/bitcoin/pull/34083#discussion_r2627980657)
@theuni See my `std::make_index_sequence` based I demonstrated above. It's template based, but doesn't need recursion, and doesn't rely on compiler unrolling. It simply expands to an expression `(a[0] = v, a[1] = v, a[2] = v, ...)` e.g.
👍 ryanofsky approved a pull request: "qa: Avoid knock-on exception in assert_start_raises_init_error"
(https://github.com/bitcoin/bitcoin/pull/32929#pullrequestreview-3588864587)
Code review ACK 356883f0e48be59bcb154096cef82cbf3f0dd9d8. Thanks for the updates! Just rearranged commits and made minor changes in "missing init errors" test since last review
(https://github.com/bitcoin/bitcoin/pull/32929#pullrequestreview-3588864587)
Code review ACK 356883f0e48be59bcb154096cef82cbf3f0dd9d8. Thanks for the updates! Just rearranged commits and made minor changes in "missing init errors" test since last review
📝 vasild opened a pull request: "netif: fix compilation warning in QueryDefaultGatewayImpl()"
(https://github.com/bitcoin/bitcoin/pull/34093)
```
src/common/netif.cpp:137:51: error: comparison of integers of different signs: 'int64_t' (aka 'long') and 'unsigned long' [-Werror,-Wsign-compare]
137 | for (nlmsghdr* hdr = (nlmsghdr*)response; NLMSG_OK(hdr, recv_result); hdr = NLMSG_NEXT(hdr, recv_result)) {
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/netlink/netlink.h:220:31: note: expanded from macro 'NLMSG_OK'
220 | #define NLMSG_OK(_hdr, _len) NL_ITEM_OK(
...
(https://github.com/bitcoin/bitcoin/pull/34093)
```
src/common/netif.cpp:137:51: error: comparison of integers of different signs: 'int64_t' (aka 'long') and 'unsigned long' [-Werror,-Wsign-compare]
137 | for (nlmsghdr* hdr = (nlmsghdr*)response; NLMSG_OK(hdr, recv_result); hdr = NLMSG_NEXT(hdr, recv_result)) {
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/netlink/netlink.h:220:31: note: expanded from macro 'NLMSG_OK'
220 | #define NLMSG_OK(_hdr, _len) NL_ITEM_OK(
...
💬 l0rinc commented on pull request "Add initial vectorized chacha20 implementation for 2-3x speedup":
(https://github.com/bitcoin/bitcoin/pull/34083#discussion_r2627999059)
https://www.agner.org/optimize/optimizing_cpp.pdf contains a few useful examples:
> The automatic vectorization works best if the following conditions are satisfied:
> 1. Use a compiler with good support for automatic vectorization, such as Gnu, Clang,
> or Intel.
> 2. Use the latest version of the compiler. The compilers are becoming better and better
> at vectorization.
>
> 3. If the arrays or structures are accessed through pointers or references then tell the
> compiler explicitly t
...
(https://github.com/bitcoin/bitcoin/pull/34083#discussion_r2627999059)
https://www.agner.org/optimize/optimizing_cpp.pdf contains a few useful examples:
> The automatic vectorization works best if the following conditions are satisfied:
> 1. Use a compiler with good support for automatic vectorization, such as Gnu, Clang,
> or Intel.
> 2. Use the latest version of the compiler. The compilers are becoming better and better
> at vectorization.
>
> 3. If the arrays or structures are accessed through pointers or references then tell the
> compiler explicitly t
...
💬 theuni commented on pull request "Add initial vectorized chacha20 implementation for 2-3x speedup":
(https://github.com/bitcoin/bitcoin/pull/34083#issuecomment-3666489045)
An additional note about the actual vectorizing algorithm itself...
There's a different (and arguably more obvious) algorithm that's possible when calculating exactly 8 states. Rather than loading each `vec256` with partial info from 2 states, it's possible to use 16 `vec256` where each vector contains 1 element for each of the 8 states. It looks like:
```c++
vec256 x0, x1, x2, x3, x4, x5, x6, x7, x8, x9, x10, x11, x12, x13, x14, x15;
broadcast(x0, 0x61707865);
broadcast(x1, 0x3320646
...
(https://github.com/bitcoin/bitcoin/pull/34083#issuecomment-3666489045)
An additional note about the actual vectorizing algorithm itself...
There's a different (and arguably more obvious) algorithm that's possible when calculating exactly 8 states. Rather than loading each `vec256` with partial info from 2 states, it's possible to use 16 `vec256` where each vector contains 1 element for each of the 8 states. It looks like:
```c++
vec256 x0, x1, x2, x3, x4, x5, x6, x7, x8, x9, x10, x11, x12, x13, x14, x15;
broadcast(x0, 0x61707865);
broadcast(x1, 0x3320646
...
💬 maflcko commented on pull request "Add initial vectorized chacha20 implementation for 2-3x speedup":
(https://github.com/bitcoin/bitcoin/pull/34083#discussion_r2628013042)
See https://github.com/bitcoin/bitcoin/pull/33436, but it was slow despite having the gui and the fuzz tests disabled. I am running it as part of nightly, so any issues will be caught before a release, but I am not sure if catching everything in pull requests is possible.
Maybe it is possible to split the task into two: One for a cross-compile, which should be faster than a "native" compile via qemu. And another to run the tests, similar to the windows-cross tests.
(https://github.com/bitcoin/bitcoin/pull/34083#discussion_r2628013042)
See https://github.com/bitcoin/bitcoin/pull/33436, but it was slow despite having the gui and the fuzz tests disabled. I am running it as part of nightly, so any issues will be caught before a release, but I am not sure if catching everything in pull requests is possible.
Maybe it is possible to split the task into two: One for a cross-compile, which should be faster than a "native" compile via qemu. And another to run the tests, similar to the windows-cross tests.
💬 maflcko commented on pull request "ci: run s390x job":
(https://github.com/bitcoin/bitcoin/pull/33436#issuecomment-3666498761)
Maybe by cross-compilation and only running the tests on s390x, it could be speed up?
(https://github.com/bitcoin/bitcoin/pull/33436#issuecomment-3666498761)
Maybe by cross-compilation and only running the tests on s390x, it could be speed up?
💬 sipa commented on pull request "Add initial vectorized chacha20 implementation for 2-3x speedup":
(https://github.com/bitcoin/bitcoin/pull/34083#discussion_r2628034222)
It can be done with a helper function too, but that's not as concise:
```c++
template <size_t I, size_t... ITER>
ALWAYS_INLINE void arr_set_vec256_inner(std::array<vec256, I>& arr, const vec256& vec, std::index_sequence<ITER...>)
{
((std::get<ITER>(arr) = vec),...);
}
/** Store a vector in all array elements */
template <size_t I>
ALWAYS_INLINE void arr_set_vec256(std::array<vec256, I>& arr, const vec256& vec)
{
arr_set_vec256_inner(arr, vec, std::make_index_sequence<I>())
...
(https://github.com/bitcoin/bitcoin/pull/34083#discussion_r2628034222)
It can be done with a helper function too, but that's not as concise:
```c++
template <size_t I, size_t... ITER>
ALWAYS_INLINE void arr_set_vec256_inner(std::array<vec256, I>& arr, const vec256& vec, std::index_sequence<ITER...>)
{
((std::get<ITER>(arr) = vec),...);
}
/** Store a vector in all array elements */
template <size_t I>
ALWAYS_INLINE void arr_set_vec256(std::array<vec256, I>& arr, const vec256& vec)
{
arr_set_vec256_inner(arr, vec, std::make_index_sequence<I>())
...
👍 ryanofsky approved a pull request: "miner: drop dummy extraNonce in coinbase scriptSig for templates requested via IPC"
(https://github.com/bitcoin/bitcoin/pull/32420#pullrequestreview-3588945400)
Code review ACK 9832a9aa2bffb5c0e14eb67ce22d9ce900106304. Just rebased since last review and tweaked comment
(https://github.com/bitcoin/bitcoin/pull/32420#pullrequestreview-3588945400)
Code review ACK 9832a9aa2bffb5c0e14eb67ce22d9ce900106304. Just rebased since last review and tweaked comment
💬 l0rinc commented on pull request "ci: run s390x job":
(https://github.com/bitcoin/bitcoin/pull/33436#issuecomment-3666537730)
That sounds like a good idea, that's the most painful part.
Is cross compilation stable cross-endianness? I guess it has to be, has anyone tried it?
(https://github.com/bitcoin/bitcoin/pull/33436#issuecomment-3666537730)
That sounds like a good idea, that's the most painful part.
Is cross compilation stable cross-endianness? I guess it has to be, has anyone tried it?