* Ensure pubkey coherency when rehydrating a handshake cert
* Include a check during handshakes after cert verification that the noise pubkey matches the cert pubkey.
* firewall can distinguish if the host connecting has an overlapping network, is a VPN peer without an overlapping network, or is a unsafe network
* Cross stack subnet stuff (#1512)
* experiment with not filtering out non-common addresses in hostinfo.networks
* allow handshakes without overlaps
* unsafe network test
* change HostInfo.buildNetworks argument to reference the cert
* try to make certificate addition/removal reloadable in some cases
* very spicy change to respond to handshakes with cert versions we cannot match with a cert that we can indeed match
* even spicier change to rehandshake if we detect our cert is lower-version than our peer, and we have a newer-version cert available
* make tryRehandshake easier to understand
Clean up the messageCounter checks added in #1154. Instead of checking that
messageCounter is still at 2, just initialize it to 2 and only increment for
non-handshake messages. Handshake packets will always be packets 1 and 2.
These new helpers make the code a lot cleaner. I confirmed that the
simple helpers like `atomic.Int64` don't add any extra overhead as they
get inlined by the compiler. `atomic.Pointer` adds an extra method call
as it no longer gets inlined, but we aren't using these on the hot path
so it is probably okay.
This allows you to configure remote allow lists specific to different
subnets of the inside CIDR. Example:
remote_allow_ranges:
10.42.42.0/24:
192.168.0.0/16: true
This would only allow hosts with a VPN IP in the 10.42.42.0/24 range to
have private IPs (and thus don't connect over public IPs).
The PR also refactors AllowList into RemoteAllowList and LocalAllowList to make it clearer which methods are allowed on which allow list.
If we receive a handshake packet for a tunnel that has already been
completed, check to see if the new remote is preferred. If so, update to
the preferred remote and send a test packet to influence the other side
to do the same.
* Add more metrics
This change adds the following counter metrics:
Metrics to track packets dropped at the firewall:
firewall.dropped.local_ip
firewall.dropped.remote_ip
firewall.dropped.no_rule
Metrics to track handshakes attempts that have been initiated and ones
that have timed out (ones that have completed are tracked by the
existing "handshakes" histogram).
handshake_manager.initiated
handshake_manager.timed_out
Metrics to track when cached_packets are dropped because we run out of
buffer space, and how many are sent once the handshake completes.
hostinfo.cached_packets.dropped
hostinfo.cached_packets.sent
This change also notes how many cached packets we have when we log the
final "Handshake received" message for either stage1 for stage2.
* separate incoming/outgoing metrics
* remove "allowed" firewall metrics
We don't need this on the hotpath, they aren't worh it.
* don't need pointers here
The change for #401 incorrectly called HostInfo.ForcePromoteBest in
stage2, when we really we want to pick the remote that we received the
response from.
There are some subtle race conditions with the previous handshake_ix implementation, mostly around collisions with localIndexId. This change refactors it so that we have a "commit" phase during the handshake where we grab the lock for the hostmap and ensure that we have a unique local index before storing it. We also now avoid using the pending hostmap at all for receiving stage1 packets, since we have everything we need to just store the completed handshake.
Co-authored-by: Nate Brown <nbrown.us@gmail.com>
Co-authored-by: Ryan Huber <rhuber@gmail.com>
Co-authored-by: forfuncsake <drussell@slack-corp.com>
This change fixes all of the known data races that `make smoke-docker-race` finds, except for one.
Most of these races are around the handshake phase for a hostinfo, so we add a RWLock to the hostinfo and Lock during each of the handshake stages.
Some of the other races are around consistently using `atomic` around the `messageCounter` field. To make this harder to mess up, I have renamed the field to `atomicMessageCounter` (I also removed the unnecessary extra pointer deference as we can just point directly to the struct field).
The last remaining data race is around reading `ConnectionInfo.ready`, which is a boolean that is only written to once when the handshake has finished. Due to it being in the hot path for packets and the rare case that this could actually be an issue, holding off on fixing that one for now.
here is the results of `make smoke-docker-race`:
before:
lighthouse1: Found 2 data race(s)
host2: Found 36 data race(s)
host3: Found 17 data race(s)
host4: Found 31 data race(s)
after:
host2: Found 1 data race(s)
host4: Found 1 data race(s)
Fixes: #147Fixes: #226Fixes: #283Fixes: #316