Co-authored-by: Nate Brown <nbrown.us@gmail.com>
Co-authored-by: Jack Doan <jackdoan@rivian.com>
Co-authored-by: brad-defined <77982333+brad-defined@users.noreply.github.com>
Co-authored-by: Jack Doan <me@jackdoan.com>
* firewall: add option to send REJECT replies
This change allows you to configure the firewall to send REJECT packets
when a packet is denied.
    firewall:
      # Action to take when a packet is not allowed by the firewall rules.
      # Can be one of:
      #   `drop` (default): silently drop the packet.
      #   `reject`: send a reject reply.
      #     - For TCP, this will be a RST "Connection Reset" packet.
      #     - For other protocols, this will be an ICMP port unreachable packet.
      outbound_action: drop
      inbound_action: drop
These packets are only sent to established tunnels, and only on the
overlay network (currently IPv4 only).
    $ ping -c1 192.168.100.3
    PING 192.168.100.3 (192.168.100.3) 56(84) bytes of data.
    From 192.168.100.3 icmp_seq=2 Destination Port Unreachable

    --- 192.168.100.3 ping statistics ---
    2 packets transmitted, 0 received, +1 errors, 100% packet loss, time 31ms

    $ nc -nzv 192.168.100.3 22
    (UNKNOWN) [192.168.100.3] 22 (?) : Connection refused
This change also modifies the smoke test to capture tcpdump pcaps from
both the inside and outside to inspect what is going on over the wire.
It also now does TCP and UDP packet tests using the Nmap version of
ncat.
* calculate seq and ack the same way as the kernel
The logic is a bit confusing, so we copy it straight from how the kernel
implements iptables `--reject-with tcp-reset`:
- https://github.com/torvalds/linux/blob/v5.19/net/ipv4/netfilter/nf_reject_ipv4.c#L193-L221
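Transcribed into a small Go sketch (function and parameter names are invented for this example), the kernel's choice looks like this:

    package main

    import "fmt"

    // rejectSeqAck mirrors the kernel's nf_reject_ipv4 logic for choosing the
    // RST's sequence and acknowledgment numbers from the offending segment.
    func rejectSeqAck(inSeq, inAck uint32, syn, fin, ack bool, payloadLen uint32) (seq, ackNum uint32, setAck bool) {
        if ack {
            // The offending packet ACKed something: the RST's SEQ is that
            // ACK number, and the RST carries no ACK of its own.
            return inAck, 0, false
        }
        // Otherwise, ACK everything the peer sent: SYN and FIN each consume
        // one sequence number, plus any payload bytes.
        n := payloadLen
        if syn {
            n++
        }
        if fin {
            n++
        }
        return 0, inSeq + n, true
    }

    func main() {
        // RST in reply to a bare SYN (seq=1000): SEQ=0, ACK=1001, ACK flag set.
        seq, ack, setAck := rejectSeqAck(1000, 0, true, false, false, 0)
        fmt.Println(seq, ack, setAck) // 0 1001 true
    }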
* cleanup
The goal of this work is to send packets between two hosts using more than one
5-tuple. When running on networks like AWS, where the underlying network driver
and overlay fabric make routing, load balancing, and failover decisions based
on the flow hash, this enables more than one flow between pairs of hosts.
Multiport spreads outgoing UDP packets across multiple UDP send ports,
which allows nebula to work around any issues on the underlay network.
Some example issues this could work around:
- UDP rate limits applied on a per-flow basis.
- Partial underlay network failures in which some flows work and some don't.
Agreement is done during the handshake to decide if multiport mode will
be used for a given tunnel: one side must have tx_enabled set and the
other side must have rx_enabled set.
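A minimal Go sketch of that agreement, with hypothetical field and function names standing in for whatever the handshake actually exchanges:

    package main

    import "fmt"

    type multiportConfig struct {
        TxEnabled bool // this host may send from multiple source ports
        RxEnabled bool // this host accepts packets from multiple source ports
    }

    // agree decides per tunnel: we spread sends only if we have tx_enabled
    // and the peer advertised rx_enabled, and vice versa for receiving.
    func agree(local, peer multiportConfig) (tx, rx bool) {
        return local.TxEnabled && peer.RxEnabled, local.RxEnabled && peer.TxEnabled
    }

    func main() {
        tx, rx := agree(multiportConfig{TxEnabled: true}, multiportConfig{RxEnabled: true})
        fmt.Println(tx, rx) // true false
    }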
NOTE: you cannot use multiport on a host if you are relying on UDP hole
punching to get through a NAT or firewall.
NOTE: Linux only (uses raw sockets to send). Also currently only works
with IPv4 underlay network remotes.
This is implemented by opening a raw socket and sending packets with
a source port that is based on a hash of the overlay source and
destination ports. For ICMP and Nebula metadata packets, we use a random
source port.
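A hedged sketch of the port selection; the hash choice and the names here are illustrative, not necessarily what Nebula ships:

    package main

    import (
        "fmt"
        "hash/fnv"
    )

    // sendPort hashes the overlay source/destination ports into the range
    // [basePort, basePort+txPorts), matching the tx_ports semantics below.
    func sendPort(basePort, txPorts, overlaySrc, overlayDst uint16) uint16 {
        h := fnv.New32a()
        h.Write([]byte{
            byte(overlaySrc >> 8), byte(overlaySrc),
            byte(overlayDst >> 8), byte(overlayDst),
        })
        return basePort + uint16(h.Sum32()%uint32(txPorts))
    }

    func main() {
        // With listen.port 4242 and tx_ports 100, sends land in [4242, 4342).
        fmt.Println(sendPort(4242, 100, 51000, 443))
    }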
Example configuration:
    multiport:
      # This host supports sending via multiple UDP ports.
      tx_enabled: false

      # This host supports receiving packets sent from multiple UDP ports.
      rx_enabled: false

      # How many UDP ports to use when sending. The lowest source port will be
      # listen.port and will go up to (but not including) listen.port + tx_ports.
      tx_ports: 100

      # NOTE: All of your hosts must be running a version of Nebula that supports
      # multiport if you want to enable this feature. Older versions of Nebula
      # will be confused by these multiport handshakes.
      #
      # If handshakes are not getting a response, attempt to transmit handshakes
      # using random UDP source ports (to get around partial underlay network
      # failures).
      tx_handshake: false

      # How many unresponded handshakes we should send before we attempt to
      # send multiport handshakes.
      tx_handshake_delay: 2
By default, Nebula replies to packets it has no tunnel for with a `recv_error` packet. This packet helps speed up
re-connection in the case that Nebula on either side did not shut down cleanly. However, this response can be abused
to discover whether Nebula is running on a host. This option lets you configure whether to send `recv_error` packets
always, never, or only to private network remotes.
valid values: always, never, private
This setting is reloadable with SIGHUP.
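For reference, a config sketch, assuming the option is exposed as `send_recv_error` under `listen` (check the example config for your version):

    listen:
      # valid values: always, never, private
      # This setting is reloadable with SIGHUP.
      send_recv_error: private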
* don't set ConnectionState to nil
We might have packets being processed in another thread, so we can't
safely just set this to nil. Since we removed it from the hostmaps, the
next packets to process should start the handshake over again.
I believe this comment is outdated or incorrect. Since the next
handshake will start over with a new HostInfo, I don't think there is
any way a counter reuse could happen:
> We must null the connectionstate or a counter reuse may happen
Here is a panic we saw that I think is related:
    panic: runtime error: invalid memory address or nil pointer dereference
    [signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x93a037]

    goroutine 59 [running, locked to thread]:
    github.com/slackhq/nebula.(*Firewall).Drop(...)
            github.com/slackhq/nebula/firewall.go:380
    github.com/slackhq/nebula.(*Interface).consumeInsidePacket(...)
            github.com/slackhq/nebula/inside.go:59
    github.com/slackhq/nebula.(*Interface).listenIn(...)
            github.com/slackhq/nebula/interface.go:233
    created by github.com/slackhq/nebula.(*Interface).run
            github.com/slackhq/nebula/interface.go:191
* use closeTunnel
This allows you to configure remote allow lists specific to different
subnets of the inside CIDR. Example:
    remote_allow_ranges:
      10.42.42.0/24:
        192.168.0.0/16: true
This would only allow hosts with a VPN IP in the 10.42.42.0/24 range to
be reached via private IPs in 192.168.0.0/16 (and thus they won't
connect over public IPs).
The PR also refactors AllowList into RemoteAllowList and LocalAllowList to make it clearer which methods are allowed on which allow list.
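A simplified Go sketch of the lookup this enables; the types and the remoteAllowed helper are invented for illustration, and the real RemoteAllowList is more involved:

    package main

    import (
        "fmt"
        "net/netip"
    )

    // remoteAllowed checks the nested ranges: find the inside CIDR that
    // contains the VPN IP, then test the remote IP against its allow list.
    func remoteAllowed(ranges map[netip.Prefix]map[netip.Prefix]bool, vpnIP, remote netip.Addr) bool {
        for inside, allow := range ranges {
            if !inside.Contains(vpnIP) {
                continue
            }
            for cidr, ok := range allow {
                if cidr.Contains(remote) {
                    return ok
                }
            }
            return false // VPN IP matched but remote is not in its allow list
        }
        return true // no range applies; fall back to the global allow list
    }

    func main() {
        ranges := map[netip.Prefix]map[netip.Prefix]bool{
            netip.MustParsePrefix("10.42.42.0/24"): {
                netip.MustParsePrefix("192.168.0.0/16"): true,
            },
        }
        fmt.Println(remoteAllowed(ranges,
            netip.MustParseAddr("10.42.42.5"),
            netip.MustParseAddr("192.168.1.9"))) // true
    }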
Fixed a small bug that was introduced in
df7c7ee#diff-5d05d02296a1953fd5fbcb3f4ab486bc5f7c34b14c3bdedb068008ec8ff5beb4
and was causing problems.
There are some subtle race conditions with the previous handshake_ix implementation, mostly around collisions with localIndexId. This change refactors it so that we have a "commit" phase during the handshake where we grab the lock for the hostmap and ensure that we have a unique local index before storing it. We also now avoid using the pending hostmap at all for receiving stage1 packets, since we have everything we need to just store the completed handshake.
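Roughly, the commit phase works like this; a Go sketch with simplified, invented types rather than Nebula's actual code:

    package main

    import (
        "fmt"
        "math/rand"
        "sync"
    )

    type HostInfo struct{ localIndexId uint32 }

    type HostMap struct {
        sync.Mutex
        Indexes map[uint32]*HostInfo
    }

    // commit holds the hostmap lock while picking a localIndexId, so a
    // colliding index can never be stored by two racing handshakes.
    func (hm *HostMap) commit(hi *HostInfo) {
        hm.Lock()
        defer hm.Unlock()
        for {
            idx := rand.Uint32()
            if _, exists := hm.Indexes[idx]; !exists {
                hi.localIndexId = idx
                hm.Indexes[idx] = hi
                return
            }
        }
    }

    func main() {
        hm := &HostMap{Indexes: map[uint32]*HostInfo{}}
        hi := &HostInfo{}
        hm.commit(hi)
        fmt.Println(hi.localIndexId)
    }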
Co-authored-by: Nate Brown <nbrown.us@gmail.com>
Co-authored-by: Ryan Huber <rhuber@gmail.com>
Co-authored-by: forfuncsake <drussell@slack-corp.com>
We missed this race with #396 (and I think this is also the crash in
issue #226). We need to lock a little higher in the getOrHandshake
method, before we reset hostinfo.ConnectionState. Previously, two
routines could enter this section and confuse the handshake process.
This could result in the other side sending a recv_error that also has
a race with setting hostinfo.ConnectionState back to nil. So we make
sure to grab the lock in handleRecvError as well.
Neither of these code paths is in the hot path (handling packets
between two hosts over an active tunnel), so there should be no
performance concerns.
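In outline, the fix looks something like this; the Go shapes below are simplified stand-ins for the real methods:

    package main

    import "sync"

    type ConnectionState struct{}

    type HostInfo struct {
        sync.Mutex
        ConnectionState *ConnectionState
    }

    // getOrHandshake takes the lock before inspecting or resetting
    // ConnectionState, so two routines can't both restart the handshake.
    func getOrHandshake(hi *HostInfo) *ConnectionState {
        hi.Lock()
        defer hi.Unlock()
        if hi.ConnectionState == nil {
            hi.ConnectionState = &ConnectionState{} // begin a fresh handshake
        }
        return hi.ConnectionState
    }

    // handleRecvError grabs the same lock before touching handshake state,
    // closing the race described above.
    func handleRecvError(hi *HostInfo) {
        hi.Lock()
        defer hi.Unlock()
        // reset/inspect handshake state only while holding the lock
    }

    func main() {
        hi := &HostInfo{}
        _ = getOrHandshake(hi)
        handleRecvError(hi)
    }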