55 Commits

Author SHA1 Message Date
Wade Simmons
ae9de47dd9 Merge remote-tracking branch 'origin/master' into multiport 2025-07-11 12:57:52 -04:00
Nate Brown
52623820c2
Drop inactive tunnels (#1427) 2025-07-03 09:58:37 -05:00
Wade Simmons
b8ea55eb90
optimize usage of bart (#1395)
Some checks failed
gofmt / Run gofmt (push) Successful in 9s
smoke-extra / Run extra smoke tests (push) Failing after 19s
smoke / Run multi node smoke test (push) Failing after 1m19s
Build and test / Build all and test on ubuntu-linux (push) Failing after 18m41s
Build and test / Build and test on linux with boringcrypto (push) Failing after 2m47s
Build and test / Build and test on linux with pkcs11 (push) Failing after 2m47s
Build and test / Build and test on macos-latest (push) Has been cancelled
Build and test / Build and test on windows-latest (push) Has been cancelled
Use `bart.Lite` and `.Contains` as suggested by the bart maintainer:

- 9455952eed (commitcomment-155362580)
2025-04-18 12:37:20 -04:00
John Maguire
d4a7df3083
Rename pki.default_version to pki.initiating_version (#1381)
Some checks failed
gofmt / Run gofmt (push) Successful in 9s
smoke-extra / Run extra smoke tests (push) Failing after 20s
smoke / Run multi node smoke test (push) Failing after 1m26s
Build and test / Build all and test on ubuntu-linux (push) Failing after 21m13s
Build and test / Build and test on linux with boringcrypto (push) Failing after 3m19s
Build and test / Build and test on linux with pkcs11 (push) Failing after 2m47s
Build and test / Build and test on macos-latest (push) Has been cancelled
Build and test / Build and test on windows-latest (push) Has been cancelled
2025-04-07 18:08:29 -04:00
Wade Simmons
f36db374ac Merge remote-tracking branch 'origin/master' into multiport 2025-03-06 16:11:32 -05:00
Nate Brown
d97ed57a19
V2 certificate format (#1216)
Co-authored-by: Nate Brown <nbrown.us@gmail.com>
Co-authored-by: Jack Doan <jackdoan@rivian.com>
Co-authored-by: brad-defined <77982333+brad-defined@users.noreply.github.com>
Co-authored-by: Jack Doan <me@jackdoan.com>
2025-03-06 11:28:26 -06:00
Nate Brown
08ac65362e
Cert interface (#1212) 2024-10-10 18:00:22 -05:00
Wade Simmons
dabce8a1b4 1.9.4 Release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEnN7QnoQoG72upUfo5qM118W2lxoFAmbfOr4ACgkQ5qM118W2
 lxoTGQ//SKoaiZwbtWZtEjYWUJPxGL5gbidmqdmtT9b0ttBK+ufRRbRQXeuXv+pY
 KlKE3YxS8aWbW+YPvtQ7Ly6W4KoJ49esZYnFRMwnLnOpJY9KXtWe0ej+ohQIqm0g
 R/7MFx9YiKsO+oNI3Bk8Flfkdhh2RCSECO/i5V0oZIkZHy3ceeM/EAlMXy2slC7Z
 jcDLKkHsDSTkNhuCiNFwR8t04y2sZhYXPDC3xG/9FzO8dlstj6Kj7L0E7uceb3yP
 9LlmnQB8AAXQ/ZpJ82Roe72ORGuL5xwUPDpEPKnM2090h6skIA9cpIn4BpRpg/6S
 rrZb/fSIjLlE8YnkA39kKnMS1SW5O2EXSDtXCzEkZI40vGHIJiVY2j+mELqHiWLf
 8MLVC0qW2DvOMA28ZAipQ2gG9txxuArLBD/Zlhtlzn4KeP8m1Dnnv1kkL8z8+H+6
 18zM9lcE4xK8ET+9yao5yNpYinhwEHQnekeevMBJPrI/5SQxkb53u+FXeg1eGAbK
 IewcLlpxun/IwL8D0NwY2/1EVlemupEed9geHDBIjM9gPmBG/zYJdRvh2aLUXcti
 C5nxXAXUknXYAyUwT2kvplLyj1yZheA9nDonIVI9GY1nyZmzWsT0D7BSoOGxw+6H
 4nhcsQfHpEVQvCfY9G2wOvmqiZEkbFDho/3o7hebowkFljXXcKU=
 =IC32
 -----END PGP SIGNATURE-----

Merge tag 'v1.9.4' into multiport

1.9.4 Release
2024-09-13 10:17:59 -04:00
Nate Brown
e264a0ff88
Switch most everything to netip in prep for ipv6 in the overlay (#1173) 2024-07-31 10:18:56 -05:00
Wade Simmons
659d7fece6 1.8.2 Release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEnN7QnoQoG72upUfo5qM118W2lxoFAmWcXeYACgkQ5qM118W2
 lxo8yBAAxnMxvP2d2Mu2n6SExRxqmK5e+CddM0XWNZQzTXO1gyKw7YPLzzQwRPTa
 mhmuGEmqjmG0/VXwz9dl1jrpIJu0ge7APgIn9duFzz5HYnDbb+6+T0cQ/8LQbNe1
 i+xGdY3n1RYHKoeqOi14lmf9uB6zrklfhzFG/05AyYjNNipMtAsC82FrFmySTQ9w
 gp4XGwK5edzWSrBZ0w4nbo8G8r4mP/2qZdbxY+9g9IrrQoeoZtWVttdZ36rkEvIi
 uzyj//PClLTTrAiSHcWdrdPHlLj2L4t1S0ixjnAk2OO/OD/EQ5FwtYggF+x+YE6N
 fedIcUliJNidK7FZ+cWUdB6tUWgjM9TsbfuPoCI786e1OnBRML5ZPCiXZpzhxMWZ
 l+uKJkOUqoC7Nu83+WoedLrJo5zwOhq8oYx0/BVw8dNMdYFGSPrbE3ooFtgUc6Lu
 2TEtD5NzVz6nPAyPOYVNOw726J19fFBKbBZsV12KSTW1ElFafEDCHGelIf2wt8mI
 t23SlYfHMJOhKPMnJWczAFsuVDfMmt5xRvH1mFORiBIm/4EXYIS00IEGKQYuC7m+
 lUmdrk9R6pVdq5lekL1KkB/fjGI/mg5liYY0ubx/4oeHXRyMPXeVY0ZkTqc2PPHi
 7wl2iLytG/FTMdGPC4F4LmXT9xPRzTGNpANItael2PTSBPThQb8=
 =XsOf
 -----END PGP SIGNATURE-----

Merge tag 'v1.8.2' into multiport

1.8.2 Release
2024-01-26 10:45:15 -05:00
Ben Ritcey
01cddb8013
Added firewall.rules.hash metric (#1010)
* Added firewall.rules.hash metric

Added a FNV-1 hash of the firewall rules as a Prometheus value.

* Switch FNV has to int64, include both hashes in log messages

* Use a uint32 for the FNV hash

Let go-metrics cast the uint32 to a int64, so it won't be lossy
when it eventually emits a float64 Prometheus metric.
2023-11-28 11:56:47 -05:00
Nate Brown
3356e03d85
Default pki.disconnect_invalid to true and make it reloadable (#859) 2023-11-13 12:39:38 -06:00
Wade Simmons
f2aef0d6eb Merge remote-tracking branch 'origin/master' into multiport 2023-10-27 08:48:13 -04:00
Nate Brown
5a131b2975
Combine ca, cert, and key handling (#952) 2023-08-14 21:32:40 -05:00
Nate Brown
223cc6e660
Limit how often a busy tunnel can requery the lighthouse (#940)
Co-authored-by: Wade Simmons <wadey@slack-corp.com>
2023-08-08 13:26:41 -05:00
Nate Brown
a3e59a38ef
Use registered io on Windows when possible (#905) 2023-07-10 12:43:48 -05:00
Nate Brown
3bbf5f4e67
Use an interface for udp conns (#901) 2023-06-14 10:48:52 -05:00
Wade Simmons
0e593ad582 Merge branch 'master' into multiport 2023-05-09 15:37:30 -04:00
Nate Brown
03e4a7f988
Rehandshaking (#838)
Co-authored-by: Brad Higgins <brad@defined.net>
Co-authored-by: Wade Simmons <wadey@slack-corp.com>
2023-05-04 15:16:37 -05:00
Wade Simmons
0b67b19771
add boringcrypto Makefile targets (#856)
This adds a few build targets to compile with `GOEXPERIMENT=boringcrypto`:

- `bin-boringcrypto`
- `release-boringcrypto`

It also adds a field to the intial start up log indicating if
boringcrypto is enabled in the binary.
2023-05-04 15:42:45 -04:00
Wade Simmons
28ecfcbc03 Merge remote-tracking branch 'origin/master' into multiport 2023-05-03 10:50:06 -04:00
brad-defined
9b03053191
update EncReader and EncWriter interface function args to have concrete types (#844)
* Update LightHouseHandlerFunc to remove EncWriter param.
* Move EncWriter to interface
* EncReader, too
2023-04-07 14:28:37 -04:00
Wade Simmons
6685856b5d
emit certificate.expiration_ttl_seconds metric (#782) 2023-04-03 20:18:16 -05:00
Wade Simmons
e71059a410 Merge remote-tracking branch 'origin/master' into multiport 2023-04-03 11:30:41 -04:00
Nate Brown
ee8e1348e9
Use connection manager to drive NAT maintenance (#835)
Co-authored-by: brad-defined <77982333+brad-defined@users.noreply.github.com>
2023-03-31 15:45:05 -05:00
Nate Brown
6b3d42efa5
Use atomic.Pointer for certState (#833) 2023-03-30 13:04:09 -05:00
Wade Simmons
aec7f5f865 Merge remote-tracking branch 'origin/master' into multiport 2023-03-13 15:07:32 -04:00
Wade Simmons
9af242dc47
switch to new sync/atomic helpers in go1.19 (#728)
These new helpers make the code a lot cleaner. I confirmed that the
simple helpers like `atomic.Int64` don't add any extra overhead as they
get inlined by the compiler. `atomic.Pointer` adds an extra method call
as it no longer gets inlined, but we aren't using these on the hot path
so it is probably okay.
2022-10-31 13:37:41 -04:00
Wade Simmons
326fc8758d Support multiple UDP source ports (multiport)
The goal of this work is to send packets between two hosts using more than one
5-tuple. When running on networks like AWS where the underlying network driver
and overlay fabric makes routing, load balancing, and failover decisions based
on the flow hash, this enables more than one flow between pairs of hosts.

Multiport spreads outgoing UDP packets across multiple UDP send ports,
which allows nebula to work around any issues on the underlay network.
Some example issues this could work around:

- UDP rate limits on a per flow basis.
- Partial underlay network failure in which some flows work and some don't

Agreement is done during the handshake to decide if multiport mode will
be used for a given tunnel (one side must have tx_enabled set, the other
side must have rx_enabled set)

NOTE: you cannot use multiport on a host if you are relying on UDP hole
punching to get through a NAT or firewall.

NOTE: Linux only (uses raw sockets to send). Also currently only works
with IPv4 underlay network remotes.

This is implemented by opening a raw socket and sending packets with
a source port that is based on a hash of the overlay source/destiation
port. For ICMP and Nebula metadata packets, we use a random source port.

Example configuration:

    multiport:
      # This host support sending via multiple UDP ports.
      tx_enabled: false

      # This host supports receiving packets sent from multiple UDP ports.
      rx_enabled: false

      # How many UDP ports to use when sending. The lowest source port will be
      # listen.port and go up to (but not including) listen.port + tx_ports.
      tx_ports: 100

      # NOTE: All of your hosts must be running a version of Nebula that supports
      # multiport if you want to enable this feature. Older versions of Nebula
      # will be confused by these multiport handshakes.
      #
      # If handshakes are not getting a response, attempt to transmit handshakes
      # using random UDP source ports (to get around partial underlay network
      # failures).
      tx_handshake: false

      # How many unresponded handshakes we should send before we attempt to
      # send multiport handshakes.
      tx_handshake_delay: 2
2022-10-17 12:58:06 -04:00
Wade Simmons
7b9287709c
add listen.send_recv_error config option (#670)
By default, Nebula replies to packets it has no tunnel for with a `recv_error` packet. This packet helps speed up re-connection
in the case that Nebula on either side did not shut down cleanly. This response can be abused as a way to discover if Nebula is running
on a host though. This option lets you configure if you want to send `recv_error` packets always, never, or only to private network remotes.
valid values: always, never, private

This setting is reloadable with SIGHUP.
2022-06-27 12:37:54 -04:00
brad-defined
1a7c575011
Relay (#678)
Co-authored-by: Wade Simmons <wsimmons@slack-corp.com>
2022-06-21 13:35:23 -05:00
Nate Brown
78d0d46bae
Remove WriteRaw, cidrTree -> routeTree to better describe its purpose, remove redundancy from field names (#582) 2021-11-12 12:47:09 -06:00
Nate Brown
88ce0edf76
Start the overlay package with the old Inside interface (#576) 2021-11-10 21:52:26 -06:00
CzBiX
16be0ce566
Add Wintun support (#289) 2021-11-08 12:36:31 -06:00
Nate Brown
bcabcfdaca
Rework some things into packages (#489) 2021-11-03 20:54:04 -05:00
brad-defined
6ae8ba26f7
Add a context object in nebula.Main to clean up on error (#550) 2021-11-02 13:14:26 -05:00
Donatas Abraitis
32e2619323
Teardown tunnel automatically if peer's certificate expired (#370) 2021-10-20 13:23:33 -05:00
Wade Simmons
44cb697552
Add more metrics (#450)
* Add more metrics

This change adds the following counter metrics:

Metrics to track packets dropped at the firewall:

    firewall.dropped.local_ip
    firewall.dropped.remote_ip
    firewall.dropped.no_rule

Metrics to track handshakes attempts that have been initiated and ones
that have timed out (ones that have completed are tracked by the
existing "handshakes" histogram).

    handshake_manager.initiated
    handshake_manager.timed_out

Metrics to track when cached_packets are dropped because we run out of
buffer space, and how many are sent once the handshake completes.

    hostinfo.cached_packets.dropped
    hostinfo.cached_packets.sent

This change also notes how many cached packets we have when we log the
final "Handshake received" message for either stage1 for stage2.

* separate incoming/outgoing metrics

* remove "allowed" firewall metrics

We don't need this on the hotpath, they aren't worh it.

* don't need pointers here
2021-04-27 22:23:18 -04:00
brad-defined
17106f83a0
Ensure the Nebula device exists before attempting to bind to the Nebula IP (#375) 2021-04-16 10:34:28 -05:00
Nathan Brown
64d8e5aa96
More LH cleanup (#429) 2021-04-01 10:23:31 -05:00
Nathan Brown
883e09a392
Don't use a global ca pool (#426) 2021-03-29 12:10:19 -05:00
Nathan Brown
3ea7e1b75f
Don't use a global logger (#423) 2021-03-26 09:46:30 -05:00
Wade Simmons
d604270966
Fix most known data races (#396)
This change fixes all of the known data races that `make smoke-docker-race` finds, except for one.

Most of these races are around the handshake phase for a hostinfo, so we add a RWLock to the hostinfo and Lock during each of the handshake stages.

Some of the other races are around consistently using `atomic` around the `messageCounter` field. To make this harder to mess up, I have renamed the field to `atomicMessageCounter` (I also removed the unnecessary extra pointer deference as we can just point directly to the struct field).

The last remaining data race is around reading `ConnectionInfo.ready`, which is a boolean that is only written to once when the handshake has finished. Due to it being in the hot path for packets and the rare case that this could actually be an issue, holding off on fixing that one for now.

here is the results of `make smoke-docker-race`:

before:

    lighthouse1: Found 2 data race(s)
    host2:       Found 36 data race(s)
    host3:       Found 17 data race(s)
    host4:       Found 31 data race(s)

after:

    host2: Found 1 data race(s)
    host4: Found 1 data race(s)

Fixes: #147
Fixes: #226
Fixes: #283
Fixes: #316
2021-03-05 21:18:33 -05:00
Nathan Brown
b6234abfb3
Add a way to trigger punch backs via lighthouse (#394) 2021-03-01 19:06:01 -06:00
Wade Simmons
2a4beb41b9
Routine-local conntrack cache (#391)
Previously, every packet we see gets a lock on the conntrack table and updates it. When running with multiple routines, this can cause heavy lock contention and limit our ability for the threads to run independently. This change caches reads from the conntrack table for a very short period of time to reduce this lock contention. This cache will currently default to disabled unless you are running with multiple routines, in which case the default cache delay will be 1 second. This means that entries in the conntrack table may be up to 1 second out of date and remain in a routine local cache for up to 1 second longer than the global table.

Instead of calling time.Now() for every packet, this cache system relies on a tick thread that updates the current cache "version" each tick. Every packet we check if the cache version is out of date, and reset the cache if so.
2021-03-01 19:52:17 -05:00
Wade Simmons
d232ccbfab
add metrics for the udp sockets using SO_MEMINFO (#390)
Retrieve the current socket stats using SO_MEMINFO and report them as
metrics gauges. If SO_MEMINFO isn't supported, we don't report these metrics.
2021-03-01 19:51:33 -05:00
Nathan Brown
ecfb40f29c
Fix osx for mq changes, this does not implement mq on osx (#395) 2021-03-01 16:57:05 -05:00
Wade Simmons
27d9a67dda
Proper multiqueue support for tun devices (#382)
This change is for Linux only.

Previously, when running with multiple tun.routines, we would only have one file descriptor. This change instead sets IFF_MULTI_QUEUE and opens a file descriptor for each routine. This allows us to process with multiple threads while preventing out of order packet reception issues.

To attempt to distribute the flows across the queues, we try to write to the tun/UDP queue that corresponds with the one we read from. So if we read a packet from tun queue "2", we will write the outgoing encrypted packet to UDP queue "2". Because of the nature of how multi queue works with flows, a given host tunnel will be sticky to a given routine (so if you try to performance benchmark by only using one tunnel between two hosts, you are only going to be using a max of one thread for each direction).

Because this system works much better when we can correlate flows between the tun and udp routines, we are deprecating the undocumented "tun.routines" and "listen.routines" parameters and introducing a new "routines" parameter that sets the value for both. If you use the old undocumented parameters, the max of the values will be used and a warning logged.

Co-authored-by: Nate Brown <nbrown.us@gmail.com>
2021-02-25 15:01:14 -05:00
Nathan Brown
68e3e84fdc
More like a library (#279) 2020-09-18 09:20:09 -05:00
Wade Simmons
f3a6d8d990
Preserve conntrack table during firewall rules reload (SIGHUP) (#233)
Currently, we drop the conntrack table when firewall rules change during a SIGHUP reload. This means responses to inflight HTTP requests can be dropped, among other issues. This change copies the conntrack table over to the new firewall (it holds the conntrack mutex lock during this process, to be safe).

This change also records which firewall rules hash each conntrack entry used, so that we can re-verify the rules after the new firewall has been loaded.
2020-07-31 18:53:36 -04:00