Commit Graph

127 Commits

Author SHA1 Message Date
Marcel Märtens
8ee83df600 fixed #1067 2021-04-13 20:39:56 +02:00
Marcel Märtens
9638581d09 ITFrame::read_frame now throws an error when the frame_no is invalid.
This will be catched by the respective protocols, e.g. tcp and cause a violation
2021-04-13 00:13:08 +02:00
Marcel Märtens
4fed9dab83 Have a clear error for when the I/O closes and when some protocol is violated. this should help find the rootcause of a bug.
If its Closed it looks like the TCP connection got dropped/cut off (e.g. OS, Wifi).
If its Violated we for sure know the cause is the messages send/recv in a wrong way
2021-04-13 00:12:20 +02:00
Marcel Märtens
ea5a02d7cd change some Ordering::Relaxed to Ordering::SeqCst when we do not want to have it moved/or taken effects from other threads.
some id increases are kept Relaxed, SeqCst shouldn't be necessary there.
Not sure about the bool checks in api.rs
2021-04-07 23:17:09 +02:00
Marcel Märtens
7ca2f3b9d6 make a panic a error! and improve logging 2021-04-03 19:58:36 +02:00
Marcel Märtens
c77446a014 fix some tracy only and no default features 2021-03-27 18:24:10 +01:00
Marcel Märtens
6a95fb6b74 extend network of incomming TCP metrics and failed handshake metric 2021-03-27 18:24:10 +01:00
Marcel Märtens
aea52d8b54 implement Upload Bandwidth prediction.
Its available to `api` and `metrics` and can be used to slow down msg send in veloren.
It uses a tokio::watch for now, as i plan to have a watch job in the scheduler that recalculates prio on change.
Also cleaning up participant metrics after a disconnect
2021-03-26 08:58:03 +01:00
Marcel Märtens
034d0f0c5d fix a bug that some closes could get lost 2021-03-26 08:57:56 +01:00
Marcel Märtens
8dccc21125 preparation for multiple-channel participants.
When a stream is opened we are searching for the best (currently) available channel.
The stream will then be keept on that channel.
Adjusted the rest of the algorithms that they now respect this rule.
improved a HashMap for Pids to be based on a Vec. Also using this for Sid -> Cid relation which is more performance critical
WARN: our current send()? error handling allows it for some close_stream messages to get lost.
2021-03-26 08:57:50 +01:00
Marcel Märtens
01c82b70ab network scheduler and rawmsg cleanup 2021-03-26 08:57:42 +01:00
Marcel Märtens
6b23101fac update toolchain to nightly-2021-03-22 2021-03-22 16:41:04 +01:00
Marcel Märtens
9084ac48f1 defer some trace, so that we wont spam the log. 2021-03-22 09:16:07 +01:00
Marcel Märtens
18aefa8dbf remove prometheus dos via protocol.
Clean up unused labely to keep prometheus values down
2021-03-20 23:50:38 +01:00
Marcel Märtens
d7df741671 update dependencies, including removal of some tracy deps as they are get through common/tracy 2021-03-09 20:17:29 +01:00
Marcel Märtens
9028578bc8 Change the way Network is dropped.
Instead of keeping Runtime and manually spawn a task on `drop` this task is spawned at start and will wait to be triggered.
The `drop` methods then wait for completion, UNLESS they are in a async context, then they MUST NOT BLOCK (deadlock potential), so they defer it to the Runtime and HOPE for the runtime to exist long enough.
This get rid of the weird `block_in_place` which is only accessable with `rt-multi-threaded` and has some disadvantages.
We also wont requiere the runtime to be active all the time. Though its needed for a clean shutdown
2021-03-03 11:28:40 +01:00
Marcel Märtens
44817870ee Shutdown improvements
- Timeout for Participant::drop, it will stop eventually
 - Detect tokio runtime in Participant::drop and no longer use std::sleep in that case (it could hang the thread that is actually doing the shutdown work and deadlock
 - Parallel Shutdown in Scheduler: Instead of a slow shutdown locking up everything we can now shutdown participants in parallel, this should reduces `WARN` part took long for shutdown dramatically
2021-02-26 10:50:30 +01:00
Marcel Märtens
3f5c64bec0 Client::new can now resolve DNS requests, better networking error messages 2021-02-22 17:35:19 +01:00
Marcel Märtens
1a7c179bbb share tokio Runtime between Client and Server, name rayon Threadpool 2021-02-22 17:35:06 +01:00
Marcel Märtens
514d5db038 Update Network Protocol
- now last digit version is compatible 0.6.0 will connect to 0.6.1
 - the TCP DATA Frames no longer contain START field, as it's not needed
 - the TCP OPENSTREAM Frames will now contain the BANDWIDTH field
 - MID is not Protocol internal

Update network
 - update API with Bandwidth

Update veloren
 - introduce better runtime and `async` things that are IO bound.
 - Remove `uvth` and instead use `tokio::runtime::Runtime::spawn_blocking`
 - remove futures_execute from client and server use tokio::runtime::Runtime instead
 - give threads a Name
2021-02-22 17:34:55 +01:00
Marcel Märtens
5a48bffcb0 fix main thread blocking which was a bad combination of
- a channel was stale and wasn't shut down propertly AS WELL AS
 - the msg ingoing pipe was bounded, so it could fill up

To mitigate this we
 a) unbounded the pipe
 b) stoped spam the log in no channel case
 c) instead of ever reaching "no channel" state we actually shutdown participant
 d) when send_mgr is closed it will no longer be able to SEND on streams
2021-02-18 20:00:07 +01:00
Marcel Märtens
03af9937cf Stabelize Network again:
- completly switch to Bytes, even in api. speed up TCP by fak 2
 - improve benchmarks
 - speed up mpsc metrics
 - gracefully handle shutdown by interpreting Ok(0) as tokio::tcpstream closed now.
 - fix hotloop in participants by adding `Some(n)` to fix endless handing.
 - fix closing bug by closing streams after `recv_mgr` is shutdown even if now shutdown is triggered locally.
 - fix prometheus
 - no longer throw when a `Stream` is dropped while participant still receives a msg for it.
 - fix the bandwith handling, TCP network send speed is up to 1.5GiB/s while recv is 150MiB/s
 - add documentation
 - tmp require rt-multi-threaded in client for tokio, to not fail cargo check

this is prob stable, i tested over 1 hour.
after that some optimisations in priomgr.
and impl. propper bandwith.
Speed is up to 2GB/s write and 150MB/s recv on a single core

sync add documentation
2021-02-17 19:37:48 +01:00
Marcel Märtens
ea8ab1ce7a Great improvements to the codebase:
- better logging in network
 - we now notify the send of what happened in recv in participant.
 - works with veloren master servers
 - works in singleplayer, using a actual mid.
 - add `mpsc` in whole stack incl tests
 - speed up internal read/write with `Bytes` crate
 - use `prometheus-hyper` for metrics
 - use a metrics cache
2021-02-17 16:15:00 +01:00
Marcel Märtens
9884019963 COMPLETE REDESIGN of network crate
- Implementing a async non-io protocol crate
    a) no tokio / no channels
    b) I/O is based on abstraction Sink/Drain
    c) different Protocols can have a different Drain Type
       This allow MPSC to send its content without splitting up messages at all!
       It allows UDP to have internal extra frames to care for security
       It allows better abstraction for tests
       Allows benchmarks on the mpsc variant
       Custom Handshakes to allow sth like Quic protocol easily
 - reduce the participant managers to 4: channel creations, send, recv and shutdown.
   keeping the `mut data` in one manager removes the need for all RwLocks.
   reducing complexity and parallel access problems
 - more strategic participant shutdown. first send. then wait for remote side to notice recv stop, then remote side will stop send, then local side can stop recv.
 - metrics are internally abstracted to fit protocol and network layer
 - in this commit network/protocol tests work and network tests work someway, veloren compiles but does not work
 - handshake compatible to async_std
2021-02-17 12:39:47 +01:00
Marcel Märtens
3f85506761 fix most unittests (not all) by a) dropping network/participant BEFORE runtime and by transfering a expect into a warn! in the protocol 2021-02-17 12:38:58 +01:00
Marcel Märtens
5aa1940ef8 get rid of async_std::channel
switch to `tokio` and `async_channel` crate.
I wanted to do tokio first, but it doesnt feature Sender::close(), thus i included async_channel
Got rid of `futures` and only need `futures_core` and `futures_util`.

Tokio does not support `Stream` and `StreamExt` so for now i need to use `tokio-stream`, i think this will go in `std` in the future

Created `b2b_close_stream_opened_sender_r` as the shutdown procedure does not need a copy of a Sender, it just need to stop it.

Various adjustments, e.g. for `select!` which now requieres a `&mut` for oneshots.

Future things to do:
 - Use some better signalling than oneshot<()> in some cases.
 - Use a Watch for the Prio propergation (impl. it ofc)
 - Use Bounded Channels in order to improve performance
 - adjust tests coding

bring tests to work
2021-02-17 12:38:53 +01:00
Marcel Märtens
1b77b6dc41 Initial switch to tokio for network, minimum working example. 2021-02-17 12:37:59 +01:00
Caelan
dda4931f46 Clean and update dependencies
* Remove tweak feature
 * Remove const-tweaker
 * Update tiny_http
 * Update bitvec to 0.21.0
 * Downgrade euc to avoid conflict with vek 0.12.0
 * Require exactly vek 0.12.0
 * Update all other dependencies automatically based on these changes
 * Update gilrs to latest at the request of Ada Lovegirls
 * Update meshing benchmarks to new criterion API
2021-02-17 01:27:06 -08:00
jiminycrick
8b97199245 Update rand dependency 2021-01-26 20:35:08 -08:00
Acrimon
9f16a946ee
update a few deps 2021-01-20 15:53:58 +01:00
Marcel Märtens
6390e758d4 fix clippy in all examples.
added a ignore for plugins, as we cannot remove the `Result<>` type, it is necessary
2021-01-13 15:06:04 +01:00
Marcel Märtens
26918d10c9 update chrossbeam, tracy, prometheus (and reduce server deps to crossbeam-channel) 2020-12-16 00:51:07 +01:00
Joshua Barretto
fc96fd780b Moved plugin API to Uid instead of usize for entity IDs, hello plugin to an example 2020-12-13 23:08:15 +00:00
Joshua Barretto
f8c8e342e6 Moved common networking code to common/net, clippy fixes 2020-12-13 17:23:45 +00:00
Marcel Märtens
7a7c1f6f50 I would except this to be implcitly done by the drop though it doesn't hurt here, as this channel is dropped anyway a line later.
But i have the feeling that maybe something with the channel is wrong which leads to this behavior (or maybe did i made a copy somewhere, though i dobt this).
Again, not sure if this is a fix, but i think it doesn't hurt
2020-11-27 10:47:01 +01:00
Marcel Märtens
1b18b43fa7 Drop is run only AFTER each statement.
And the guard generated here in the if let Some() will live through the WHOLE statement, which is definitly critical.
The fix is quite easy. We just move it out in a seperate line.
There are still some participants where the dropping will fail, and followup disconnects will NOT get triggered and be queued up till ethernity, but at least we are not blocking new logins
2020-11-25 22:10:10 +01:00
Ben Wallis
639281bc32 Replaced usage of 0.0.0.0 with 127.0.0.1 in network tests to prevent firewall prompt on Windows when running tests 2020-11-15 22:10:31 +00:00
Marcel Märtens
40f5afc2b0 ci cleanup, dependency update 2020-11-06 14:34:42 +01:00
Marcel Märtens
153c6c3b13 Fixing Tarpaulin isn't easy.
So first off all we had to update the toolchain, i think everything in september is okay, but we got with this current version.
Then we had to update several dependencies, which broke:
 - need a specific fix of winit, i think we want to get rid of this with iced, hopefully, because its buggy as hell. update wayland client to 0.27
 - use a updated version of glutin which has wayland-client 0.27 and no longer the broke version 0.23
 - update conrod to use modern copypasta 0.7
 - use `packed_simd_2` instead of `packed_simd` as the owner of the create abandoned the project.
 - adjust all the coding to work with the newer glutin and winit version. that also includes fixing a macro in one of the dependencies that did some crazy conversion from 1 event type to another event type.
   It was called `convert_event`
 - make a `simd` feature which is default and introduce conditional compiling.
   AS I HAVE NO IDEA OF SIMD AND THE CODE. AND I DIDN'T INTRODUCE THE ERROR IN THE FIRST PLACE, WE PANIC FOR NON SIMD CASE FOR NOW. BUT IT WORKS FOR TESTS.
 - tarpaulin doesnt support no-default-features. so we have to `sed` them away. semms fair.
2020-10-26 17:04:20 +01:00
Marcel Märtens
b5f48014a9 Streams no longer panic when recv on a StreamClosed Stream. Panicing is a "feature" of futures::channel
Refactor the `send_raw` and `recv_raw` completly. We now expost `Message` which has a public `serialize` and `deseialize` fn for the first time.
This makes using the `raw` methods of a stream much easier and is a requierement for using "copy_less" sending to multiple streams
2020-10-19 10:23:30 +02:00
Marcel Märtens
572b83e262 add a try_recv fn to Stream which is NOT async 2020-10-19 10:23:27 +02:00
Marcel Märtens
ba1299e670 apparently span doesnt work for async, so i replaced it by an instrument version 2020-10-14 17:54:01 +02:00
Marcel Märtens
e914c29728 FIX for hanging participant deletion.
There is a rare bug that recently got triggered more often with the release of xMAC94x/netfixA
if the bug triggeres, a Participant never gets cleaned up gracefully.

Reason:
When `participant_shutdown_mgr` was called it stopped all managers at once.
Especially stream_close_mgr and send_mgr.
The problem with stream_close_mgr is, it's responsible for gracefully flushing streams when the Participant is dropped locally.
So when it was interupted self.streams where no longer flushed gracefully.
The next problem was with send_mgr.
It is triggering the PrioManager, and the PrioManager is responsible for notifying once a stream is completly flushed.
This lead to the problem, that a stream flush could be requested, but was actually never executed (as send_mgr was already down).

Solution:
1. when stream_close_mgr is stopped it MUST flush all remaining streams
2. wait for stream_close_mgr to finish before shutting down the send_mgr
3. no longer delete streams when closing the API (this also wasn't tracked in metrics so far)

Additionally i added a dependency, so that the network/examples compile again, fixed some spelling.
I created a `delete_stream` fn that basically just moved the code over.
2020-10-14 15:03:49 +02:00
Marcel Märtens
24af657fd5 quickfix for closing participants more reliable 2020-10-13 20:06:20 +02:00
Marcel Märtens
9af2fcfe46 add more tracing and drop lock earlier 2020-10-13 19:01:53 +02:00
Ben Wallis
b3dd8e8a02 Added #![deny(clippy::clone_on_ref_ptr)] to all crates and fixed resulting lint errors 2020-09-27 17:25:33 +01:00
Marcel Märtens
a7b7ae3a2c fix compiling with metrics 2020-08-27 09:35:06 +02:00
Imbris
dcce5641f7 Fix broken features and avoid panic if the client leaves before character data loads 2020-08-26 20:47:39 -04:00
Marcel Märtens
9170622611 reduce load on metrics by ALOT!
- first remove participant AND channel in same metric. this caused a matrix full of 0 values which bloated alot.
 - then did the cid cache to be lazy loading to no longer generate that much 0 values
 - possible would also be no longer keeping metrics for INIT, HANDSHAKE and PARTICIPANTID
2020-08-27 01:55:13 +02:00
Marcel Märtens
8c883c200d Switch veloren_network over to use the official example layout.
adjusted those examples to run again
created a CI TEST to always `check` examples
fixed fmt in examples so that pipeline gets green
2020-08-26 10:07:22 +02:00