Does anyone know of the capacity requirements of t...
# kolide
e
Does anyone know of the capacity requirements of the Mysql DB? Particularly when we get to the 10s of thousands of osquery agents connected to the fleet cluster. How many reads/writes per second will the Mysql db need to perform?
s
It’s pretty low for my environments, but they’re for less than one hundred hosts. Typically 0rps and 2-3wps per host.
e
Awesome, thank you!! 0rps seems like, the DB is only used to register new hosts? Some amount of read activity needs to happen though?
n
Even for large number of hosts, it's hard to say. It will heavily depend on where you send results, if you're doing distributed queries, checkin frequency, etc. There's no one-size-fits-all answer
Our deployment is on the order of tens of thousands of hosts, and is probably doing a lot more than 2-3rps 🤷
I've also found distributed queries pretty much unusable for large numbers of hosts, but I view it as an architectural issue. Have raised tickets in the past about it
e
Thank @nyanshak. I thought only redis was used for distributed queries.
n
I don't know exactly how the distributed querying works, but I've seen that it basically sets fleet on fire, including the database. Redis is pretty chill about the whole affair though 🤷
e
interesting. Scheduled queries don't set fleet on fire? Why would distributed (I'm assuming distributed query = ad-hoc query) be any different?
n
So this also (I think) depends on your log output settings. In our env, we log to Kinesis, so scheduled query results don't hit fleet at all.
I think that redis is used but I'm pretty sure there is also some writing to the DB when you do distributed queries
e
So you osquery agents log directly to kinesis, which reduces the load on the fleet servers?
The biggest issue seems to be that it's impossible to horizontally scale the Mysql dependency. Only one endpoint is allowed, which means we can't separate reads/writes. If I could separate those, I could have different fleet servers point to different Mysql read replicas
z
MySQL is used to determine which queries need to run on a host. Redis is used to stream the results from horizontally scaled Fleet servers to the server communicating with the API/UI client. I'm in the middle of a big refactor that pushes a lot more work onto Redis and will be getting that out ASAP.
👍 2
e
@zwass awesome, thank you! Any insight into how to separate Mysql nodes into read/write endpoints? Or is this something that should not be necessary?
z
I would expect this to no longer be necessary, but let's talk about it if it still is after the refactor.
👍 1