osquery #eclecticiq-polylogyx-extension

Michael

03/23/2021, 3:13 PM

Hi, OpenPlgx! 1. We already filter everything we can. There is one more problem: new agents are not registering, and old ones are not receiving the new config from the server. 2. We use a shallow config.

moulik

03/23/2021, 4:32 PM

Can you please share the server specs and the server resource utilization (avg CPU and RAM usage)? Also, how many endpoints have you enrolled?

moulik

03/23/2021, 5:03 PM

Can you also share the last seen/activity details for the hosts that are not refreshing the config? To get the details, click on the host on the hosts page and hover over Last Seen.

Michael

03/23/2021, 5:06 PM

The agent runs on an Intel Xeon with 2 virtual cores and 4 GB of RAM. CPU is about 11% on average, RAM about 60%. 9 agents were deployed, but only 3 are left after the control server reinstall. Unfortunately I cannot provide Last seen, because 6 of the 9 agents have not appeared.

moulik

03/24/2021, 2:38 AM

I mean the ESP application server specs and resource usage. Last seen details for any one of the systems would be fine.

Michael

03/24/2021, 7:12 AM

Oh, okay. The server also has 2 virtual cores of Intel Xeon Platinum 8000 and 16 GB RAM. One of the agents is in the screenshot.

Kishore Arava

03/24/2021, 12:49 PM

Can you please confirm a few things so we can identify why the newly enrolled agents are not showing in the ESP UI? 1. Is the server IP set with the --tls_hostname flag in the agent machine's osquery flags file (C:\Program Files\plgx_osquery\osquery.flags) the same as your control server (ESP)? 2. Does the server's certificate (plgx-esp/nginx/certificate.crt) match the certificate you are using to enroll the agent?

👍 1
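
The first check above can be scripted. The following is a minimal Python sketch, not part of the PolyLogyx tooling: it assumes the flags file holds one `--name=value` entry per line, and the helper names `read_flag` and `tls_hostname_problem` are hypothetical. It only catches the obvious case of `--tls_hostname` pointing at the local machine.

```python
from typing import Optional

# Host names that can only refer to the agent machine itself.
LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}


def read_flag(flagfile_text: str, name: str) -> Optional[str]:
    """Return the value of --name=value from flagfile text, or None if absent."""
    prefix = f"--{name}="
    for raw_line in flagfile_text.splitlines():
        line = raw_line.strip()
        if line.startswith(prefix):
            return line[len(prefix):].strip()
    return None


def tls_hostname_problem(flagfile_text: str) -> Optional[str]:
    """Describe an obvious problem with --tls_hostname, or return None."""
    value = read_flag(flagfile_text, "tls_hostname")
    if value is None:
        return "--tls_hostname is not set"
    # Strip a trailing :port only when there is exactly one colon (host:port).
    host = value.rsplit(":", 1)[0] if value.count(":") == 1 else value
    if host.lower() in LOOPBACK_HOSTS:
        return f"--tls_hostname points at the local machine ({value})"
    return None


if __name__ == "__main__":
    # Path taken from the thread; adjust for your installation.
    path = r"C:\Program Files\plgx_osquery\osquery.flags"
    with open(path, encoding="utf-8") as handle:
        print(tls_hostname_problem(handle.read()) or "no obvious problem found")
```

A `None` result only means the value is not a loopback name; whether it matches the actual ESP server address still has to be confirmed by hand.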

Michael

03/24/2021, 1:12 PM

Hi, @Kishore Arava! 1. It's very embarrassing, but no, it was set to localhost. Now it works, thanks. One more question is left: memory leaks or something like that are causing RDP failures.

Kishore Arava

03/24/2021, 2:05 PM

Yes. Might be. Based on the last screenshot you shared, I hope the agents are reading the updated config from the server now.

OpenPlgx

03/24/2021, 2:08 PM

Is the memory usage coming down over time? I mean, is it a spike? I don't think there is a leak here; it could be due to a high event load on the system.
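
The spike-versus-leak question can be made concrete once a few memory readings have been written down (for example, from Process Explorer). This is a minimal Python sketch under stated assumptions: the function name `looks_like_spike` and the 50% recovery threshold are arbitrary illustrative choices, not anything defined by osquery or PolyLogyx.

```python
from typing import Sequence


def looks_like_spike(samples_mb: Sequence[float], recovery_ratio: float = 0.5) -> bool:
    """True if usage peaked and later fell back to recovery_ratio * peak or lower.

    samples_mb: memory readings in MB, oldest first.
    recovery_ratio: assumed threshold; 0.5 means "dropped to half the peak".
    """
    if len(samples_mb) < 3:
        raise ValueError("need at least three samples")
    peak_index = max(range(len(samples_mb)), key=lambda i: samples_mb[i])
    after_peak = samples_mb[peak_index + 1:]
    if not after_peak:
        # The latest reading is the highest one: no recovery observed yet.
        return False
    return min(after_peak) <= recovery_ratio * samples_mb[peak_index]
```

Readings that climb and never come back down return False, which is consistent with either a leak or a sustained load; the heuristic cannot tell those two apart.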

Michael

03/24/2021, 2:40 PM

Memory usage looks like this over time. And sometimes the server becomes unresponsive. Unfortunately I couldn't look at the server in its failed state because RDP had failed. We now exclude everything we can while still staying secure.

Michael

03/24/2021, 2:41 PM

Current state of one of the agents, if you're interested.

Michael

03/24/2021, 2:41 PM

I'm using Process Explorer to monitor the situation.

OpenPlgx

03/25/2021, 4:03 AM

I can agree that 120 MB is a little high (and that also seemed like a spike rather than being consistent), but not terrible for a server workload. How much physical RAM do you have on the server machines running the agent? I am surprised that 120 MB of RAM is causing RDP failures.

Michael

03/25/2021, 6:57 AM

Unfortunately I can't say anything about physical RAM, because all our servers are in a virtual environment. 120 MB is not the maximum. Two days ago I saw 486 MB.

Michael

03/25/2021, 7:15 AM

One of the test servers right now

OpenPlgx

03/25/2021, 7:43 AM

I understand. I was curious to know whether these were transient conditions or a steady state. For example, as I read it, a few of your servers (endpoints) were not connected to the ESP server due to an error in the flags file. That would not stop the agent from collecting and caching data in the meantime, and the moment connectivity resumed it would try to pump all the data back to the ESP server, causing a temporary spike. Whether that (or some similar transient condition) is what we are seeing here is what I am trying to understand. The other reason could be the volume of activity, which can be controlled by tuning the event filters for your environment, but for us to guide you on that you would need to share some data.

Michael

03/25/2021, 7:59 AM

What kind of data do you need?

OpenPlgx

03/25/2021, 10:53 AM

Recent Activity data on the system exhibiting the persistent high RAM usage + the plgx_event_filters config

Michael

03/25/2021, 12:05 PM

There is almost no activity on the most RAM-consuming machine, because there is no user activity apart from the constantly running Process Explorer, which is excluded in the agent configuration. Can the number of exclusions affect the load?

Michael

03/25/2021, 12:07 PM

I can't provide the Plgx event filter config for security reasons, sorry. We exclude our antivirus, SIEM, Plgx itself, and some Microsoft utilities and frameworks. The excluded ports are standard.

Michael

03/25/2021, 12:07 PM

I constantly monitor events and add exclusions for non-critical mass events.

Michael

03/25/2021, 1:05 PM

Right now plgx-osqueryd.exe has failed with the following error and can't start. RAM is at 98%; apart from this, the server is fine.

Michael

03/25/2021, 1:06 PM

Maybe there is a dump or something like that here? I mostly work with Linux, and debugging Windows services is new to me 😅

Michael

03/25/2021, 1:12 PM

I used this for the second server, which is close to crashing. Waiting now: https://stackoverflow.com/questions/27704301/generate-memory-dump-for-a-windows-service-that-stops-unexpectedly

OpenPlgx

03/25/2021, 2:05 PM

This error is usually benign (it comes from osquery), but I think that on this system the benign error is acting up for some reason. Are you seeing many such errors, as if it were in a loop?

Michael

03/25/2021, 2:08 PM

Yes, there is a bunch of such errors. Procdump didn't work, by the way 😞

Michael

04/01/2021, 9:07 AM

Hi, guys! Is there any news? We still need a solution.

OpenPlgx

04/01/2021, 1:10 PM

Without access to data, it is indeed difficult to suggest options. Here is one thing you could try to get rid of this restart loop (which I believe is happening because osquery is acting up on this system for some reason):

OpenPlgx

04/01/2021, 1:11 PM

from an admin command prompt, run the following commands to stop the services:
1. sc stop plgx_cpt (wait 15-30 seconds, then run sc query plgx_cpt to make sure it has stopped)
2. sc stop plgx_osqueryd (wait 15-30 seconds and ensure it has stopped as well)
3. sc stop vast / sc delete vast
4. sc stop vastnw / sc delete vastnw
5. sc start plgx_osqueryd

OpenPlgx

04/01/2021, 1:12 PM

do not start plgx_cpt (that's an outside monitoring service causing the trigger)
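
The stop-and-verify sequence above can be sketched in Python for anyone who has to repeat it on several servers. This is a hedged illustration, not a PolyLogyx script: it assumes English-language `sc query` output containing a `STATE : <number> <NAME>` line, it must run from an elevated prompt, and the helper names `parse_sc_state` and `stop_and_wait` are hypothetical.

```python
import re
import subprocess
import time
from typing import Optional

# Matches e.g. "STATE              : 1  STOPPED" in English `sc query` output.
STATE_RE = re.compile(r"STATE\s*:\s*\d+\s+(\w+)")


def parse_sc_state(sc_query_output: str) -> Optional[str]:
    """Extract the state name (e.g. 'STOPPED') from `sc query <name>` output."""
    match = STATE_RE.search(sc_query_output)
    return match.group(1) if match else None


def stop_and_wait(service: str, timeout_s: float = 30.0, poll_s: float = 3.0) -> bool:
    """Run `sc stop`, then poll `sc query` until the service reports STOPPED."""
    subprocess.run(["sc", "stop", service], capture_output=True, text=True)
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        result = subprocess.run(["sc", "query", service], capture_output=True, text=True)
        if parse_sc_state(result.stdout) == "STOPPED":
            return True
        time.sleep(poll_s)
    return False


if __name__ == "__main__":
    for name in ("plgx_cpt", "plgx_osqueryd"):
        print(name, "stopped" if stop_and_wait(name) else "NOT stopped")
    for driver in ("vast", "vastnw"):
        subprocess.run(["sc", "stop", driver])
        subprocess.run(["sc", "delete", driver])
    subprocess.run(["sc", "start", "plgx_osqueryd"])
    # plgx_cpt is deliberately left stopped, as advised in the thread.
```

The 30-second timeout mirrors the "wait for 15-30 seconds" guidance; a service that is still not STOPPED after that should be checked by hand before continuing.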

Michael

04/01/2021, 3:48 PM

Hi, @OpenPlgx! Thanks, i will check this now

Michael

04/01/2021, 4:00 PM

Okay, I've done this. Should I not start plgx_cpt at all? Now we can only wait for failures. Also, I got this; is it expected?

himanshu

04/02/2021, 4:52 AM

this means the agent hasn't received the server port (custom_plgx_ServerPort) from the Options UI. Can you run plgx_osqueryd from the command line as below and share any errors/warnings you see on the CLI from the polylogyx osquery extension?
1. run 'sc stop plgx_osqueryd'
2. in the osquery.flags file, rename the osquery db file to something else, say --database_path=C:\Program Files\plgx_osquery\osquery_temp.db
3. from the CLI, run 'plgx_osqueryd.exe --flagfile osquery.flags --verbose'
4. Note any warnings/errors.
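
The flags-file edit in step 2 can also be done programmatically. Below is a minimal Python sketch, assuming one `--name=value` entry per line in the flags file; the function name `set_database_path` is hypothetical and the script is not part of the agent. It rewrites an existing `--database_path` line or appends one if none exists.

```python
def set_database_path(flagfile_text: str, new_path: str) -> str:
    """Return flagfile text with --database_path pointing at new_path."""
    prefix = "--database_path="
    lines = flagfile_text.splitlines()
    replaced = False
    for index, line in enumerate(lines):
        if line.strip().startswith(prefix):
            lines[index] = prefix + new_path
            replaced = True
    if not replaced:
        # No existing entry: add one at the end of the file.
        lines.append(prefix + new_path)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Paths taken from the thread; stop the service before editing the file.
    flags_path = r"C:\Program Files\plgx_osquery\osquery.flags"
    temp_db = r"C:\Program Files\plgx_osquery\osquery_temp.db"
    with open(flags_path, encoding="utf-8") as handle:
        updated = set_database_path(handle.read(), temp_db)
    with open(flags_path, "w", encoding="utf-8") as handle:
        handle.write(updated)
```

Keeping a copy of the original flags file before running this makes it easy to switch back to the previous database afterwards.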

OpenPlgx

04/02/2021, 11:16 AM

@Michael, these errors shouldn't cause a functional issue. We can bury them later, but let's see if you still get the high RAM/CPU usage that you mentioned (or the osquery agent start/stop situation).

Michael

04/05/2021, 9:05 AM

Hi, guys! Thanks for the replies, I really appreciate it. So, the server where I manually stopped everything, removed vast and vastnw, and then started the plgx_osqueryd service has failed after about 2 days of work. There was no activity on the server at all. In its current state it still doesn't work. Memory is 97% consumed. It looks like a memory leak, but I can't catch anything. All I have found is that without Polylogyx ESP or Polymon everything works well.

Michael

04/05/2021, 9:06 AM

Ah, yes: we tried testing Polymon and it causes the same problems.

Michael

04/05/2021, 9:17 AM

@himanshu Here it is.

OpenPlgx

04/06/2021, 8:54 AM

@Michael, the images you have added seem to suggest the memory outage is in Windows Defender and/or Firefox

OpenPlgx

04/06/2021, 8:55 AM

The agent is taking a very minimum of 7 MB.

OpenPlgx

04/06/2021, 8:56 AM

That said, I can understand that the issue is appearing only when you deploy the agent..& I am trying to think how else we can support you given that there isn't a lot of data you can share

Michael

04/06/2021, 8:57 AM

Hi, OpenPlgx! Yes, the agent is taking 7 MB because it's dead. This is only one case; I haven't seen such an error on DC servers, for example, which are Core servers without Firefox 🙂 I can clean the sensitive data from the config and share it, if you need it.

OpenPlgx

04/06/2021, 8:58 AM

what do you mean it's dead? It's showing in the taskmgr, right?

OpenPlgx

04/06/2021, 8:58 AM

Are you saying such errors are not seen on servers that don't have firefox?

OpenPlgx

04/06/2021, 8:59 AM

Also a bit confused: if the agent is dead, how is the RAM usage still going to ~97%??

Michael

04/06/2021, 8:59 AM

I think it's dead because it doesn't collect any information; the last log events were about two days ago.

OpenPlgx

04/06/2021, 8:59 AM

ok, so no queries are active..

Michael

04/06/2021, 9:00 AM

It's confusing me too. From my point of view it looks like a memory leak.

OpenPlgx

04/06/2021, 9:01 AM

leak where? if it was a leak in the agent, taskmgr would show it in the agent..so it's not a leak

Michael

04/06/2021, 9:02 AM

No, errors about VRAM are not appearing anywhere else; this is the only case

Michael

04/06/2021, 9:02 AM

So, do you need a config?

OpenPlgx

04/06/2021, 9:02 AM

yes, but can I request something simpler before?

OpenPlgx

04/06/2021, 9:03 AM

lets start fresh from very basics of osquery (should take about 15-20 mins) if that's ok with you

Michael

04/06/2021, 9:04 AM

Sure, np. What should i do?

OpenPlgx

04/06/2021, 9:04 AM

uninstall the agent completely from the server showing the issue

OpenPlgx

04/06/2021, 9:05 AM

you can do that from an admin command line by running "plgx_cpt -u d";

Michael

04/06/2021, 9:06 AM

Okay, wait a minute, please

OpenPlgx

04/06/2021, 9:06 AM

(plgx_cpt being the tool you must have downloaded to install the agent, from the server)

Michael

04/06/2021, 9:10 AM

Done. Do you need the log?

OpenPlgx

04/06/2021, 9:11 AM

no, looks good

OpenPlgx

04/06/2021, 9:12 AM

just give me the output of following commands: 1. sc query plgx_osqueryd 2. sc query vast 3. sc query vastnw

Michael

04/06/2021, 9:13 AM

Done

OpenPlgx

04/06/2021, 9:15 AM

oh, you are running from powershell, then perhaps: 1. Get-Service plgx_osqueryd 2. Get-Service vast 3. Get-Service vastnw

OpenPlgx

04/06/2021, 9:15 AM

expected output "Get-Service : Cannot find any service with service name <service name>"

OpenPlgx

04/06/2021, 9:16 AM

after that please install osquery from its website: https://osquery.io/downloads/official/4.7.0

Michael

04/06/2021, 9:18 AM

done

OpenPlgx

04/06/2021, 9:27 AM

can you adjust the osquery.conf to queries of your interest? (you won't be able to run any query for the win_*_events tables yet.)

OpenPlgx

04/06/2021, 9:28 AM

I want to see if the base osquery and its queries work fine on your system first, before we go with PolyLogyx additional stuff

OpenPlgx

04/06/2021, 9:28 AM

You might have to stop the osqueryd first, then edit the .conf file and then start it again

Michael

04/06/2021, 9:30 AM

Okay. It will take some time because we have several packs which contain both standard osquery and plgx stuff.

Michael

04/06/2021, 9:30 AM

I will report as soon as i can, thanks 🙂

OpenPlgx

04/06/2021, 9:45 AM

cool

OpenPlgx

04/06/2021, 9:45 AM

once we see this working, then we will manually add PolyLogyx Extension as well, then let it run and then add the packs..

OpenPlgx

04/06/2021, 9:46 AM

in the mean time, can you share your packs/configs?

Michael

04/06/2021, 9:50 AM

Let me review them for sensitive data and i will share

OpenPlgx

04/06/2021, 10:00 AM

sure

Michael

04/06/2021, 10:14 AM

Plgx_default.conf

Michael

04/06/2021, 1:00 PM

I've added and run all the packs we have for osquery

Michael

04/07/2021, 7:32 AM

Hi, OpenPlgx! Looks like osqueryd works well

OpenPlgx

04/07/2021, 11:06 AM

Great. So along with the .conf file, can you also share what queries/query packs you were running?

Michael

04/07/2021, 11:09 AM

Sure. Here it is

Windows.zip

OpenPlgx

04/07/2021, 11:18 AM

it doesn't have any queries/packs for polylogyx tables?

Michael

04/07/2021, 11:21 AM

no, it doesn't

OpenPlgx

04/07/2021, 11:31 AM

but you get into this situation when you enable queries on PolyLogyx tables, right?

Michael

04/07/2021, 11:34 AM

Looks like it. The most consuming process was plgx_osqueryd.exe

Michael

04/07/2021, 1:07 PM

So what is our next step? Extension?

OpenPlgx

04/07/2021, 2:05 PM

that is just vanilla osquery renamed to plgx_osqueryd

OpenPlgx

04/07/2021, 2:05 PM

i wanted to see the queries you have for PolyLogyx tables

OpenPlgx

04/07/2021, 2:06 PM

and yes, the next step would be to first load the extension (without any queries)

Michael

04/07/2021, 2:11 PM

Do you mean these?

Michael

04/07/2021, 2:12 PM

How should i apply extension?

OpenPlgx

04/07/2021, 2:29 PM

1. Download the latest extension binary by cloning: https://github.com/polylogyx/osq-ext-bin
2. stop the osquery service on your system
3. move the files plgx_win_extension.ext.exe, extensions.load, osquery.flags to c:\program files\osquery
4. Adjust your osquery.conf to apply all the filters (based on the above)
5. from an admin command prompt, run notepad and change the following line in osquery.flags: --database_path=C:\Program Files\osquery\osquery.db to --database_path=C:\Program Files\osquery\osquery1.db
6. restart the osquery service
7. from the admin prompt, check that the vast/vastnw services are running. If not, stop & start the osquery service again

Michael

04/07/2021, 2:33 PM

Thanks, i will

Michael

04/07/2021, 2:54 PM

Looks like it works.

OpenPlgx

04/07/2021, 5:19 PM

great, lets see if you hit the memory usage issue

OpenPlgx

04/07/2021, 5:20 PM

(also, can you make an exception for plgx_win_extension.ext.exe and osqueryd.exe in Windows Defender)?

Michael

04/07/2021, 6:07 PM

We do not use Defender, and exclusions in our anti-virus are implemented

OpenPlgx

04/08/2021, 3:32 AM

Your earlier images/screen shots showed MsMpEng.exe running and consuming the highest RAM in the system ...MsMpEng.exe is the Windows Defender Engine

OpenPlgx

04/08/2021, 3:34 AM

OpenPlgx

04/08/2021, 3:34 AM

From your earlier post ☝️

Michael

04/08/2021, 7:09 AM

Oh, yes, my bad. Defender is running on one of the test servers. We are now testing on servers without Defender.

Michael

04/09/2021, 7:12 AM

Hi, OpenPlgx! Looks like the manually deployed extension works fine. I'm extremely surprised

Michael

04/09/2021, 7:12 AM

What is our next step?

OpenPlgx

04/09/2021, 8:59 AM

that is indeed surprising ..there is really no diff in what we did manually vs what would have happened thru the CPT tool

Michael

04/09/2021, 9:05 AM

I understand, but nothing that has happened makes any sense. In the tests before the mass deployment everything worked perfectly. Then came the deployment, and the problems with RAM and RDP appeared. Two freaking weeks of collecting information and trying to understand what's going on, and now this: everything just works again. I'm thinking about vast and vastnw. They are loosely related to the Sysmon driver, and Sysmon once caused the same problems, though long ago. Could that be it?

Michael

04/09/2021, 12:55 PM

One of the Plgx test servers has just failed. At that moment there were errors related to SSL, and only those. It is not the first time I have seen them, but I did not pay attention previously. Could they be somehow related to the failure? The Plgx processes are alive, just not collecting anything

OpenPlgx

04/09/2021, 5:58 PM

vast/vastnw are indeed sysmon-ish drivers....

OpenPlgx

04/09/2021, 5:59 PM

PolyLogyx processes are collecting..go to the 'Details' tab

OpenPlgx

04/09/2021, 6:00 PM

I don't think the SSL certs are related, although what is the issue in these 3 images?? The resource usage seems fairly moderate to me

Michael

04/09/2021, 7:06 PM

No, it has stopped collecting; I know the details are in the other tab. When I sent the message, it had not been working for about 1.5 hours. The process is alive, but there are no new events

OpenPlgx

04/11/2021, 2:00 PM

The events are getting collected in the event log all the time (as long as the process is alive in memory). Certain events might get dropped (depending on the filter conditions) but there is no documented way to stop collecting the events...
