#eclecticiq-polylogyx-extension
Michael

03/23/2021, 3:13 PM
Hi, OpenPlgx!
1. We already filter everything we can. There is one more problem: new agents are not registering, and old ones are not receiving the new config from the server.
2. We use the shallow config.
moulik

03/23/2021, 4:32 PM
Can you please share the server specs and the server resource utilization (average CPU and RAM usage)? Also, how many endpoints have you enrolled?
5:03 PM
Can you also share the last seen/activity details for the hosts that are not refreshing the config? To get the details, click on the host from the hosts page and hover on Last Seen.
Michael

03/23/2021, 5:06 PM
The agent runs on an Intel Xeon with 2 virtual cores and 4 GB of RAM. CPU is about 11% on average, RAM about 60%. 9 agents were deployed, but only 3 were left after the control server reinstall. Unfortunately I cannot provide Last Seen, because 6 of the 9 agents have not appeared.
moulik

03/24/2021, 2:38 AM
I mean the ESP application server specs and resource usage. Last seen details for any one of the systems would be fine.
Michael

03/24/2021, 7:12 AM
Oh, okay. The server also has 2 virtual cores of Intel Xeon Platinum 8000 and 16 GB RAM. One of the agents is in the screenshot.
Kishore Arava

03/24/2021, 12:49 PM
Can you please confirm a few things for us, to identify why the newly enrolled agents are not showing in the ESP UI?
1. Is the server IP set with the --tls_hostname flag in the agent machine's osquery flags file (C:\Program Files\plgx_osquery\osquery.flags) the same as your control server (ESP)?
2. Does the server's certificate (plgx-esp/nginx/certificate.crt) match the certificate you are using to enroll the agent?
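The first check above (whether --tls_hostname points at the ESP server) could be sketched in Python roughly as follows. This is a minimal illustration, not part of the PolyLogyx tooling: the helper names `parse_flags` and `tls_hostname_problem` are hypothetical, and the address and port in the usage note are made-up sample values.

```python
from typing import Dict, Optional

# Values that mean "this machine" rather than a remote ESP server.
LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}


def parse_flags(text: str) -> Dict[str, str]:
    """Parse osquery-style flag lines of the form --name=value."""
    flags: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("--"):
            continue  # skip blank lines and anything that is not a flag
        name, _, value = line[2:].partition("=")
        flags[name] = value
    return flags


def tls_hostname_problem(flags_text: str, expected_server: str) -> Optional[str]:
    """Return a description of the problem, or None if the flag looks right."""
    host = parse_flags(flags_text).get("tls_hostname")
    if host is None:
        return "--tls_hostname is not set"
    # The flag may carry a port (host:port); compare only the host part.
    host_only = host.rsplit(":", 1)[0] if host.count(":") == 1 else host
    if host_only.lower() in LOOPBACK_HOSTS:
        return f"--tls_hostname points at the local machine ({host})"
    if host_only.lower() != expected_server.lower():
        return f"--tls_hostname is {host}, expected {expected_server}"
    return None
```

For example, `tls_hostname_problem(open(path).read(), "10.0.0.5")` would report a problem for a flags file containing `--tls_hostname=localhost:9000`, where `path` is the flags file location and `10.0.0.5` stands in for the real ESP address.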
Michael

03/24/2021, 1:12 PM
Hi, @Kishore Arava!
1. It's very embarrassing, but no: it was set to localhost. Now it works, thanks.
One question is left: memory leaks or something like that are causing RDP failures.
Kishore Arava

03/24/2021, 2:05 PM
Yes, might be. Based on the last screenshot you shared, I hope the agents are reading the updated config from the server now.
OpenPlgx

03/24/2021, 2:08 PM
Is the memory usage coming down over time? I mean, is it a spike? I don't think there is a leak here; it could be due to a high event load on the system.
Michael

03/24/2021, 2:40 PM
Memory usage looks like this over time, and sometimes the server becomes unresponsive. Unfortunately I couldn't look at the server in the failed state because RDP had failed. We now exclude everything we can while still staying secure.
2:41 PM
Current state of one of the agents, if you are interested.
2:41 PM
I'm using Process Explorer to monitor the situation.
OpenPlgx

03/25/2021, 4:03 AM
I can agree that 120 MB is a little high (and that also seemed like a spike rather than being consistent), but it is not terrible for a server workload. How much physical RAM do you have on the server machines running the agent? I am surprised that 120 MB of RAM is causing RDP failures.
Michael

03/25/2021, 6:57 AM
Unfortunately I can't tell you about physical RAM, because all our servers are in a virtual environment. 120 MB is not the maximum. Two days ago I saw 486 MB.
7:15 AM
One of the test servers now
OpenPlgx

03/25/2021, 7:43 AM
I understand. I was curious to know whether these were transient conditions or steady state. For example, as I read, a few of your servers (endpoints) were not connected to the ESP server due to an error in the flags file. That wouldn't stop the agent from collecting and caching data in the meantime, and the moment connectivity resumed, it would try to pump all the data back to the ESP server, causing a temporary spike. Is that (or any similar transient condition) what we are seeing here? That is what I am trying to understand. The other reason could be the volume of activity, which can be controlled by tuning event filters specific to your environment, but for us to guide you on that, you would need to share some data.
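The transient-versus-steady-state question can be made concrete with a small Python sketch that labels a series of memory readings (for example, values noted from Process Explorer at regular intervals). Everything here is an assumption for illustration: the function name `classify_memory`, the threshold, and the 80% cut-off for "sustained" are hypothetical choices, not values recommended by PolyLogyx.

```python
from typing import Sequence


def classify_memory(samples_mb: Sequence[float], threshold_mb: float,
                    sustained_fraction: float = 0.8) -> str:
    """Label readings as 'normal', 'transient spike', or 'sustained'.

    A reading counts as high when it exceeds threshold_mb. If at least
    sustained_fraction of the readings are high, the usage is treated as
    steady state; if only some are high, it is treated as a spike.
    """
    if not samples_mb:
        raise ValueError("need at least one sample")
    high = [s > threshold_mb for s in samples_mb]
    if not any(high):
        return "normal"
    if sum(high) / len(high) >= sustained_fraction:
        return "sustained"
    return "transient spike"
```

With a 120 MB threshold, readings such as `[40, 45, 486, 50, 42]` would be labeled a transient spike, while `[300, 310, 320, 330]` would be labeled sustained.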
Michael

03/25/2021, 7:59 AM
What kind of data do you need?
OpenPlgx

03/25/2021, 10:53 AM
Recent Activity data on the system exhibiting the persistent high RAM usage + the plgx_event_filters config
Michael

03/25/2021, 12:05 PM
There is almost no activity on the most RAM-consuming machine, because there is no user activity apart from a constantly running Process Explorer, which is excluded in the agent configuration. Can the number of exclusions affect the load?
12:07 PM
I can't provide the Plgx event filter config for security reasons, sorry. We exclude our antivirus, SIEM, Plgx itself, and some Microsoft utilities and frameworks. The excluded ports are the standard ones.
12:07 PM
I constantly monitor events and add exclusions for non-critical mass events.
1:05 PM
Right now plgx-osqueryd.exe has failed with the following error and can't start. RAM is at 98%; apart from this, the server is fine.
1:06 PM
Maybe there is a dump or something like that somewhere? I mostly work with Linux, and debugging Windows services is new for me 😅
OpenPlgx

03/25/2021, 2:05 PM
This error is usually benign (it comes from osquery), but I think that on this system, for some reason, the benign error is acting up. Are you seeing many such errors, as if it were in a loop?
Michael

03/25/2021, 2:08 PM
Yes, there are a bunch of such errors. Procdump didn't work, by the way 😞
9:07 AM
Hi, guys! Is there any news? We still need a solution.
OpenPlgx

04/01/2021, 1:10 PM
Without access to the data, it is indeed difficult to suggest options. Here is one thing you could try to get rid of this restart loop (which I believe is happening because osquery is acting up on this system for some reason):
1:11 PM
From an admin command prompt, run the following commands to stop the services:
1. sc stop plgx_cpt (wait 15-30 seconds and run sc query plgx_cpt to make sure it has stopped)
2. sc stop plgx_osqueryd (wait 15-30 seconds and ensure it has stopped as well)
3. sc stop vast / sc delete vast
4. sc stop vastnw / sc delete vastnw
5. sc start plgx_osqueryd
1:12 PM
do not start plgx_cpt (that's an outside monitoring service causing the trigger)
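The command sequence above can be written down as a small Python sketch, mainly to make the order explicit and to allow a dry run before touching a server. It reads "sc stop vast / sc delete vast" as "stop, then delete", which is an interpretation of the message; the function names `recovery_plan` and `run_plan` are hypothetical, and the script must be run from an elevated prompt on Windows to actually execute anything.

```python
import subprocess
import time
from typing import List


def recovery_plan() -> List[List[str]]:
    """Build the sc commands in the order given in the chat."""
    plan = [
        ["sc", "stop", "plgx_cpt"],
        ["sc", "query", "plgx_cpt"],       # confirm it has stopped
        ["sc", "stop", "plgx_osqueryd"],
        ["sc", "query", "plgx_osqueryd"],  # confirm it has stopped
    ]
    for driver in ("vast", "vastnw"):
        plan.append(["sc", "stop", driver])
        plan.append(["sc", "delete", driver])
    # plgx_cpt is deliberately NOT restarted, per the advice in the chat.
    plan.append(["sc", "start", "plgx_osqueryd"])
    return plan


def run_plan(plan: List[List[str]], dry_run: bool = True,
             wait_seconds: int = 30) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in plan:
        print(" ".join(cmd))
        if dry_run:
            continue
        # check=False: sc may return a non-zero exit code, for example
        # when a service is already stopped, and the sequence should go on.
        subprocess.run(cmd, check=False)
        if cmd[1] == "stop":
            time.sleep(wait_seconds)  # the chat suggests waiting 15-30 seconds
```

Calling `run_plan(recovery_plan())` only prints the commands; passing `dry_run=False` would run them.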
Michael

04/01/2021, 3:48 PM
Hi, @OpenPlgx! Thanks, i will check this now
4:00 PM
Okay, I've done this. So I shouldn't start plgx_cpt at all? Now we can only wait for failures. Also, I got this; is it expected?
himanshu

04/02/2021, 4:52 AM
This means the agent hasn't received the server port (custom_plgx_ServerPort) from the Options UI. Can you run plgx_osqueryd from the command line as below and share any errors/warnings you see on the CLI from the PolyLogyx osquery extension?
1. Run 'sc stop plgx_osqueryd'.
2. In the osquery.flags file, rename the osquery db file to something else, say --database_path=C:\Program Files\plgx_osquery\osquery_temp.db
3. From the CLI, run 'plgx_osqueryd.exe --flagfile osquery.flags --verbose'.
4. Note any warnings/errors.
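Step 2 (pointing --database_path at a fresh database file) is a one-line edit, but it can also be sketched in Python for anyone who prefers not to edit the flags file by hand. This is an illustrative sketch only: `set_database_path` is a hypothetical helper that works on the text of the flags file and leaves reading and writing the file to the caller.

```python
def set_database_path(flags_text: str, new_path: str) -> str:
    """Return flags_text with --database_path pointing at new_path.

    An existing --database_path line is replaced; if there is none,
    the flag is appended. All other lines are kept unchanged.
    """
    out = []
    replaced = False
    for line in flags_text.splitlines():
        if line.strip().startswith("--database_path="):
            out.append(f"--database_path={new_path}")
            replaced = True
        else:
            out.append(line)
    if not replaced:
        out.append(f"--database_path={new_path}")
    return "\n".join(out) + "\n"
```

For the path suggested above, the call would be `set_database_path(text, r"C:\Program Files\plgx_osquery\osquery_temp.db")`, where `text` is the current content of osquery.flags.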
OpenPlgx

04/02/2021, 11:16 AM
@Michael, these errors shouldn't cause a functional issue. We can bury them later, but let's see if you still get the high RAM/CPU usage that you mentioned (or the osquery agent start/stop situation).
Michael

04/05/2021, 9:05 AM
Hi, guys! Thanks for the replies, I really appreciate them. So, on the server where I manually stopped everything, removed vast and vastnw, and then started it, the plgx_osqueryd service failed after about 2 days of work. There was no activity on the server at all. It still doesn't work in its current state. Memory consumption is at 97%. It looks like a memory leak, but I can't catch anything. All I've got is that without PolyLogyx ESP or Polymon everything works well.
9:06 AM
Ah, yes: we tried to test Polymon and it caused the same problems.
9:17 AM
@himanshu Here it is.
OpenPlgx

04/06/2021, 8:54 AM
@Michael, the images you have added seem to suggest that the memory consumption is in Windows Defender and/or Firefox.
8:55 AM
The agent is taking a very minimal 7 MB.
8:56 AM
That said, I understand that the issue appears only when you deploy the agent, and I am trying to think how else we can support you, given that there isn't a lot of data you can share.
Michael

04/06/2021, 8:57 AM
Hi, OpenPlgx! Yes, the agent is taking 7 MB because it's dead. This is the only such case; I haven't seen this error on the DC servers, for example, which are Core servers without Firefox 🙂 I can clean sensitive data from the config and share it, if you need it.
OpenPlgx

04/06/2021, 8:58 AM
What do you mean it's dead? It's showing in taskmgr, right?
8:58 AM
Are you saying such errors are not seen on servers that don't have Firefox?
8:59 AM
Also a bit confused: if the agent is dead, how is the RAM usage still going to ~97%?
Michael

04/06/2021, 8:59 AM
I think it's dead because it doesn't collect any information: the last log events were about two days ago.
OpenPlgx

04/06/2021, 8:59 AM
ok, so no queries are active..
Michael

04/06/2021, 9:00 AM
It's confusing me too. From my point of view, it looks like a memory leak.
OpenPlgx

04/06/2021, 9:01 AM
Leak where? If it were a leak in the agent, taskmgr would show it in the agent, so it's not a leak.
Michael

04/06/2021, 9:02 AM
No, errors about VRAM are not appearing anywhere else; this is the only case.
9:02 AM
So, do you need a config?
OpenPlgx

04/06/2021, 9:02 AM
yes, but can I request something simpler first?
9:03 AM
let's start fresh from the very basics of osquery (should take about 15-20 mins), if that's ok with you
Michael

04/06/2021, 9:04 AM
Sure, np. What should i do?
OpenPlgx

04/06/2021, 9:04 AM
uninstall the agent completely from the server showing the issue
9:05 AM
you can do that from an admin command line by running "plgx_cpt -u d";
Michael

04/06/2021, 9:06 AM
Okay, wait a minute, please
OpenPlgx

04/06/2021, 9:06 AM
(plgx_cpt being the tool you must have downloaded to install the agent, from the server)
Michael

04/06/2021, 9:10 AM
Done. Do you need the log?
OpenPlgx

04/06/2021, 9:11 AM
no, looks good
9:12 AM
just give me the output of the following commands:
1. sc query plgx_osqueryd
2. sc query vast
3. sc query vastnw
Michael

04/06/2021, 9:13 AM
Done
OpenPlgx

04/06/2021, 9:15 AM
oh, you are running from PowerShell, then perhaps:
1. Get-Service plgx_osqueryd
2. Get-Service vast
3. Get-Service vastnw
9:15 AM
expected output "Get-Service : Cannot find any service with service name <service name>"
9:16 AM
after that please install osquery from its website: https://osquery.io/downloads/official/4.7.0
Michael

04/06/2021, 9:18 AM
done
OpenPlgx

04/06/2021, 9:27 AM
Can you adjust osquery.conf to run the queries you are interested in? (You won't be able to run any query against the win_*_events tables yet.)
9:28 AM
I want to see if the base osquery and its queries work fine on your system first, before we go with PolyLogyx additional stuff
9:28 AM
You might have to stop the osqueryd first, then edit the .conf file and then start it again
Michael

04/06/2021, 9:30 AM
Okay. It will take some time, because we have several packs that contain both standard osquery and plgx stuff.
9:30 AM
I will report as soon as i can, thanks 🙂
OpenPlgx

04/06/2021, 9:45 AM
cool
9:45 AM
once we see this working, then we will manually add PolyLogyx Extension as well, then let it run and then add the packs..
9:46 AM
in the mean time, can you share your packs/configs?
Michael

04/06/2021, 9:50 AM
Let me review them for sensitive data and i will share
OpenPlgx

04/06/2021, 10:00 AM
sure
Michael

04/06/2021, 10:14 AM
1:00 PM
I've added and run all the packs we have for osquery
7:32 AM
Hi, OpenPlgx! Looks like osqueryd works well
OpenPlgx

04/07/2021, 11:06 AM
Great. So along with the .conf file, can you also share what queries/query packs you were running?
Michael

04/07/2021, 11:09 AM
Sure. Here it is
OpenPlgx

04/07/2021, 11:18 AM
it doesn't have any queries/packs for polylogyx tables?
Michael

04/07/2021, 11:21 AM
no, it doesn't
OpenPlgx

04/07/2021, 11:31 AM
but you get into this situation when you enable queries on PolyLogyx tables, right?
Michael

04/07/2021, 11:34 AM
Looks like it. The most resource-consuming process was plgx_osqueryd.exe
1:07 PM
So what is our next step? Extension?
OpenPlgx

04/07/2021, 2:05 PM
that is just vanilla osquery renamed to plgx_osqueryd
2:05 PM
i wanted to see the queries you have for PolyLogyx tables
2:06 PM
and yes, the next step would be to first load the extension (without any queries)
Michael

04/07/2021, 2:11 PM
Do you mean these?
2:12 PM
How should I apply the extension?
OpenPlgx

04/07/2021, 2:29 PM
1. Download the latest extension binary by cloning https://github.com/polylogyx/osq-ext-bin
2. Stop the osquery service on your system.
3. Move the files plgx_win_extension.ext.exe, extensions.load, and osquery.flags into c:\program files\osquery
4. Adjust your osquery.conf to apply all the filters (based on the above).
5. From an admin command prompt, run notepad and change the following line in the osquery.flags file: --database_path=C:\Program Files\osquery\osquery.db to --database_path=C:\Program Files\osquery\osquery1.db
6. Restart the osquery service.
7. From the admin prompt, check that the vast/vastnw services are running. If not, stop and start the osquery service again.
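Before restarting the service in step 6, it may help to confirm that the three files from step 3 actually landed in the osquery directory. A minimal Python sketch of that check follows; the helper name `missing_files` is hypothetical, and the file names are taken from the message above.

```python
from pathlib import Path
from typing import List

# Files the message above says to move into the osquery directory.
REQUIRED_FILES = (
    "plgx_win_extension.ext.exe",
    "extensions.load",
    "osquery.flags",
)


def missing_files(osquery_dir: str) -> List[str]:
    """Return the required file names that are not present in osquery_dir."""
    base = Path(osquery_dir)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]
```

Running `missing_files(r"C:\Program Files\osquery")` on the server should return an empty list once step 3 is complete.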
Michael

04/07/2021, 2:33 PM
Thanks, i will
2:54 PM
Looks like it works.
OpenPlgx

04/07/2021, 5:19 PM
great, lets see if you hit the memory usage issue
5:20 PM
(also, can you make an exception for plgx_win_extension.ext.exe and osqueryd.exe in Windows Defender)?
Michael

04/07/2021, 6:07 PM
We do not use Defender, and the exclusions in our antivirus are already in place.
OpenPlgx

04/08/2021, 3:32 AM
Your earlier images/screenshots showed MsMpEng.exe running and consuming the most RAM in the system. MsMpEng.exe is the Windows Defender engine.
3:34 AM
3:34 AM
From your earlier post
Michael

04/08/2021, 7:09 AM
Oh, yes, my bad. Defender is running on one of the test servers. We are now testing on servers without Defender.
7:12 AM
Hi, OpenPlgx! Looks like the manually deployed extension works fine. I'm extremely surprised.
7:12 AM
What is our next step?
OpenPlgx

04/09/2021, 8:59 AM
that is indeed surprising. There is really no difference between what we did manually and what would have happened through the CPT tool.
Michael

04/09/2021, 9:05 AM
I understand, but nothing that is happening makes any sense. In tests before the mass deployment, everything worked perfectly. Then came the deployment, and the problems with RAM and RDP appeared. Two freaking weeks of collecting information and trying to understand what's going on, and now this: everything just works again. I'm thinking about vast and vastnw: they are somewhat related to the Sysmon driver, and Sysmon once caused the same problems, though long ago. Could that be it?
12:55 PM
Right now one of the Plgx test servers has failed. At that moment there were errors related to SSL, and only those. This is not the first time I've seen them, but I didn't pay attention to them before. Could they be somehow related to the failure? The Plgx processes are alive, just not collecting anything.
OpenPlgx

04/09/2021, 5:58 PM
vast/vastnw are indeed sysmon-ish drivers....
5:59 PM
PolyLogyx processes are collecting..go to the 'Details' tab
6:00 PM
I don't think the SSL certs are related. But what is the issue in these 3 images? The resource usage seems fairly moderate to me.
Michael

04/09/2021, 7:06 PM
No, it has stopped collecting; I know the details are in the other tab. By the time I sent the message, it had not been working for about 1.5 hours. The process is alive, but there are no new events.
OpenPlgx

04/11/2021, 2:00 PM
The events are getting collected in the event log all the time (as long as the process is alive in memory). Certain events might get dropped (depending on the filter conditions) but there is no documented way to stop collecting the events...