How do you join tables on a primary key? I'm not s...
# fleet
a
How do you join tables on a primary key? I'm not seeing any primary index keys in the tables. For example, https://fleetdm.com/tables/plist I could join with a value, but that's less precise and reliable (and generally useful) than joining on a shared key (across tables) like the hostname or serial number.
Copy code
SELECT pa.value deadline, pb.value os_version
FROM plist pa
INNER JOIN plist pb
    ON split(pa.subkey,'\/',2) = split(pb.subkey,'\/',2)
WHERE pa.path = '/var/db/softwareupdate/SoftwareUpdateDDMStatePersistence.plist'
    AND pb.path = '/var/db/softwareupdate/SoftwareUpdateDDMStatePersistence.plist'
    AND pa.subkey LIKE '%TargetLocalDateTime%'
    AND pb.subkey LIKE '%TargetOSVersion%'
This example is a table self-join (where the indexed primary key would be the best to join on) but even for regular INNER JOINs or LEFT JOINs on other tables, a key to join on would be helpful.
@Patricia Ambrus @Will Mayhone for vis
f
could you provide a copy of that actual plist?
a
Its prefs would be something like this:
Copy code
{
    SUCorePersistedStateContentsType = SoftwareUpdateCorePersistedStateFile;
    SUCorePersistedStateCoreFields =     {
    };
    SUCorePersistedStateCoreSecureCodedObjectsFields =     {
    };
    SUCorePersistedStateCoreVersion = "2.1.0";
    SUCorePersistedStatePolicyFields =     {
        Declarations =         {
            "com.apple.RemoteManagement.SoftwareUpdateExtension/ABCDEFG123-1234-ABCD-5678-A1B2C3D4E5:abcdeFghiJKLMNOpqrstUVwxyz1234567890.abcdeFghiJKLMNOpqrstUVwxyz1234567890=" =             {
                RMStoreDeclarationKey = "com.apple.RemoteManagement.SoftwareUpdateExtension/ABCDEFG123-1234-ABCD-5678-A1B2C3D4E5:abcdeFghiJKLMNOpqrstUVwxyz1234567890.abcdeFghiJKLMNOpqrstUVwxyz1234567890=";
                TargetLocalDateTime = "2023-11-14T00:00:00";
                TargetOSVersion = "14.1.1";
            };
        };
    };
    SUCorePersistedStatePolicySecureCodedObjectsFields =     {
    };
    SUCorePersistedStatePolicyVersion = "1.0";
}
More documentation here: https://www.sudoade.com/ddm-software-update-logs/
But this isn't really about this specific example. It's just an example. In general, it would be difficult to do table joins without a key to join tables on.
k
Hi @aysiu! It sounds like your concern is making sure that the data returned is all associated with the same host. Since the queries are run on the host itself, using only the data available for that host, there’s no need to provide that unique key.
If I’m off on my interpretation of the question, just let me know!
f
i can't speak to your specific implementation but in house we've used fleed for years and rely on fleet decorator queries to ensure that every event has the same fields (host_hardware_serial, hostname, timestamp) etc. maybe this is what you are after?
a
Since the queries are run on the host itself, using only the data available for that host, there’s no need to provide that unique key.
That's fine if you query a single table, but when you're joining tables, you have to join on a common key. Do you have examples on how you recommend joining tables? How would you have done the self-join I posted above without using that
split()
workaround?
i can't speak to your specific implementation but in house we've used fleed for years and rely on fleet decorator queries to ensure that every event has the same fields
I'm not familiar with decorator queries. Can you link some documentation that helped you?
f
agent configuration options in fleet UI https://fleetdm.com/docs/configuration/agent-configuration
k
Assuming that the plist file would only contain one of those declarations, I'd skip the
ON
entirely.
Copy code
SELECT pa.value deadline, pb.value os_version
FROM plist pa, plist pb
WHERE pa.path = '/var/db/softwareupdate/SoftwareUpdateDDMStatePersistence.plist'
AND pb.path = '/var/db/softwareupdate/SoftwareUpdateDDMStatePersistence.plist'
AND pa.subkey LIKE '%TargetLocalDateTime%'
AND pb.subkey LIKE '%TargetOSVersion%'
If there could be multiple declarations, I may do something similar to what you are doing.
Or treat it as a single select that returns two rows, though that might not be ideal depending on how you need to process the data:
Copy code
SELECT value , subkey
FROM plist
WHERE path = '/var/db/softwareupdate/SoftwareUpdateDDMStatePersistence.plist'
AND subkey LIKE '%TargetLocalDateTime%'
OR subkey LIKE '%TargetOSVersion%'
> joining on a shared key (across tables) like the hostname or serial number. I think that the key piece of information here is that a shared key like hostname or serial number isn't really needed because it would always match across every row in every table because the osquery db exists locally on each host.
In scheduled query results, the host can be identified using the host identifier and any additional decorators you add.
If you're running a live query, Fleet associates the incoming results with the host using the host identifier.
But none of the logic of the query is happening
after
the results have been sent. the query, in its entirety, is running locally on each host that you're querying.
a
I think that the key piece of information here is that a shared key like hostname or serial number isn't really needed because it would always match across every row in every table because the osquery db exists locally on each host.
I disagree here. Let's say I want to look at the values from that query (which is on the plist table), but I also want to get corresponding values (same hosts) from the os_version table. Can you give an example of how you do that without joining on the hostname or serial number?
f
i feel like you are trying to force something, if you just want results from multiple tables on the same host with a single query maybe a UNION of multiple queries? but the columns won't match up so you'd probably have to JOIN anyway on some column anyway to make it all work. I don't know SQL expertly but this seems like an anti-pattern to me.
as mentioned above, depending on your configuration and agent flags, there should always be at least one key that is guaranteed so you can join.
I think this achieves what you are saying but in this case it is essentially adding os_version columns to every plist read result as a contrived example
Copy code
WITH query1 AS (SELECT * FROM  plist WHERE path = '/Users/blah/Library/LaunchAgents/org.nginx.nginx.plist'), query2 AS (SELECT * FROM os_version) SELECT query1.*,query2.* FROM query1, query2;
Screenshot 2024-05-06 at 6.46.49 PM.png
a
> i feel like you are trying to force something, if you just want results from multiple tables on the same host with a single query maybe a UNION of multiple queries? If I run a query like
Copy code
SELECT pa.value deadline, pb.value enforced_os_version, NULL current_version
FROM plist pa
INNER JOIN plist pb
    ON split(pa.subkey,'\/',2) = split(pb.subkey,'\/',2)
WHERE pa.path = '/var/db/softwareupdate/SoftwareUpdateDDMStatePersistence.plist'
    AND pb.path = '/var/db/softwareupdate/SoftwareUpdateDDMStatePersistence.plist'
    AND pa.subkey LIKE '%TargetLocalDateTime%'
    AND pb.subkey LIKE '%TargetOSVersion%'
UNION
SELECT NULL deadline, NULL enforced_os_version, version current_version
FROM os_version
I just get results from the first query and then results from the second query (intermixed, of course, but all on separate lines). That's how a UNION works. I don't want a UNION. I want a join. I want to be able to have the current OS version next to the enforced OS version and the deadline for a single machine. I don't think I'm asking for something wild here. Being able to join tables is a fairly standard SQL/MySQL query practice.
f
right you can do that sure. the issue is trying to do these joins in the context of osquery/fleet is not possible because the tables do not have those shared keys, they are an abstraction that gets added to the "event" that is emitted as a result. so these joins are totally easy to do off platform. we use a data lake, we have decorators that guarantee additional fields in every single event, plus the configurable host_identifier value. so we can join all day because every single event/result in our datalake has hostname, hardware_serial, uuid etc. doing this in the context of osquery is not possible afaik
k
Give this one a try:
Copy code
SELECT pa.value deadline, pb.value enforced_os_version, ov.current_version
FROM plist pa
INNER JOIN plist pb
    ON split(pa.subkey,'\/',2) = split(pb.subkey,'\/',2)
WHERE pa.path = '/var/db/softwareupdate/SoftwareUpdateDDMStatePersistence.plist'
    AND pb.path = '/var/db/softwareupdate/SoftwareUpdateDDMStatePersistence.plist'
    AND pa.subkey LIKE '%TargetLocalDateTime%'
    AND pb.subkey LIKE '%TargetOSVersion%'
JOIN os_version ov
I had a slight error:
Copy code
SELECT pa.value deadline, pb.value enforced_os_version, ov.version current_version
FROM plist pa
INNER JOIN plist pb
    ON split(pa.subkey,'\/',2) = split(pb.subkey,'\/',2)
WHERE pa.path = '/var/db/softwareupdate/SoftwareUpdateDDMStatePersistence.plist'
    AND pb.path = '/var/db/softwareupdate/SoftwareUpdateDDMStatePersistence.plist'
    AND pa.subkey LIKE '%TargetLocalDateTime%'
    AND pb.subkey LIKE '%TargetOSVersion%'
JOIN os_version ov
I'd also be curious to see what you get for this:
Copy code
SELECT pa.value deadline, pb.value os_version, ov.version current_version
FROM plist pa, plist pb, os_version ov
WHERE pa.path = '/var/db/softwareupdate/SoftwareUpdateDDMStatePersistence.plist'
AND pb.path = '/var/db/softwareupdate/SoftwareUpdateDDMStatePersistence.plist'
AND pa.subkey LIKE '%TargetLocalDateTime%'
AND pb.subkey LIKE '%TargetOSVersion%'
f
your last example is how i did it as well
Copy code
+------------+------------+-----------------+
| deadline   | os_version | current_version |
+------------+------------+-----------------+
| 2024/05/07 | 20.20.20   | 14.4.1          |
+------------+------------+-----------------+

$ cat /tmp/example.plist
<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
        <key>TargetOSVersion</key>
        <string>20.20.20</string>

        <key>TargetLocalDateTime</key>
        <string>2024/05/07</string>
</dict>
</plist>
Copy code
WITH policy AS (SELECT pa.value deadline, pb.value os_version
FROM plist pa, plist pb
WHERE pa.path = '/tmp/example.plist'
AND pb.path = '/tmp/example.plist'
AND pa.key LIKE '%TargetLocalDateTime%'
AND pb.key LIKE '%TargetOSVersion%'
)
SELECT deadline, os_version, version current_version FROM policy, os_version;
only reason I prefer CTE pattern is to easily reference the new variable names
Copy code
WITH policy AS (SELECT pa.value deadline, pb.value os_version
FROM plist pa, plist pb
WHERE pa.path = '/tmp/example.plist'
AND pb.path = '/tmp/example.plist'
AND pa.key LIKE '%TargetLocalDateTime%'
AND pb.key LIKE '%TargetOSVersion%'
)
SELECT deadline, os_version, version current_version
FROM policy, os_version
WHERE os_version <> current_version;
k
NIce. That's really readable.
@aysiu The whole concept of osquery's abstraction is a lot to take in. Rather than selecting data from a database, you're just describing what data you want osquery to retrieve from the host for you. SQLite is the language used to describe requests because it's something that's widely known and pretty approachable, but you aren't really working with tables under the hood. Each table just represents a type of data that osquery can retrieve.