# fleet
b
Hi gang! We're running into an issue with what seems like a bug in the macOS setup experience. When users onboard their new device, a number of software pieces get installed through Fleet (Chrome, Slack, and some default apps) in addition to a bootstrap shell script. In about 10% of cases, the script finishes but the Fleet progress window never shows it as finished, even though we can see the script finish from the portal side. Any ideas what might be happening or what to look for in the ECS logs?
u
Hey @Billy H! What happens on the device when that script gets “stuck”? Is it continuing through the process, or hanging?
b
Hello! It just hangs; the screen keeps showing the script as still running, forever.
Hey gang! This is a bit more of an urgent request now; it seems to be affecting more devices than we thought.
u
So sorry @Billy H! Is it always the same script getting "stuck"?
b
That is correct. It's just the bootstrap script that does this.
u
Can you share that script?
b
I can give you a heavily sanitized version! The script takes about 7-10 minutes to run, and our script timeout period (`script_execution_timeout`) is 3600 seconds.
```shell
#!/bin/sh
# This is the bootstrap script to get the machine to a desired state

#######################################################################

# Install EDR Endpoint Protection
# Download EDR installer (-f fails on HTTP errors instead of saving an error page, -L follows redirects)
curl -fsSL -o /tmp/EDR_installer.zip https://sanitizedurlgoeshere/EDRInstall.zip

# Unzip the installer silently (-o overwrites existing files instead of prompting)
unzip -o /tmp/EDR_installer.zip -d /tmp/EDR_installer > /dev/null

# Run the installer (the app path contains spaces, so it is quoted)
EDR_DIR="/tmp/EDR_installer"
EDR_BIN_DIR="$EDR_DIR/EDR Installer.app/Contents/MacOS"
chmod a+x "$EDR_BIN_DIR/EDR Installer"
chmod a+x "$EDR_BIN_DIR/tools/com.EDR.bootstrap.helper"
"$EDR_BIN_DIR/EDR Installer" --quiet

# Cleanup
rm -rf /tmp/EDR_installer.zip /tmp/EDR_installer

echo "EDR installation complete!"

#######################################################################

# Install Password Manager
echo "Installing Password Manager..."
sudo /usr/local/bin/catalog -i com.password.manager -s

# Install ticket software
echo "Installing ticket software..."
sudo /usr/local/bin/catalog -i com.ticket.software -s

#######################################################################

# If arm64 (Apple silicon), install Rosetta for compatibility
# (POSIX [ compares strings with =, not ==)
arch_value=$(arch)
if [ "$arch_value" = "arm64" ]; then
    echo "ARM64 architecture detected. Installing Rosetta..."
    sudo /usr/sbin/softwareupdate --install-rosetta --agree-to-license
fi

# Install CLI tool from GitHub: take the newest matching .pkg asset URL and download it
URL="https://api.github.com/repos/sanitized/app/releases?q=cli/latest"
curl -s "$URL" | awk -F\" '/browser_download_url.*app-cli-macos-.*\.pkg/ {print $(NF-1)}' | sort -V | tail -n 1 | xargs -I {} curl -fsSL -o /tmp/app.pkg {}
sudo installer -pkg /tmp/app.pkg -target /
rm /tmp/app.pkg

echo "Exiting."
exit 0
```
u
would it be possible to split that into multiple scripts? Or try one or two enrollments without that script? I’d like to try to break it down a little to verify whether the problem is actually related to this specific script.
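As a complement to splitting the script, each section can be instrumented so it records when it started, when it ended, and its exit status, which makes a run that hangs easy to compare with one that doesn't. A minimal sketch, assuming a hypothetical `run_stage` helper and a hypothetical log location (`/var/log/bootstrap-stages.log`, overridable with `STAGE_LOG`); the commented calls are placeholders for the sections of the sanitized script.

```shell
#!/bin/sh
# Hypothetical helper: run one named stage of the bootstrap script and append
# its start time, end time, exit status, and duration to a stage log.
# STAGE_LOG is an assumed location; override it via the environment.
STAGE_LOG="${STAGE_LOG:-/var/log/bootstrap-stages.log}"

run_stage() {
    stage_name="$1"
    shift
    stage_start=$(date +%s)
    echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) START $stage_name" >> "$STAGE_LOG"
    # Run the stage command with its remaining arguments; its output still
    # goes to the script's normal stdout/stderr.
    "$@"
    stage_rc=$?
    stage_end=$(date +%s)
    echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) END $stage_name rc=$stage_rc seconds=$((stage_end - stage_start))" >> "$STAGE_LOG"
    return "$stage_rc"
}

# Placeholder stages standing in for the real sections:
# run_stage "rosetta" /usr/sbin/softwareupdate --install-rosetta --agree-to-license
# run_stage "password-manager" /usr/local/bin/catalog -i com.password.manager -s
```

Because each stage's own output is untouched, it keeps showing up in the script output as before; the stage log adds per-section durations that can be lined up against the timestamps Fleet reports.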
b
I'll do my best and report back. Is there anything you can think of that I should look for in the ECS logs for failures?
u
We can look for errors in general in the Fleet logs.
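A minimal sketch for narrowing an exported Fleet server log down to error-level entries, assuming one JSON object per line, optionally prefixed by a console timestamp like the one in the ECS excerpts in this thread. The `fleet_errors` name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print only error-level JSON entries from an exported
# Fleet server log (files given as arguments, or stdin if none).
fleet_errors() {
    # Strip any console prefix such as "August 06, 2025 at 10:17 " that
    # precedes the JSON object, then keep entries whose level is "error"
    # (with or without a space after the colon).
    sed 's/^[^{]*//' "$@" | grep -E '"level": ?"error"'
}
```

For example, `fleet_errors ecs-export.txt` (file name illustrative). Pretty-printed entries that span several lines would need to be compacted to one object per line first, for instance with `jq -c .`.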
Anything interesting show up in the logs?
b
Nothing yet. I have some time tomorrow to stress test it, so I'll keep you posted.
Hi @Kathy Satterlee, with the reduced bootstrap script, the script finished successfully in 4 minutes. I can see that in the Fleet portal, but the workstation is still sitting there spinning after about 12 minutes and counting.
ECS logs from when the script finished on that host (latest log on top, oldest on bottom):
```
August 06, 2025 at 10:17 {"component":"http","host_id":42,"ip_addr":"REDACTED","level":"debug","method":"POST","took":"1.077584ms","ts":"2025-08-06T14:18:02.456617876Z","uri":"/api/v1/osquery/distributed/read","x_for_ip_addr":"REDACTED"}
August 06, 2025 at 10:17 {"component":"http","host_id":42,"ip_addr":"REDACTED","level":"debug","method":"POST","took":"2.369552ms","ts":"2025-08-06T14:17:52.241070929Z","uri":"/api/v1/osquery/distributed/read","x_for_ip_addr":"REDACTED"}
August 06, 2025 at 10:17 {"component":"http","host_id":42,"ip_addr":"REDACTED","level":"debug","method":"POST","took":"1.004502ms","ts":"2025-08-06T14:17:41.988139413Z","uri":"/api/v1/osquery/distributed/read","x_for_ip_addr":"REDACTED"}
August 06, 2025 at 10:17 {"component":"http","host_id":42,"ip_addr":"REDACTED","level":"debug","method":"POST","took":"2.660085ms","ts":"2025-08-06T14:17:39.405575153Z","uri":"/api/fleet/orbit/scripts/request","x_for_ip_addr":"REDACTED"}
August 06, 2025 at 10:17 {"component":"http","host_id":42,"ip_addr":"REDACTED","level":"debug","method":"POST","took":"11.440506ms","ts":"2025-08-06T14:17:39.382938757Z","uri":"/api/fleet/orbit/config","x_for_ip_addr":"REDACTED"}
August 06, 2025 at 10:17 {"component":"http","host_id":42,"ip_addr":"REDACTED","level":"debug","method":"POST","took":"964.973µs","ts":"2025-08-06T14:17:31.864397159Z","uri":"/api/v1/osquery/distributed/read","x_for_ip_addr":"REDACTED"}
August 06, 2025 at 10:17 {"component":"http","host_id":42,"ip_addr":"REDACTED","level":"debug","method":"POST","took":"1.304436ms","ts":"2025-08-06T14:17:21.76735293Z","uri":"/api/v1/osquery/distributed/read","x_for_ip_addr":"REDACTED"}
August 06, 2025 at 10:17 {"component":"http","host_id":42,"ip_addr":"REDACTED","level":"debug","method":"POST","took":"1.016221ms","ts":"2025-08-06T14:17:11.616518707Z","uri":"/api/v1/osquery/distributed/read","x_for_ip_addr":"REDACTED"}
August 06, 2025 at 10:17 {"component":"http","host_id":42,"ip_addr":"REDACTED","level":"debug","method":"POST","took":"6.046733ms","ts":"2025-08-06T14:17:10.919171347Z","uri":"/api/v1/osquery/config","x_for_ip_addr":"REDACTED"}
August 06, 2025 at 10:17 {"component":"http","host_id":42,"ip_addr":"REDACTED","level":"debug","method":"POST","took":"25.795536ms","ts":"2025-08-06T14:17:09.733369469Z","uri":"/api/fleet/orbit/scripts/result","x_for_ip_addr":"REDACTED"}
August 06, 2025 at 10:17 {"execution_id":"56ed5f41-903e-40e4-a037-fbc0410844db","host_uuid":"13E7E207-55A2-59D6-A03D-FF16B68CD09A","level":"debug","msg":"setup experience script result updated","ts":"2025-08-06T14:17:09.722530519Z"}
August 06, 2025 at 10:17 {"component":"http","host_id":42,"ip_addr":"REDACTED","level":"debug","method":"POST","took":"159.73569ms","ts":"2025-08-06T14:17:09.563809157Z","uri":"/api/fleet/orbit/setup_experience/status","x_for_ip_addr":"REDACTED"}
August 06, 2025 at 10:17 {"component":"http","host_id":42,"ip_addr":"REDACTED","level":"debug","method":"POST","took":"11.543702ms","ts":"2025-08-06T14:17:09.384008707Z","uri":"/api/fleet/orbit/config","x_for_ip_addr":"REDACTED"}
```
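Since this excerpt is listed newest-first, it can help to flip it into chronological order and reduce each entry to its timestamp plus the request URI or log message. Read that way, the `setup experience script result updated` message and the `/api/fleet/orbit/scripts/result` request are followed by another `/api/fleet/orbit/config` call but, within this window, by no further `/api/fleet/orbit/setup_experience/status` request. A minimal sketch, assuming one entry per line with `"ts"` before `"uri"` for HTTP entries and `"msg"` before `"ts"` for message entries, as in the excerpt; `fleet_timeline` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: turn Fleet server log lines into "timestamp event"
# pairs, sorted oldest-first.
# First expression: HTTP entries, where "ts" precedes "uri".
# Second expression: message entries, where "msg" precedes "ts".
# With -n, lines matching neither pattern are dropped.
# UTC ISO 8601 timestamps sort lexically (at second granularity), and the
# C locale keeps that ordering byte-wise.
fleet_timeline() {
    sed -n \
        -e 's/.*"ts":"\([^"]*\)".*"uri":"\([^"]*\)".*/\1 \2/p' \
        -e 's/.*"msg":"\([^"]*\)","ts":"\([^"]*\)".*/\2 \1/p' \
        "$@" | LC_ALL=C sort
}
```

For example, `fleet_timeline ecs-export.txt | grep -E 'setup_experience|scripts|result'` (file name illustrative) shows only the setup-experience and script traffic in order.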
There was only one error during that period, but it's unclear whether it was related to this specific device enrollment:
```json
{
  "err": "error posting to https://REDACTED.execute-api.us-east-1.amazonaws.com/automations: 400. \"Unsupported event type or missing data.\"",
  "level": "error",
  "msg": "fire activity webhook to https://REDACTED.execute-api.us-east-1.amazonaws.com/automations",
  "ts": "2025-08-06T14:17:24.107409022Z"
}
```
u
That looks like an error sending a webhook, which shouldn't be related. It's good news that the script is completing in the UI, so we're moving forward there. Just to check it off the list, can you confirm that you don't have "Release device manually" selected in Controls > Setup assistant > Advanced?
b
The script was always finishing on the backend, even before I made the changes to reduce the script length, so unfortunately I wouldn't call that progress. I can confirm that `Release device manually` is disabled.
u
Would it be possible to grab the fleetd logs for a host that experienced the issue, but then got past it?
b
I'll see what I can do! I can usually get into the machines after a reboot, so hopefully the log persists through that shutdown when I catch it again.
Is there any way to get a device's fleetd log through the portal? (If not, that might be a useful feature request, given how often I see you guys reference it.)
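One possible workaround for reading the fleetd log without logging in to the device, assuming the host can still run scripts from Fleet (the orbit log later in this thread shows policy-triggered scripts running during the hang), is a small script that prints the end of the orbit log so it appears in the script output in the Fleet UI. The default path below, `/var/log/orbit/orbit.stderr.log`, is assumed to be the standard macOS fleetd location; override it with `ORBIT_LOG` if your install differs. Fleet keeps only a limited amount of script output, so the sketch filters to setup-experience and script lines and prints only the last `LINES_TO_SHOW` of them (both variable names are hypothetical).

```shell
#!/bin/sh
# Sketch: print the tail of the fleetd (orbit) log so it shows up in the
# script output in Fleet. The default path is an assumption; override with
# ORBIT_LOG. LINES_TO_SHOW caps the output size.
ORBIT_LOG="${ORBIT_LOG:-/var/log/orbit/orbit.stderr.log}"
LINES_TO_SHOW="${LINES_TO_SHOW:-80}"

print_orbit_log_tail() {
    log_file="$1"
    if [ ! -r "$log_file" ]; then
        echo "orbit log not readable: $log_file" >&2
        return 1
    fi
    # Keep only setup-experience and script-related lines, then the last N.
    grep -E 'setup experience|script' "$log_file" | tail -n "$LINES_TO_SHOW"
}

print_orbit_log_tail "$ORBIT_LOG"
```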
Interesting result from this latest run: same behavior. I can see from the backend that the software is installing and the bootstrap script finishes, but the device stayed on a white screen the whole time. I can't get into the machine, though, so there's no fleetd log from this one. Will keep trying.
I had a run where the white screen showed up, but the script did finish and the machine went through normally, so here is that log. The `orbit.stdout.log` file was empty. I'm still working on getting the log from a run where the setup experience hangs.
OK, I finally got a log where the machine hung! It looks like the machine receives the script results on line 191 but still hangs there. The timestamp on line 191 matches when the Fleet portal says the bootstrap.sh script finished. Manually sending the macOS device release command did not unhang it. I had to force a shutdown, write a script to create an account, run that on the machine, and then sign in with that account to get in and collect the logs.
The other lines containing `received notification to run scripts` are all automated scripts triggered by policies that the machines fail when they're new.
Fleet seems to receive the result that the bootstrap script finished, but then it doesn't continue checking the setup experience status after receiving it. Successful run:
```
2025-08-08T12:08:02-07:00 INF checking setup experience status
2025-08-08T12:08:03-07:00 INF swiftDialog started
2025-08-08T12:08:03-07:00 INF setup experience: checking for pending statuses
2025-08-08T12:08:03-07:00 INF setup experience: rendering software and script UI
2025-08-08T12:08:03-07:00 INF setup experience: no change in status for bootstrap.sh
2025-08-08T12:08:26-07:00 INF saving script result da247c98-8ac1-4451-8491-7981e5095f02 with exit code 0
2025-08-08T12:08:26-07:00 INF running scripts [da247c98-8ac1-4451-8491-7981e5095f02] succeeded
2025-08-08T12:08:32-07:00 INF checking setup experience status
2025-08-08T12:08:33-07:00 INF swiftDialog started
2025-08-08T12:08:33-07:00 INF setup experience: checking for pending statuses
2025-08-08T12:08:33-07:00 INF setup experience: rendering software and script UI
```
Unsuccessful run:
```
2025-08-08T13:07:59-07:00 INF checking setup experience status
2025-08-08T13:07:59-07:00 INF received notification to run scripts [d50c5993-f7fd-4337-bb19-d096ba4bdd14]
2025-08-08T13:07:59-07:00 INF swiftDialog started
2025-08-08T13:07:59-07:00 INF setup experience: checking for pending statuses
2025-08-08T13:07:59-07:00 INF setup experience: rendering software and script UI
2025-08-08T13:07:59-07:00 INF setup experience: no change in status for bootstrap.sh
2025-08-08T13:08:05-07:00 INF saving script result d50c5993-f7fd-4337-bb19-d096ba4bdd14 with exit code 0
2025-08-08T13:08:05-07:00 INF running scripts [d50c5993-f7fd-4337-bb19-d096ba4bdd14] succeeded
```
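The difference between the two excerpts can be checked mechanically: in the successful run a `checking setup experience status` line appears after the last `saving script result` line, while in the unsuccessful run it never does. A minimal sketch of that check, assuming the orbit log line format shown above; `setup_polling_resumed` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: report whether orbit logged another
# "checking setup experience status" after the last saved script result.
# Exit status 0: polling resumed. Exit status 1: it did not (the hang
# pattern), or no script result was found at all.
setup_polling_resumed() {
    awk '
        /saving script result/ { saved = 1; resumed = 0; next }
        saved && /checking setup experience status/ { resumed = 1 }
        END { exit !(saved && resumed) }
    ' "$@"
}
```

For example, `setup_polling_resumed /var/log/orbit/orbit.stderr.log || echo "no status check after the last script result"` (the log path is the assumed default macOS location). It keys on the last saved result of any script, so policy-triggered scripts that finish after the bootstrap script move the reference point.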
@Kathy Satterlee hi Kathy, any ideas what we should be looking into here?
u
Apologies, @Billy H! Can you confirm what version of Fleet you're running?
b
@Kathy Satterlee all good! we are on 4.70.1
Hi @Kathy Satterlee, have there been any updates on this? This is still a high-priority issue for us.
u
@Billy H, would it be possible to get you updated to the latest version of Fleet? There were some bugs in the MDM flow that have been resolved recently.
b
kk I can do that, will keep you posted and then retest
@Kathy Satterlee hi Kathy! Bad news: the issue is still occurring on hosts. The orbit logs look basically the same as what I sent before, with no errors or anything.
u
Can you run `fleetctl debug archive` for me and then share the results? You'll need to zip it to share on Slack, and you're welcome to DM it to me!
b
sounds good!
@Kathy Satterlee just sent the archive in a DM to you
u
Thanks @Billy H! Digging into those now.