py-spy in Azure Batch
Today, I was debugging a hanging task in Azure Batch. This short post records how I used py-spy to investigate the problem.

Background

Azure Batch is a compute service that we use to run container workloads. In this case, we start up a container that processes a bunch of GOES-GLM data to create STAC items for the Planetary Computer. The workflow is essentially a big

    for url in urls:
        local_file = download_url(url)
        stac.create_item(local_file)

We noticed that some Azure Batch tasks were hanging. Based on our logs, we knew it was somewhere in that for loop, but couldn’t determine exactly where things were hanging. The goes-glm stactools package we used does read a NetCDF file, and my experience with Dask biased me towards thinking the netcdf library (or the HDF5 reader it uses) was hanging. But I wanted to confirm that before trying to implement a fix.

...
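As a rough illustration of why logs alone only narrowed the hang down to the loop, here is a minimal sketch of that loop with a log line before and after each step. This is not the code we ran: `process_urls`, the logger name, and passing `download_url` and `create_item` in as callables are all assumptions made to keep the example self-contained.

```python
import logging
import time
from typing import Callable, Iterable

# Hypothetical logger name, chosen only for this sketch.
logger = logging.getLogger("glm_items")


def process_urls(
    urls: Iterable[str],
    download_url: Callable[[str], str],
    create_item: Callable[[str], object],
) -> list:
    """Run the download/create loop, logging the start and end of each step."""
    items = []
    for url in urls:
        logger.info("download start: %s", url)
        started = time.monotonic()
        local_file = download_url(url)
        logger.info("download done: %s (%.2fs)", url, time.monotonic() - started)

        logger.info("create_item start: %s", local_file)
        started = time.monotonic()
        items.append(create_item(local_file))
        logger.info(
            "create_item done: %s (%.2fs)", local_file, time.monotonic() - started
        )
    return items
```

With logging like this, a "start" line with no matching "done" line would point to the step that hung. It would not say where inside that step the process was stuck, which is the kind of question a stack sampler like py-spy can answer.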