I've used Celery casually for the last few years, but only in the last 6 months have I had to cope with running it at serious scale. This is the point where you start looking for best practices to ease the pain of it all. And you might come across the idea of using autoscaling along with different queues.
For example, you could use autoscaling with a default queue and a high priority queue:
(venv) $ > celery -A proj worker -l info -Q default --autoscale 4,2
(venv) $ > celery -A proj worker -l info -Q high_priority --autoscale 8,2
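It looks pretty awesome - `--autoscale` takes its arguments as max,min, so you start out with a balanced setup of 2 worker processes per queue (with Celery's default prefork pool these are processes, not threads), and if the workload picks up, you can automatically scale up to 8 high priority processes vs 4 default processes!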
Yet in practice, you’re more likely to end up with permanently dead workers.
When trying to implement autoscaling on our Celery system, I became very familiar with this error - coming in every hour on the hour:
WorkerLostError: Worker exited prematurely: signal 15 (SIGTERM).
  File "billiard/pool.py", line 1223, in mark_as_worker_lost
    human_status(exitcode)),
Task handler raised error: u"WorkerLostError('Worker exited prematurely: signal 15 (SIGTERM).',)"
Autoscaling problems remain an open issue on the Celery project, and it's unclear whether autoscaling will be removed entirely or fixed. I'm betting on removal. But either way, the consensus is that autoscaling does not work as it stands now and should be avoided (even if the maintainers keep forgetting to mention it in the docs).
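If you do steer clear of autoscaling, one simple alternative is to give each queue a fixed pool size with `--concurrency`. Here's a minimal sketch of that idea. The pool sizes (4 and 8, borrowed from the autoscale maximums in the earlier example), the node names and the `start_worker` helper are my own assumptions for illustration, not anything prescribed by Celery. By default it only prints the commands, and it launches the workers when you set `RUN=1`.

```shell
#!/bin/sh
# Sketch: one fixed-size worker per queue instead of --autoscale.
# Queue names come from the example above; pool sizes and node names
# are illustrative assumptions, so tune them for your own workload.

start_worker() {
    queue="$1"
    size="$2"
    # Build the command as positional parameters so quoting stays intact.
    set -- celery -A proj worker -l info -Q "$queue" --concurrency "$size" -n "$queue@%h"
    if [ "${RUN:-0}" = "1" ]; then
        "$@" &          # launch the worker in the background
    else
        echo "$@"       # dry run: just show what would be started
    fi
}

start_worker default 4
start_worker high_priority 8

# When actually running, keep the script alive until the workers exit.
if [ "${RUN:-0}" = "1" ]; then
    wait
fi
```

You lose the scale-down during quiet periods, but every worker keeps a predictable number of processes, and there's no autoscaler resizing the pool underneath running tasks.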