Speeding Up Aggregation in Drupal with a Daemon
Recently we’ve been working to get Managing News, our team aggregator and media analyzer, running faster. We want it to fly. To do this we’ve had to really push what Drupal and the LAMP stack can do. Aron and Alex have done great things with aggregation and feed parsing to increase the volume of content that can be collected, which is essential for Managing News since it can aggregate tens of thousands of articles every day. But we still wanted to be able to aggregate content faster.
One of the main things that slows down adding content to Managing News is its semantic analysis: every piece of content that the system pulls in is automatically scanned and given tags that describe it. To do this we mostly use third-party web services like Yahoo's term extraction API. Waiting for Yahoo to process the text of each incoming article can add a significant amount of time to the process of adding content. It keeps cron running longer, which increases the chances of a bad cron run, and it ties up the system’s resources while they do nothing but wait.
To get around this we looked for ways to move the semantic analysis elsewhere. We settled on the idea of using a small external program to talk to tagging web services and do its own analysis of the content in our Drupal system. We looked at a couple of options and quickly decided to write a Python daemon. (A daemon is a program that runs in the background on a server. It waits for certain events and then takes specified actions. In our case, the daemon waits for new content and then tags and processes it as it becomes available.)
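To make the idea concrete, here is a minimal sketch of what such a polling daemon could look like. It is not the actual Managing News code. `ContentStore` is a hypothetical in-memory stand-in for the Drupal database, and `extract_terms` is a placeholder word-frequency tagger standing in for a call to a remote term extraction service. All names and parameters are assumptions made for illustration.

```python
import time
from collections import Counter
from typing import Callable, Dict, List, Optional


def extract_terms(text: str, max_terms: int = 3) -> List[str]:
    """Placeholder tagger: the most frequent words longer than four letters.

    A real daemon would call a remote tagging web service here instead.
    """
    words = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    counts = Counter(w for w in words if len(w) > 4)
    return [term for term, _ in counts.most_common(max_terms)]


class ContentStore:
    """Hypothetical in-memory stand-in for the Drupal database."""

    def __init__(self) -> None:
        self.articles: Dict[int, str] = {}
        self.tags: Dict[int, List[str]] = {}

    def add_article(self, article_id: int, text: str) -> None:
        self.articles[article_id] = text

    def untagged_ids(self) -> List[int]:
        return sorted(i for i in self.articles if i not in self.tags)

    def save_tags(self, article_id: int, tags: List[str]) -> None:
        self.tags[article_id] = tags


def process_pending(store: ContentStore,
                    tagger: Callable[[str], List[str]]) -> int:
    """Tag every article that has no tags yet; return how many were handled."""
    handled = 0
    for article_id in store.untagged_ids():
        store.save_tags(article_id, tagger(store.articles[article_id]))
        handled += 1
    return handled


def run_daemon(store: ContentStore,
               tagger: Callable[[str], List[str]],
               poll_seconds: float = 5.0,
               max_cycles: Optional[int] = None) -> None:
    """Poll for new content forever, or for max_cycles when testing."""
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        if process_pending(store, tagger) == 0:
            # Nothing new arrived, so wait before polling again.
            time.sleep(poll_seconds)
        cycles += 1
```

The point of this shape is that the slow tagging call happens in the daemon's own loop rather than inside cron, so aggregation only has to save the article and move on. In a real setup the store would be backed by the site's database, and the daemon would be started once and left running in the background.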
