Mapnik is the mapping tool we use within MapBox to generate custom tiles. In addition to tiles, we are increasingly using Mapnik's excellent Python API to render dynamic overlays of styled and interactive data. In this situation, communicating complex visualizations effectively demands fast maps, even under high load. Performance quickly becomes just as important as beauty.
I talked about Mapnik's performance with Dane Springmeyer, a lead developer of the Mapnik project (along with Artem Pavlenko), with whom we've been working recently to improve some of the underlying technologies that power MapBox. Here's what he had to say.
Q: So how fast and scalable is Mapnik?
A: Mapnik is used in many large production systems for rendering OpenStreetMap data, so we've always known Mapnik could hold up under extremely high load while efficiently rendering tiles from large datasets. But to get a better grasp of absolute speed of rendering I recently entered Mapnik in the performance benchmarking exercise debuted at FOSS4G 2010 in September in Barcelona. It's called the "WMS Shootout", and the goal is to compare - using the same data on the same physical machine and the same styles - various leading open source and proprietary map renderers.
The results were very encouraging - Mapnik, in comparison to a host of other leading map servers, was one of the fastest, and kept pace with venerable software like MapServer, which has been around for over 15 years. While there were certain tests that Mapnik did quite poorly in (e.g. shapefile reprojection), in several tests Mapnik appeared as one of the fastest and the most scalable under high load.
Q: Who was involved in the comparison, and what sort of benchmarks were used?
A: The 2010 benchmark included eight teams representing five open source servers and three proprietary ones. It built upon past years of one-on-one competition between just MapServer and GeoServer. The benchmarking tests were run on hardware with 8GB RAM and 8 cores, donated and administered by the Army Corps of Engineers. Datasets of large (multi-GB) shapefiles and aerial imagery (GeoTIFF) were provided by the Spanish National Mapping Agency. For each test, servers were evaluated based on their throughput (requests per second) under progressively greater concurrent load. So we were able to test how fast a server could respond from one concurrent request up to 64 parallel requests under different scenarios:
- 1: Rendering from Shapefiles with no reprojection
- 2: Rendering from Shapefiles while reprojecting the data on the fly from WGS84 to Google Mercator
- 3: Rendering from aerial imagery in GeoTIFF format with no reprojection
- 4: Rendering from aerial imagery in GeoTIFF format with reprojection
- 5: Rendering from PostGIS with no reprojection
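To make "reprojecting on the fly" in scenarios 2 and 4 concrete, here is a minimal sketch of the forward spherical Mercator formula that turns WGS84 longitude/latitude into Google Mercator metres. The function name is made up for illustration, and this is not Mapnik's own code path, which would normally delegate reprojection to a projection library. It only shows the per-coordinate math a server has to repeat for every vertex it draws.

```python
import math

# WGS84 semi-major axis in metres, used as the sphere radius by Google Mercator.
EARTH_RADIUS_M = 6378137.0


def wgs84_to_google_mercator(lon_deg, lat_deg):
    """Project a WGS84 lon/lat pair (degrees) to spherical Mercator metres.

    Hypothetical helper for illustration only; valid for latitudes
    strictly between -90 and 90 degrees.
    """
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


# Example: a point near Barcelona (approximate coordinates).
x, y = wgs84_to_google_mercator(2.17, 41.39)
print(x, y)
```

Each call involves a logarithm and a tangent, so doing this for millions of shapefile vertices per request adds measurable cost compared with the no-reprojection scenarios.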
It is important to note that while most applications you see using Mapnik use cached tiles, in the benchmark the WMS (Web Map Service) protocol was used as a standard way of requesting maps from servers to ensure data is being dynamically rendered rather than cached.
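As a rough illustration of this kind of measurement, the sketch below builds a WMS 1.1.1 GetMap URL and measures throughput (requests per second) at increasing concurrency. It is not the harness the shootout teams used. The server URL and layer name are placeholders, both function names are hypothetical, and the request itself is simulated with a short sleep so the example runs without a map server.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlencode


def build_getmap_url(base_url, layers, bbox, width=256, height=256, srs="EPSG:4326"):
    """Build a WMS 1.1.1 GetMap URL (base_url and layers are placeholders)."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layers,
        "STYLES": "",
        "SRS": srs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    return base_url + "?" + urlencode(params)


def measure_throughput(request_fn, concurrency, total_requests):
    """Call request_fn total_requests times on `concurrency` threads.

    Returns completed requests per second of wall-clock time.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # list() forces every call to finish before the clock stops.
        list(pool.map(lambda _: request_fn(), range(total_requests)))
    elapsed = time.perf_counter() - start
    return total_requests / elapsed


def simulated_request():
    """Stand-in for an HTTP GET against a map server (assumed 10 ms render)."""
    time.sleep(0.01)


print(build_getmap_url("http://localhost/wms", "example_layer", (-180, -90, 180, 90)))
for clients in (1, 4, 16, 64):
    print(clients, "clients:", round(measure_throughput(simulated_request, clients, 64)), "req/s")
```

To point this at a real server, `simulated_request` would be replaced by a function that fetches the GetMap URL. Varying the bounding box between requests helps make sure each map really is rendered rather than served from a cache.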
Q: How would you recap the overall results?
A: In past years the MapServer and GeoServer results have been virtually identical, but this year real-world datasets from Spain were used and the size and complexity of the data made for some widely ranging performance results between servers. For all the details you should see the final presentation (13MB Open Office doc) crafted by the team of server representatives that participated. My overall recap is that MapServer proved to be the fastest overall server under low to moderate load but did not scale well, while Mapnik was fast under low to moderate load and even faster under high load. With some notable exceptions, other servers generally could not keep up with MapServer and Mapnik in comparison.
Likely the most relevant tests for Mapnik users are those that use Shapefiles and PostGIS (the exact same shapefile imported into the database) as the datasource. To highlight this, I pulled together two graphs showing the results. It should be noted that these graphs smooth the plot lines (for easier viewing of trends), leave off the results of some servers in order to highlight how Mapnik compares to GeoServer and MapServer, and add additional Mapnik runs for Shapefiles. The official charts can be found here.
Here's a look at the results for Shapefiles.
When reading from Shapefiles, MapServer is quite fast, achieving a throughput of over 50 requests per second under a load of 16 parallel connections, but then throughput declines under 64 parallel connections (and what happens under higher load?). GeoServer plateaus at half the throughput. Mapnik is not as fast as MapServer under low load, but its throughput is more consistent, and under increasingly high load Mapnik continues scaling linearly. This indicates that Mapnik's core architecture, which supports fast multi-threaded rendering, more than makes up for its slightly slower absolute rendering times as it kicks in under high load.
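One way to read throughput figures like these is to convert them into an implied mean response time. The sketch below assumes a closed-loop test in which every client sends its next request as soon as the previous one returns. That assumption is not stated in the benchmark description. Under it, Little's law gives mean response time as concurrency divided by throughput. The function name is hypothetical.

```python
def implied_mean_response_time(concurrency, throughput_rps):
    """Mean seconds per request implied by Little's law (closed-loop assumption)."""
    return concurrency / throughput_rps


# 16 parallel connections at 50 requests per second, the figure quoted above.
print(implied_mean_response_time(16, 50))  # 0.32
```

Since the quoted throughput is "over 50", 0.32 seconds is an upper bound on the mean under this assumption, not a measured value.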
All lines represent the official results from FOSS4G 2010 except for the red line, which is the result of re-running Mapnik trunk after the recent code sprint under even higher load than the official tests (on the same testing machine). This shows that Mapnik continues to scale linearly beyond a load of 256 concurrent clients. This linear scalability is exactly what high performance applications need, as sustained throughput under high load is just as important as, if not more important than, performance under low load. It is also insightful for Mapnik developers, since doing future development to increase performance under low load will be a whole lot easier than having to re-architect the software to be more scalable.
Here are the results showing the performance of PostGIS.
The PostGIS tests use the exact same data as the shapefile tests, imported into a PostGIS database on a separate machine. This test was an optional benchmark and not all servers chose to participate, so this chart shows just those servers that provided results. In this test Mapnik excels in both sheer speed at every load level and scalability at high load. You may notice that the results of other servers that ran this PostGIS test varied significantly from Mapnik's result. Detailed explanations of issues of disk-boundness vs. CPU-boundness that may explain some of the large variation between servers are available in the final presentation given by the team at FOSS4G 2010.
Q: What are your takeaways from the WMS shootout?
A: For our first year of participating in the benchmark, the Mapnik project is thrilled with the results. But while these benchmarks certainly highlight that Mapnik is a contender for being one of the fastest renderers available, most importantly we've learned the places where Mapnik can improve. For example, Mapnik did quite poorly in the 2nd test (Shapefile reprojection), but at a recent Mapnik code sprint we dramatically improved this result.
Overall, we're excited to have Mapnik emerging as a leader not only in beautiful map tiles, but also in high performance mapping. Over the next year we are looking to invest additional energy in increasing performance (we've got lots of ideas) to ensure that dynamic visualizations will be faster and more robust in the future.