Building Meraki’s cloud controller architecture

Meraki is an interesting company. They took a very different approach to the Wi-Fi market. Instead of concentrating on speeds and feeds, like many other companies, their major innovation was to build a cloud controller for their products and make the refinement of that interface the focus of their efforts. This has resulted in a highly customer-focused company. I admit I’ve taken a while to warm up to their offering, but I do see its value for customers and I admire what they have built. Meraki has participated in three Wireless Field Days. They chose this time to give us an in-depth look at how they designed their distributed cloud-based infrastructure. It was fascinating on several levels for me. First, it was really a ballsy move to open up and show the world what they had built. Next, it directly answered some lingering criticisms that I have heard about their company using customer data in nefarious ways. It also showed exactly what the value proposition was that made Cisco spend $1.2 billion on this company, and from the brilliance displayed here I can see it was worth that price. Last, but not least, the technical details that went into building their platform and how they solved the various problems they encountered make for a simply spellbinding story.

As I don’t believe I could do a better job of telling the story than the presenter, Sean Rhea, I will instead give you the video of his presentation below and then end with my thoughts on it. He managed to hold an entire room of wireless geeks spellbound and even mentioned storage, to the delight of Stephen Foskett, who is the driving force behind the Tech Field Days.

Meraki Cloud Architecture Deep Dive with Sean Rhea

Here are a few observations that I have taken from the video:

  • Customers are allocated to a partition called a shard, which is on a physical server in their datacenter and is also replicated to another site for redundancy.
  • When you log in, it’s to a master shard, which then redirects you to the shard that hosts your data.
  • They have the ability to keep functioning even if they completely lose a datacenter, due to the way the tunnel from the Meraki APs and devices is designed.
  • Each shard has thousands of Meraki devices, hundreds of thousands of clients per day and gathers 300GB of statistics going back over a year. The shards get new data every 45 seconds from their devices or every second if you are logged in and monitoring a device.
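
To make the login flow concrete, here is a minimal Python sketch of how I picture the master shard handing you off to the shard that hosts your data. It is my own illustration, not Meraki’s code: the class names, the customer name and the hostnames are all hypothetical, and the replica is only recorded, not used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shard:
    name: str      # hypothetical shard label
    primary: str   # hypothetical host in the primary datacenter
    replica: str   # hypothetical host at the replica site


class MasterShard:
    """Toy master shard: it only records which shard hosts which customer."""

    def __init__(self):
        self._shard_by_customer = {}

    def assign(self, customer, shard):
        self._shard_by_customer[customer] = shard

    def login(self, customer):
        """Return the URL the browser is redirected to after logging in."""
        shard = self._shard_by_customer[customer]
        return f"https://{shard.primary}/dashboard"


master = MasterShard()
master.assign("acme", Shard("shard-17", "shard17.site-a.example", "shard17.site-b.example"))
redirect_url = master.login("acme")
```

The point of the design, as I understand it, is that the master only needs a small lookup table, while all the heavy statistics stay on the individual shards.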

That is a huge amount of data and a massive number of servers to monitor and look after. The way Sean described it, though, they have put a lot of thought into making the system as efficient and redundant as possible.

  • In order to access the Meraki devices behind NAT and corporate firewalls, they developed an IPsec-like tunnel mechanism called mtunnel. It uses AES encryption and certificates that are shipped with every device.
  • Mtunnel has redundancy for the connection back to the datacenter and, if a shard isn’t accessible, can reroute to the backup shard.
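
The rerouting behaviour can be sketched in a few lines of Python. This is only my guess at the shape of the logic, not how mtunnel is actually implemented: the function name, the hostnames and the reachability check are hypothetical, and the encryption and certificate handling are left out entirely.

```python
def choose_endpoint(endpoints, is_reachable):
    """Return the first reachable endpoint in preference order, or None if all are down."""
    for host in endpoints:
        if is_reachable(host):
            return host
    return None


# Hypothetical hostnames: the primary shard first, the backup shard second.
ENDPOINTS = ["shard17.site-a.example", "shard17.site-b.example"]

# Simulate the primary site being unreachable.
down = {"shard17.site-a.example"}
selected = choose_endpoint(ENDPOINTS, lambda host: host not in down)
```

With the primary marked as down, `selected` ends up being the backup shard; with nothing down, the device would stay on the primary.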

This says to me there is a massive amount of cryptographic processing going on here, with each device tunnelling back. I’ve worked on large VPN projects before, and in order to get this right you have to put a lot of thought into making sure your components can handle that.

  • Meraki developed their own mechanism, called Poder, for getting all the useful stats from their devices and shipping them back for use by the dashboard.
  • Its RPC engine has reduced the data being sent by 80-90%, uses UDP and can talk to 10,000 devices in 20 seconds.
  • It uses a modular approach and is written in a Java-like language called Scala. Each module is small (200-400 lines) and single-threaded.
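
To put those figures in perspective, here is a bit of back-of-the-envelope arithmetic in Python. The device count, sweep time, reporting interval and reduction range come from the talk; the 1,000-byte payload is a number I made up purely for illustration.

```python
DEVICES_PER_SWEEP = 10_000      # from the talk
SWEEP_SECONDS = 20              # from the talk
REPORT_INTERVAL_SECONDS = 45    # from the talk
ASSUMED_RAW_BYTES = 1_000       # hypothetical payload size, not from the talk

# How fast the RPC engine has to move through the devices.
devices_per_second = DEVICES_PER_SWEEP / SWEEP_SECONDS
ms_per_device = 1000 / devices_per_second

# How many reports one device sends per day at the normal interval.
reports_per_device_per_day = (24 * 60 * 60) // REPORT_INTERVAL_SECONDS

# What an 80-90% reduction would leave of the assumed payload (integer math).
bytes_after_90_percent = ASSUMED_RAW_BYTES * (100 - 90) // 100
bytes_after_80_percent = ASSUMED_RAW_BYTES * (100 - 80) // 100
```

That works out to 500 devices per second, or 2 ms per device, and 1,920 reports per device per day. On my assumed payload, the reduction would leave only 100 to 200 bytes per report.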

That achieves two very important things for such a massive undertaking: speed and low bandwidth use. The other problem that cropped up is what to do once you’ve got the data.

  • Standard OLTP databases are not good at hot clustering and, unless their data set fits into RAM, can take many seeks to grab the data off disk. They also often load too much of the wrong type of data (noise that you are not interested in) into RAM, which wastes memory.
  • Meraki wrote a custom database called Little Table to solve these problems. It uses in-memory ordered trees which are sorted by network_ID and MAC address. It also has a method to retrieve the data needed from disk with a single seek if needed.
  • It is a very high performance database compared to commercial ones, with 8MB/s inserts and 40MB/s queries.
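
The benefit of sorting by network ID and then MAC address is easier to see in code. The sketch below is a toy of my own, not Little Table itself: a sorted Python list stands in for the ordered tree, the class and method names are hypothetical, and nothing about the on-disk format is modelled.

```python
import bisect


class OrderedTable:
    """Rows kept sorted by (network_id, mac) so one network's rows are contiguous."""

    def __init__(self):
        self._keys = []   # sorted list of (network_id, mac) tuples
        self._rows = {}   # key -> latest stats for that client

    def insert(self, network_id, mac, stats):
        key = (network_id, mac)
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = stats

    def scan_network(self, network_id):
        """Return every (mac, stats) pair for one network, in MAC order."""
        start = bisect.bisect_left(self._keys, (network_id, ""))
        end = bisect.bisect_left(self._keys, (network_id + 1, ""))
        return [(mac, self._rows[(nid, mac)]) for nid, mac in self._keys[start:end]]


table = OrderedTable()
table.insert(7, "00:18:0a:00:00:02", {"rx_bytes": 10})
table.insert(3, "00:18:0a:00:00:09", {"rx_bytes": 5})
table.insert(7, "00:18:0a:00:00:01", {"rx_bytes": 20})
rows = table.scan_network(7)
```

Because everything belonging to network 7 sits next to each other, the scan is one contiguous slice. My assumption is that the same ordering on disk is what makes a single-seek read possible.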

In the last part of the video, which is a bit muffled as Sanjit Biswas was away from the mic, we talked about the pains that Meraki takes to maintain customer privacy. They do regular audits and have SAS 70 certification. This was comforting to hear.

I would say that Meraki has set the bar pretty high on how to make a customer-driven management and monitoring solution for the enterprise. Their feedback feature, where customers can ask for a change or new feature to their platform, and the speed at which they add those requested features into their interface show they are constantly listening to their customers’ needs. This is actually quite a challenge for Cisco, who is pretty slow to respond to customer complaints and often tends to beta test new code with their customers, or so it seems. It will be interesting to see if Meraki is able to retain their agility as they are absorbed into Cisco’s ways of doing things.

I know this post has been fairly short on wireless specifics, but I have always been of the mind that there is much more to the WLAN than simply how a wireless AP is designed or what a controller can do. The interesting bits I see are how the wireless devices fit into an overall solution that solves a customer’s needs. There is a balancing act in solving those customer problems while at the same time making sure you cover things like security, performance and just basic expectations like making sure it works consistently for people. I’m sure most people out there maintaining a WLAN want their system to keep their users in a happy place of being connected while meeting their bosses’ need to protect the business from risks. Making a system that solves those problems in a way that makes an administrator’s job easier is no mean feat, so kudos to Meraki for concentrating on a cloud solution that focuses on an easy-to-use, customer-driven interface.

One Response to “Building Meraki’s cloud controller architecture”

  1. Hello!

    In terms of Meraki retaining their agility under the umbrella of Cisco:
    One way of making sure the firmware is utilised by flashing your Meraki APs with OpenWRT

    You can actually free your Meraki paperweight from licence fees here

    It’s a free sign-up, and you can flash the MR12, MR16 and MR18
