One of the most common buzzwords nowadays is IoT (Internet of Things). In general, it describes any device connected to the internet, whether it does sensing, control, alerting or anything else over the internet. All weather companies have weather stations, and you might have heard the term "automatic weather station". That is just an older name for a connected weather station: nowadays weather stations can send their data over the internet through GPRS, WiFi or satellite connectivity.
This sounds simple until your weather station network starts growing and you begin connecting to third-party weather stations. Suddenly the volume of station data becomes very large and hard to manage, and you find yourself dealing with a "Big Data" problem. Some weather stations update their readings at 3-second intervals over HTTP, so imagine having hundreds or thousands of them.
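To put a number on that, here is a quick back-of-the-envelope calculation. The station count of 1,000 is only an illustrative assumption, and the sketch assumes every station reports at the same fixed interval.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def readings_per_day(stations: int, interval_seconds: int) -> int:
    """Number of HTTP updates per day if every station reports at a fixed interval."""
    return stations * (SECONDS_PER_DAY // interval_seconds)


# Assumed example: 1,000 stations, each reporting every 3 seconds.
print(readings_per_day(1000, 3))  # 28800000
```

That is close to 29 million readings per day before any third-party stations are added.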
At ArabiaWeather, the architecture behind a proper weather station network was a hot topic. We did not want to build a rigid system that only reports weather, we did not want to sacrifice high-resolution data, and we definitely wanted a scalable server solution. After some research we put together a versatile, scalable and adaptable solution that not only senses and stores data but also helps alert and control other objects according to the sensed data.
The architecture we ended up with, after a long time of research, uses NodeJS for a scalable HTTP server that acts as the front line receiving data from the weather stations. The NodeJS instances are load balanced with nginx, which lets this tier scale horizontally by adding more instances. The incoming HTTP data is then aggregated and thrown as is into what we like to call data pipes, which are based on MQTT. We looked at and tried many MQTT brokers, and the one we liked the most, with very strong scaling potential, was VerneMQ.
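The hand-off from HTTP to MQTT described above could look roughly like the sketch below. It is written in Python for brevity rather than NodeJS, and it is not our production code: the topic layout `weather/stations/<id>`, the `station_id` field and the `publish` callback are all assumptions made for illustration. In a real setup `publish` would be the publish call of whatever MQTT client you use.

```python
import json
from typing import Callable

TOPIC_PREFIX = "weather/stations"  # hypothetical topic layout


def topic_for(station_id: str) -> str:
    """Build the per-station topic, rejecting characters that are special in MQTT topics."""
    # "/" separates topic levels; "+" and "#" are subscription wildcards.
    if not station_id or any(c in station_id for c in "/+#"):
        raise ValueError(f"invalid station id: {station_id!r}")
    return f"{TOPIC_PREFIX}/{station_id}"


def forward_reading(raw_body: bytes, publish: Callable[[str, bytes], None]) -> str:
    """Route one HTTP request body to its station topic, passing the payload through as is."""
    reading = json.loads(raw_body)  # parsed only to find out which station sent it
    topic = topic_for(str(reading["station_id"]))  # "station_id" is an assumed field name
    publish(topic, raw_body)
    return topic


# Usage with a stand-in publisher that just records what would be sent.
sent = []
forward_reading(b'{"station_id": "amman-01", "temp_c": 21.5}',
                lambda topic, body: sent.append((topic, body)))
print(sent[0][0])  # weather/stations/amman-01
```

Keeping the HTTP tier this thin is what makes it easy to run many copies of it behind the load balancer.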
The MQTT pipes pass each weather station's data into its own MQTT topic. To make use of this data you need a consumer that connects to the MQTT pipes. For example, we have a realtime SMS, push notification and email alerting system that listens to all the data going through the pipe and evaluates it against rules; if a rule matches, a trigger is sent out.
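The rule evaluation step of such an alerting consumer can be sketched as follows. The `Rule` shape, the field names and the thresholds are invented for illustration and do not describe our actual rule format; the MQTT subscription itself is left out so the example runs on its own.

```python
import operator
from dataclasses import dataclass
from typing import Dict, List

# Comparison operators a rule may use.
OPS = {">": operator.gt, "<": operator.lt, ">=": operator.ge, "<=": operator.le}


@dataclass(frozen=True)
class Rule:
    name: str         # label used in the alert
    field: str        # key to look up in the reading
    op: str           # one of the keys in OPS
    threshold: float


def triggered(rules: List[Rule], reading: Dict[str, float]) -> List[str]:
    """Return the names of all rules that match this reading."""
    hits = []
    for rule in rules:
        value = reading.get(rule.field)
        if value is None:
            continue  # the station did not report this field
        if OPS[rule.op](value, rule.threshold):
            hits.append(rule.name)
    return hits


rules = [Rule("frost", "temp_c", "<=", 0.0), Rule("gale", "wind_kmh", ">", 60.0)]
print(triggered(rules, {"temp_c": -2.0, "wind_kmh": 35.0}))  # ['frost']
```

A consumer would call `triggered` for every message it receives and hand the returned names to the SMS, push or email sender.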
This sounds good so far, but where does the data end up? The so-called pipes are volatile and the data does not persist there. For that we use a time series database that has been gaining traction fast: InfluxDB. InfluxDB is scalable, can be clustered across multiple nodes, and is very fast at ingesting and retrieving data (as long as you don't go around asking for years of data at seconds resolution).
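The usual way around the "years of data at seconds resolution" problem is to downsample, that is, to average the raw points into coarser time buckets before returning them. InfluxDB can do this on the server side (in InfluxQL, with `GROUP BY time(...)`); the plain Python sketch below only illustrates the idea, and the sample points are made up.

```python
from collections import defaultdict
from statistics import mean
from typing import List, Tuple

Point = Tuple[int, float]  # (unix timestamp in seconds, value)


def downsample(points: List[Point], bucket_seconds: int) -> List[Point]:
    """Average raw points into fixed-width time buckets, keyed by bucket start time."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % bucket_seconds].append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


raw = [(0, 10.0), (3, 12.0), (60, 20.0)]
print(downsample(raw, 60))  # [(0, 11.0), (60, 20.0)]
```

For a chart that spans a year, hourly or daily buckets carry the same story as the raw 3-second readings at a tiny fraction of the size.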
Now that you have an MQTT pipe with all the data in it, go ahead and consume it as you like and store it wherever you prefer. We, for example, store some of the data in Solr and MongoDB alongside InfluxDB, simply by extending a small consumer script that listens to the data and puts it into the DB of choice.
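One way to keep such a consumer script easy to extend is to treat every destination as a small "sink" function, so that adding another database only means registering one more function. The sketch below is an assumption about how that could be structured, not our actual script; the sinks here are stand-ins that write to in-memory lists instead of real databases.

```python
from typing import Callable, Dict, List

Sink = Callable[[str, dict], None]  # receives (topic, parsed reading)


def fan_out(topic: str, reading: dict, sinks: Dict[str, Sink]) -> List[str]:
    """Deliver one reading to every sink and return the names of sinks that failed."""
    failed = []
    for name, sink in sinks.items():
        try:
            sink(topic, reading)
        except Exception:
            failed.append(name)  # one slow or broken store should not block the others
    return failed


# Stand-in sinks; real ones would wrap the client library of each database.
timeseries_rows, document_rows = [], []
sinks = {
    "timeseries": lambda topic, reading: timeseries_rows.append((topic, reading)),
    "documents": lambda topic, reading: document_rows.append(reading),
}
print(fan_out("weather/stations/amman-01", {"temp_c": 21.5}, sinks))  # []
```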
Keep in mind that sometimes your DB will not be able to keep up with what your MQTT pipe delivers, so you will need a simple queue in between; we prefer Kafka at the moment.
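The idea behind the queue is simply to decouple the speed of the pipe from the speed of the database: messages pile up in a buffer and the writer drains them in batches at its own pace. The sketch below uses Python's in-process `queue.Queue` as a stand-in for Kafka purely to show the pattern; `write_batch` is a hypothetical function that would perform one bulk insert.

```python
import queue
from typing import Callable, List


def drain_in_batches(buffer: queue.Queue,
                     write_batch: Callable[[List[dict]], None],
                     batch_size: int) -> int:
    """Empty the buffer into the database in batches; return how many items were written."""
    written = 0
    while True:
        batch = []
        while len(batch) < batch_size:
            try:
                batch.append(buffer.get_nowait())
            except queue.Empty:
                break  # nothing left right now
        if not batch:
            return written
        write_batch(batch)  # one bulk write instead of one write per message
        written += len(batch)


buffer = queue.Queue()
for i in range(5):
    buffer.put({"seq": i})

batches = []
print(drain_in_batches(buffer, batches.append, batch_size=2))  # 5
print([len(b) for b in batches])  # [2, 2, 1]
```

Unlike this in-memory buffer, a real queue such as Kafka also keeps the messages on disk, so a crash of the writer does not lose what is still waiting.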
Some useful links mentioned in the post: