Machine-generated data is projected to increase 15x by 2020, fueled by sensors, cameras, and other machine sources of rich data. Copying all of this data to a cloud for ML training or prediction is impractical from cost, latency, and security perspectives. Yet putting large amounts of compute directly on edge nodes is not always practical either. Multiple levels of compute can exist between the point where data is generated and its final repository, and the industry is standardizing on Edge/Fog/Cloudlet models. Typically, compute nodes closest to the data source (such as sensors) have the most recent and most detailed data (richness). However, they have little history, because of storage limitations, and no perspective (a view of other devices). Compute nodes farther from the sensors and closer to the datacenter or cloud have more history and perspective, but their data is older and often less rich.
An ideal ML/DL solution would optimize accuracy, latency, and infrastructure efficiency by letting each layer of compute focus on the algorithmic work best suited to the recency, richness, history, and perspective of the data available to it. It would also accommodate the massive scale and intermittent connectivity common at the edge.
We worked closely with ACS, HPE, and Intel to put together an ML reference architecture spanning Edge and Cloud, which is being presented at MEC 2017 in Berlin this week. A demo of our implementation will showcase scalable Edge-to-Cloud machine learning for SLA violation detection. It shows a coordinated ML workflow that spans both inference at the edge and supplemental training in the cloud.
ACS contributed its expertise in the MEC hardware and software stack, including the LTE radio setup, and this was integrated into the solution stack for the reference architecture. Intel and HPE, respectively, provided hardware expertise for the RAN-connected cloud instance and for the remote MEC-compatible edge hardware that provides the user services (video). ParallelM provided the distributed MLOps solution stack for ML execution and management.
This reference architecture is being jointly presented at Intel Booth #17 at MEC 2017. Please join us at the booth for a live demo! We look forward to seeing you at the conference; have a great time at MEC 2017.
More details on the demo and solution are below:
We used a publicly available dataset containing Linux OS kernel statistics from a video-on-demand (VoD) server that serves a VLC (VideoLAN Client) media player. The client generates multiple loads that represent real-world patterns of burstiness and intermittent change. The client expects certain service levels, and the server's inability to deliver the desired level is recorded as an SLA violation.
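The Python sketch below illustrates one way such labels could be derived. It assumes the SLA is a minimum video frame rate observed at the client. The `Sample` type, its fields, and the 24 fps threshold are hypothetical, and the dataset's actual SLA definition may differ.

```python
from dataclasses import dataclass

# Hypothetical SLA: the client expects at least this many frames per second.
MIN_FRAME_RATE = 24.0


@dataclass
class Sample:
    kernel_stats: list[float]  # server-side Linux kernel statistics (model features)
    frame_rate: float          # client-side service metric seen by the VLC player


def label_sla_violation(sample: Sample, min_frame_rate: float = MIN_FRAME_RATE) -> int:
    """Return 1 if the server failed to deliver the expected service level, else 0."""
    return 1 if sample.frame_rate < min_frame_rate else 0


# Illustrative values only; they are not taken from the dataset.
samples = [
    Sample([0.31, 1200.0], 25.0),
    Sample([0.87, 4100.0], 17.5),
]
labels = [label_sla_violation(s) for s in samples]
```

The labels depend only on the client-side metric, while the model learns from server-side kernel statistics. This is what makes the task a prediction problem: the server-side data has to reveal violations the client experiences.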
ML Algorithms and Configuration:
Because SLA violation detection is latency-sensitive, it is not practical to send all the VoD telemetry to the RAN-connected cloud instance for predictions. Predictions must also stay accurate as the data traffic changes, so the ML algorithms need to be retrained regularly to accommodate shifts in data patterns.
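Regular retraining can simply run on a fixed schedule. A complementary option is to track prediction accuracy over recently labeled samples and flag when it drops. This is shown here as an assumption, not as what the demo does. `RollingAccuracyMonitor`, its window size, and its accuracy threshold are hypothetical.

```python
from collections import deque


class RollingAccuracyMonitor:
    """Tracks accuracy over the last `window` labeled predictions and flags when
    it falls below `min_accuracy`. Both thresholds are hypothetical."""

    def __init__(self, window: int = 500, min_accuracy: float = 0.9):
        self.outcomes = deque(maxlen=window)  # 1 = correct prediction, 0 = wrong
        self.min_accuracy = min_accuracy

    def record(self, predicted: int, actual: int) -> None:
        self.outcomes.append(1 if predicted == actual else 0)

    def accuracy(self) -> float:
        if not self.outcomes:
            return 1.0  # no evidence yet, so assume the model is fine
        return sum(self.outcomes) / len(self.outcomes)

    def needs_retraining(self) -> bool:
        # Wait for a full window so a handful of early errors cannot trigger retraining.
        full = len(self.outcomes) == self.outcomes.maxlen
        return full and self.accuracy() < self.min_accuracy


# Usage: a tiny window in which the data pattern shifts and predictions start failing.
monitor = RollingAccuracyMonitor(window=4, min_accuracy=0.75)
for predicted, actual in [(0, 0), (1, 1), (0, 1), (1, 0)]:
    monitor.record(predicted, actual)
```

The monitor fires only after its window is full. It reacts a little later, but it raises fewer false alarms on a bursty stream.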
To address both issues, the solution includes two ML components. The edge component performs both training and inference, exploiting the richness of edge data and achieving the lowest possible latency. A second component in the cloud complements it by performing additional training on the broader perspective and longer history of data available there.
We use an online version of the Support Vector Machine (SVM) algorithm for both training and prediction. The publicly available labeled dataset is streamed using Kafka.
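The post does not say which online SVM variant was used. The sketch below uses Pegasos-style stochastic subgradient descent on the L2-regularized hinge loss, which is one common way to train a linear SVM one sample at a time. A generator stands in for the Kafka consumer so the example runs without a broker. The two features and the labeling rule are hypothetical.

```python
import random


class OnlineLinearSVM:
    """Linear SVM trained one sample at a time with Pegasos-style stochastic
    subgradient steps on the L2-regularized hinge loss. Labels must be -1 or +1."""

    def __init__(self, n_features: int, lam: float = 0.01):
        self.w = [0.0] * (n_features + 1)  # last weight acts as the bias term
        self.lam = lam                      # L2 regularization strength
        self.t = 0                          # number of updates applied so far

    @staticmethod
    def _augment(x: list[float]) -> list[float]:
        return list(x) + [1.0]  # constant feature so the bias is learned like a weight

    def decision(self, x: list[float]) -> float:
        return sum(wi * xi for wi, xi in zip(self.w, self._augment(x)))

    def predict(self, x: list[float]) -> int:
        return 1 if self.decision(x) >= 0 else -1

    def partial_fit(self, x: list[float], y: int) -> None:
        self.t += 1
        eta = 1.0 / (self.lam * self.t)  # Pegasos step-size schedule
        margin = y * self.decision(x)    # margin under the current weights
        xa = self._augment(x)
        # Shrink the weights: subgradient of the L2 regularization term.
        self.w = [(1.0 - eta * self.lam) * wi for wi in self.w]
        # If the sample is inside the margin or misclassified, add the hinge-loss step.
        if margin < 1.0:
            self.w = [wi + eta * y * xi for wi, xi in zip(self.w, xa)]


def simulated_stream(n: int, seed: int = 0):
    """Stand-in for a Kafka consumer: yields (features, label) pairs.
    The features and the labeling rule are hypothetical."""
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
        y = 1 if x[0] + x[1] > 0.2 else -1
        yield x, y


model = OnlineLinearSVM(n_features=2)
n_samples, correct = 2000, 0
for x, y in simulated_stream(n_samples):
    correct += int(model.predict(x) == y)  # inference first, on a not-yet-seen sample...
    model.partial_fit(x, y)                # ...then train on it once the label is known
accuracy = correct / n_samples
```

Each record is predicted before the model trains on it, which mirrors the edge component's dual role of inference plus training on the same stream. `accuracy` is therefore the prequential (test-then-train) accuracy. Labels of 0/1 would be mapped to -1/+1 before calling `partial_fit`.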
Obligatory Architecture Diagram:
The data streamed to the Edge server is also streamed to the cloud server that performs the training. The cloud server is expected to have a global view, with data coming in from multiple sources, whereas the Edge server instance has only a local view of the data. Both the Edge and Cloud ML pipelines are managed and orchestrated using the ParallelM MLOps solution.
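The fan-out of edge data to the cloud can be sketched in-process. `EdgeNode`, `CloudNode`, and `publish` are hypothetical names, and the history length is an arbitrary illustrative value. In the demo the forwarding presumably happens over streams such as the Kafka one mentioned above, not through function calls.

```python
from collections import defaultdict, deque


class EdgeNode:
    """Keeps only a short local history of its own records (limited edge storage)."""

    def __init__(self, name: str, history: int):
        self.name = name
        self.local = deque(maxlen=history)  # oldest records are dropped automatically


class CloudNode:
    """Receives every record from every edge: longer history plus a global view."""

    def __init__(self):
        self.by_source = defaultdict(list)

    def ingest(self, source: str, record: dict) -> None:
        self.by_source[source].append(record)

    def total_records(self) -> int:
        return sum(len(records) for records in self.by_source.values())


def publish(record: dict, edge: EdgeNode, cloud: CloudNode) -> None:
    # Each record is used locally at the edge AND forwarded to the cloud.
    edge.local.append(record)
    cloud.ingest(edge.name, record)


# Usage: two edges, each remembering only its last 3 records.
cloud = CloudNode()
edges = [EdgeNode("edge-a", history=3), EdgeNode("edge-b", history=3)]
for t in range(5):
    for edge in edges:
        publish({"t": t, "source": edge.name}, edge, cloud)
```

Each edge ends up with only its three most recent records. The cloud holds all ten records from both edges, matching the local-versus-global split described above.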