How does AI/ML enable the multidimensional relationship between edge and cloud in the face of operational challenges? This effort is already underway, with enhancements to cloud-native software infrastructure that feed data analytics, hardware abstraction, and granular observability, alongside zero-trust security, privacy, and end-to-end lifecycle management.
Collaboration between CSPs and hyperscalers is on the rise as they seek to leverage the business potential of edge services on the cloud. Together with Multi-Access Edge Computing (MEC), the edge cloud can bring computation, storage, network operations, and mission-critical applications closer to end users. Its high-speed access connections, combined with strict latency, availability, packet-loss, and jitter requirements, can support creative use cases and smarter, faster digital experiences. Yet edge services remain operationally complex, requiring lifecycle management, observability, and near-real-time or real-time scaling operations.
This article explores the overall complexity of service lifecycle management that both CSPs and hyperscalers face when addressing the edge question.
Edge cloud operational challenges
Edge cloud is a logical extension of the cloud that distributes computing capacity (edge computing) and service intelligence closer to end users to enable new use cases. While traditional cloud networks had only a small number of providers and tightly controlled end-to-end connections, edge clouds employ a distributed architecture and are powered by numerous vendors to manage diverse 5G use cases and enable varied delivery methods for different edge providers. As a result, successful deployment of edge cloud domains requires operators to overcome a series of hurdles, as outlined below.
Cloud-native software infrastructure
The transition to a microservice-oriented architecture drove the migration from virtualization to cloud-native architectures, transforming edge infrastructure into an “as-a-Service” paradigm built on cloud orchestration tooling such as Kubernetes and Nephio, and giving rise to the cloud-native network function. Meanwhile, both the RAN and Transport domains are evolving toward cloud-native more slowly, which in practice complicates cross-domain service lifecycle management.
Lifecycle management across edge clouds
Edge clouds, an extension of telco/enterprise service infrastructure, can be broken down into “silos” depending on the types of service and applications being provided and supported, as well as device connectivity and point of presence. Meanwhile, edge zones are becoming more widespread, with each one covering a specific geographic area and providing different levels of connectivity and speed. An interoperable system is needed to connect these zones to ensure seamless communication between them.
Using cloud-native technologies, CSPs and corporations can offer private cloud clusters across edge zones. Hybrid cloud capabilities can leverage hyperscale economics by incorporating public cloud services, and by intelligently utilizing infrastructure from hybrid cloud partners, a hybrid cloud architecture can be expanded in a cost-effective manner. Such complexity between public and private edge clouds can, however, create considerable tension, particularly in instantiating, managing, and monitoring hybrid edge solutions, given their capillary nature and the need to evolve toward multi-cloud scenarios with AI/ML at the edge.
Depending on the use case, massive volumes of user and device data must be managed and analyzed at consistent intervals. This requires tailoring optimized AI and ML models to the edge type and the management system provided by cloud providers (hyperscalers), which naturally vary from one cloud operator to the next. While transfer learning, with its huge potential for innovation and accelerated learning adaptability from one edge zone to the next, may be the key, presenting edge solutions from different operators will still require additional development to support different use cases.
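The transfer-learning idea above can be sketched in a few lines: a model trained in one edge zone is reused as the starting point when adapting to a neighbouring zone, so far fewer training steps are needed. This is a minimal illustration in plain Python, with hypothetical zone names and synthetic data; a real deployment would use a proper ML framework and zone-specific telemetry.

```python
# Minimal transfer-learning sketch between edge zones: a 1-D linear model
# (y = w * x) is trained on data from one zone, then its learned weight is
# reused as the starting point when adapting to a neighbouring zone.
# Zone names and data are illustrative, not from any real deployment.

def train(xs, ys, w=0.0, lr=0.01, steps=200):
    """Fit y ~ w * x by gradient descent, starting from weight w."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad
    return w

# Zone A: load pattern roughly y = 2x; train from scratch.
zone_a_x = [1.0, 2.0, 3.0, 4.0]
zone_a_y = [2.1, 3.9, 6.2, 8.0]
w_a = train(zone_a_x, zone_a_y)

# Zone B: similar pattern (roughly y = 2.2x); fine-tune from zone A's
# weight with far fewer steps than training from zero would need.
zone_b_x = [1.0, 2.0, 3.0]
zone_b_y = [2.2, 4.5, 6.5]
w_b = train(zone_b_x, zone_b_y, w=w_a, steps=20)
```

Because the zones share a similar traffic pattern, the fine-tuning run converges near the new optimum in a handful of steps, which is the adaptability benefit the article alludes to.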
Data analytics & telemetry
Telemetry involves collecting measurements from hardware, network operations, applications, and cloud-native or virtualized infrastructure platforms, then exporting the information to a data lake or data warehouse for analysis and KPI computation. The edge requires modifications to this traditional approach to collecting and analyzing data: funneling everything through a central location adds latency to data collection and processing, so the model cannot simply be reused. Instead, the approach must be distributed rather than centralized, reflecting specific contexts and ensuring data relevance. Achieving this calls for customized, scalable data and predictive analytics across the various edge deployment types and use cases.
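The distributed pattern described above can be sketched simply: each edge node pre-aggregates its raw measurements locally and exports only a compact summary northbound, rather than streaming every sample to a central collector. The node names and latency samples below are illustrative placeholders.

```python
# Sketch of distributed telemetry: each edge node reduces its raw
# measurements to a compact summary locally, and only the summary is
# exported to the central data lake. Node names and samples are made up.

def summarize(samples):
    """Reduce raw latency samples (ms) to a compact summary record."""
    ordered = sorted(samples)
    return {
        "count": len(samples),
        "min": ordered[0],
        "max": ordered[-1],
        "mean": sum(samples) / len(samples),
        "p95": ordered[int(0.95 * (len(ordered) - 1))],
    }

# Raw samples stay on the edge node; only the summaries travel north.
edge_nodes = {
    "edge-zone-1": [4.1, 5.0, 4.7, 6.2, 4.9],
    "edge-zone-2": [9.8, 10.4, 11.0, 9.5, 10.1],
}
data_lake = {node: summarize(s) for node, s in edge_nodes.items()}
```

The KPI computation then runs against the summaries in the data lake, keeping the high-volume raw data at the edge where it was produced.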
Hardware abstraction and granular observability
Hardware enhancements (MEC, FPGA accelerator cards, and others), which are essential for edge deployments, ensure that the edge meets stringent requirements and increase the availability of COTS hardware. Edge deployments also allow for a variety of PNF-as-a-service models. For GPU-as-a-service, additional intelligence such as observability is required to monitor latency-sensitive network functions and applications, enabling them to fully leverage diverse hardware capabilities and scale across edge deployments.
Security and privacy
The growing need for a secure edge is being accompanied by the emergence of a variety of multifaceted approaches. These include Authentication, Authorization, and Accounting (AAA) across distributed edge zones or providers, which helps private networks, such as mobile private networks or network slices, maintain the privacy of an end user or device as traffic flows across the edge. Zero-trust security architecture is another approach gaining traction: it assumes that all devices and users are untrusted, and therefore requires strict identity verification and access controls. MEF (Metro Ethernet Forum) has standardized this approach with its SASE and Zero Trust frameworks, offering an additional layer of security for such private networks.
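The zero-trust principle above can be illustrated with a minimal sketch: no request is trusted by virtue of its network location, and identity plus authorization are re-checked on every call, with denial as the default. The tokens, roles, and actions below are purely illustrative.

```python
# Minimal zero-trust sketch: every request is re-authenticated and
# re-authorized; unknown identities and unmapped roles are denied by
# default. Tokens, roles, and actions here are illustrative only.

VALID_TOKENS = {"token-abc": "operator", "token-xyz": "viewer"}
PERMISSIONS = {"operator": {"read", "scale"}, "viewer": {"read"}}

def authorize(token, action):
    """Return True only if the token maps to a role allowed the action."""
    role = VALID_TOKENS.get(token)
    if role is None:  # unknown identity: deny by default
        return False
    return action in PERMISSIONS.get(role, set())
```

A production system would of course use cryptographically verified identities (e.g. mTLS or signed tokens) rather than a lookup table, but the deny-by-default control flow is the essence of the zero-trust model.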
Lifecycle management & network automation
Deploying cloud-native and edge-based software for network functions on commercial-off-the-shelf (COTS) hardware poses numerous challenges, including tasks such as packaging, onboarding, deployment, storage, updating, and testing for interoperability. This is accompanied by the need to ensure high availability with zero downtime, error-free upgrading on-the-go, and on-demand support. To address these challenges, edge deployments can employ practices and tools that include continuous integration/continuous delivery/continuous testing (CI/CD/CT), DevOps and artificial intelligence for IT operations (AIOps), which can help streamline the deployment and operations processes, while ensuring high availability with minimal disruption.
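The zero-downtime upgrade requirement above is commonly met with canary-style rollouts: a new network-function version is promoted only if its health checks pass, and traffic falls back to the current version otherwise. The sketch below is a hypothetical control-flow illustration; the health-check function stands in for real readiness and liveness probes.

```python
# Canary-style rollout sketch for a network function: promote the new
# version only if it passes health checks, otherwise roll back to the
# current version. Version strings and the probe are illustrative.

def rollout(current_version, new_version, health_check):
    """Try new_version; keep it if healthy, else roll back."""
    if health_check(new_version):
        return new_version, "promoted"
    return current_version, "rolled-back"

# Illustrative probe: only versions in this set pass their checks.
healthy_versions = {"v1.0", "v1.1"}
probe = lambda v: v in healthy_versions
```

In a real CI/CD/CT pipeline this decision would be driven by automated test suites and observability signals, but the promote-or-rollback structure is the same.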
Fig.1: End-to-End Network slicing between CSPs and Hyperscalers
To achieve universal automation and efficient network management, zero-touch and intent-based networks are essential. This involves leveraging AI to take preventive measures in pursuit of specific objectives, thereby eliminating the need for human interaction. In this way, automated operational networks can provide a significant advantage in owning, managing, and running networks at scale, including cloud and edge networks and zones.
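The intent-based idea can be made concrete with a small sketch: the operator declares a target (the "intent") rather than a procedure, and the automation derives the corrective action from the observed KPI. The thresholds and action names below are illustrative assumptions, not part of any standard.

```python
# Intent-based reconciliation sketch: an operator declares a target
# maximum latency (the intent); the loop maps the observed KPI to a
# corrective action with no human in the loop. Thresholds and action
# names are illustrative.

def reconcile(intent_max_latency_ms, observed_latency_ms):
    """Map an intent plus an observation to a corrective action."""
    if observed_latency_ms <= intent_max_latency_ms:
        return "no-op"
    if observed_latency_ms <= 2 * intent_max_latency_ms:
        return "scale-out"           # mild breach: add capacity
    return "redirect-traffic"        # severe breach: move load elsewhere
```

Run periodically against live telemetry, this reconcile step is the core of a zero-touch closed loop: observe, compare against intent, act.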
The importance of data analytics and closed loops
As workloads and applications migrate closer to the end user, it is essential for the service and network to include peripheral components and devices. While different businesses may share the same vertical edge solution, their KPIs may vary, calling for somewhat tailored KPI management. The edge must be a component of the E2E network so that both the E2E and edge domains can be monitored using these KPIs. At the same time, as the number of connected devices increases, an inundation of events occurs, necessitating processing on the edge node in near real time to provide insights to the devices (e.g. autonomous vehicles). Events from the edge are also required to ascertain the most likely behavior of a service or component.
Fig.2: Multilayered closed-loop automation
With disaggregated, hybrid infrastructure and data, comprehending the service/network experience becomes harder as network complexity grows. Here, data analytics plays a role by determining whether events are related to anomalies, while identifying both the domain to resolve and the affected E2E service. Furthermore, to maintain QoS and QoE within the edge domain, certain decisions must be made, such as moving the edge application to a different edge node to honor an intent of 10 ms latency.
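The anomaly-detection step that feeds such decisions can be sketched with a simple z-score check: a latency sample is flagged when it deviates strongly from the recent baseline, which is what would trigger an action like migrating the application to another edge node. The window contents and threshold below are illustrative; production systems typically use richer models.

```python
# Simple anomaly check on a KPI stream: flag a sample when it lies more
# than `threshold` standard deviations from the mean of recent history
# (a z-score test). History values and the threshold are illustrative.

from statistics import mean, stdev

def is_anomaly(history, sample, threshold=3.0):
    """Return True if sample deviates strongly from recent history."""
    if len(history) < 2:
        return False                 # not enough baseline to judge
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return sample != mu          # flat baseline: any change is anomalous
    return abs(sample - mu) / sigma > threshold
```

A closed loop would feed each flagged sample into a policy decision (scale, redirect, or migrate), closing the observe-analyze-act cycle shown in Fig. 2.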
Artificial intelligence for IT operations (AIOps) is now making its way into the network, and it is for these reasons that AIOps for Networks is generating a great deal of excitement in the industry, particularly among hyperscalers, who already possess the tools required to provide AIOps platforms. Specifically, a single repository for all data, a “data lake,” enables them to deliver insights that prevent disruptions and to automate closed-loop network and service optimization.
Fig.3: Intent-driven lifecycle: Closed Loop Operations