Automatic Path Creation Service - APC

This document explains the architecture of the APC (Automatic Path Creation Service), a cluster-based infrastructure service. APC is a component of ICEBERG that establishes data flows between heterogeneous communication endpoints by composing appropriate transcoding operators. Call agents of the ICEBERG architecture request data path creation from the APC service by providing the required endpoint information. The APC service cleanly separates the data paths from the control paths by encapsulating the process of data path creation, instantiation, and maintenance.


Architecture of APC:

Terminology:

This section describes the terminology used in the following sections.

Operators

An operator is a unit of computation on some data. Operators are strongly typed: they have clearly defined input and output types. In addition, operators have various attributes such as communication protocol, computational requirements, and static data inputs (e.g., a database). It is these attributes that make automatic composition of services possible.
Operators can have multiple inputs and outputs (e.g., aggregators, broadcasters).
Operators have only soft state: if an operator fails and is restarted, there is no need to recover its state information. During path execution, a failed operator can simply be restarted without any explicit error recovery mechanism. An application-level protocol is needed to provide guarantees such as reliable, ordered, end-to-end data delivery.
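Strong typing is what lets operators be checked for compatibility before they are strung together. The sketch below is a minimal illustration under stated assumptions: `OperatorSpec` and `TypedComposition` are hypothetical names, not APC classes, and requiring identical protocols on both sides is an assumption standing in for the richer attribute matching described above.

```java
import java.util.List;

// Hypothetical descriptor of a strongly-typed operator (not an APC class).
record OperatorSpec(String name, String inputType, String outputType, String protocol) {}

class TypedComposition {
    // Two operators can be joined only if the upstream output type equals the
    // downstream input type. Also requiring the same protocol is an assumption.
    static boolean canConnect(OperatorSpec upstream, OperatorSpec downstream) {
        return upstream.outputType().equals(downstream.inputType())
                && upstream.protocol().equals(downstream.protocol());
    }

    // A linear chain is well-typed if every adjacent pair can be connected.
    static boolean isWellTyped(List<OperatorSpec> chain) {
        for (int i = 0; i + 1 < chain.size(); i++) {
            if (!canConnect(chain.get(i), chain.get(i + 1))) {
                return false;
            }
        }
        return true;
    }
}
```

For example, an MP3 decoder (MP3 to PCM) followed by a GSM encoder (PCM to GSM) type-checks, while the reverse order does not.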

Connectors

A connector is an abstraction of the mechanism that transports Application Data Units (ADUs) between two operators (e.g., an RTP connector or a UDP connector). Each connector is characterized by a specific transport protocol.

The concepts of operator and connector are not completely distinct. Certain operators can use only a limited set of connectors, and some have built-in connectors. For example, an MPEG decoder assumes its input is a file and uses standard I/O. To make such an operator transmit and receive data over sockets, we use run-time linking techniques to replace standard I/O with socket I/O.
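APC achieves this separation with run-time linking. In Java, a comparable effect can be sketched by writing the operator against generic streams and letting the binding code decide which streams to pass in. This is an analogy, not APC's mechanism; `StreamOperator`, `UpperCaseOperator`, and `StreamBinding` are hypothetical names.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical operator written against generic byte streams, the way a legacy
// program is written against standard I/O: it does not know the transport.
interface StreamOperator {
    void run(InputStream in, OutputStream out) throws IOException;
}

// Example operator: upper-cases each line of text it receives.
class UpperCaseOperator implements StreamOperator {
    @Override
    public void run(InputStream in, OutputStream out) throws IOException {
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        String line;
        while ((line = reader.readLine()) != null) {
            writer.write(line.toUpperCase(Locale.ROOT));
            writer.write('\n');
        }
        writer.flush();
    }
}

class StreamBinding {
    // The binding code, not the operator, chooses the streams. In-memory buffers
    // are used here; a socket binding would pass socket.getInputStream() and
    // socket.getOutputStream() to the same operator instead.
    static String runInMemory(StreamOperator op, String input) {
        ByteArrayInputStream in = new ByteArrayInputStream(input.getBytes(StandardCharsets.UTF_8));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            op.run(in, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString(StandardCharsets.UTF_8);
    }
}
```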

Overview:

The APC service is responsible for creating, maintaining, and eventually freeing data paths consisting of operators strung together by connectors. The overall path construction process is illustrated in the following diagram:

It is an iterative process of continuous optimization.
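This document does not specify the search APC uses to compose operators. As one simple stand-in for logical path creation, the sketch below runs a breadth-first search over data types to find the shortest operator chain from a source format to a destination format. The BFS is a swapped-in technique, and `CatalogEntry` and `LogicalPathSearch` are hypothetical names; the real process also weighs operator attributes and QoS and iterates toward better paths.

```java
import java.util.*;

// Hypothetical catalog entry: an operator name with its input and output data types.
record CatalogEntry(String name, String inputType, String outputType) {}

class LogicalPathSearch {
    // Breadth-first search over data types. Returns the shortest chain of
    // operators converting srcType into dstType, or Optional.empty() if the
    // catalog cannot bridge the two types.
    static Optional<List<CatalogEntry>> shortestChain(List<CatalogEntry> catalog,
                                                      String srcType, String dstType) {
        Map<String, CatalogEntry> producedBy = new HashMap<>(); // type -> operator that first reached it
        Set<String> visited = new HashSet<>(List.of(srcType));
        Deque<String> frontier = new ArrayDeque<>(List.of(srcType));
        while (!frontier.isEmpty()) {
            String type = frontier.poll();
            if (type.equals(dstType)) {
                // Walk back from the destination type to rebuild the chain.
                Deque<CatalogEntry> chain = new ArrayDeque<>();
                for (String t = dstType; !t.equals(srcType); t = producedBy.get(t).inputType()) {
                    chain.addFirst(producedBy.get(t));
                }
                List<CatalogEntry> result = new ArrayList<>(chain);
                return Optional.of(result);
            }
            for (CatalogEntry op : catalog) {
                if (op.inputType().equals(type) && visited.add(op.outputType())) {
                    producedBy.put(op.outputType(), op);
                    frontier.add(op.outputType());
                }
            }
        }
        return Optional.empty();
    }
}
```

With a catalog containing an MP3 decoder (MP3 to PCM) and a GSM encoder (PCM to GSM), a request from MP3 to GSM yields the two-operator chain decoder then encoder.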

Implementation:

This section describes some of the interesting implementation details and functionalities provided by APC in the current release.

Simplifications of the design:

For the purpose of prototyping, we made a number of simplifications to the above ideal design:

APC service:

The APC service itself is a Ninja vSpace cluster service. The intent is for multiple nodes to implement the APC service, providing fault tolerance and load balancing. This requires sharing data among the nodes through distributed data structures (DDS). In this release, we consider only a single-node APC service.

This release uses Ninja vSpace, an event-driven cluster service programming platform. vSpace workers (service instances) are asynchronous event handlers that accept tasks (units of work to be processed) as well as completions (return values). Interactions with the distributed data structures and other modules are performed through non-blocking, split-phase RPC calls: each call takes an upcall as a parameter and returns immediately, and the upcall handler is invoked later, when the actual result arrives. We observed a great improvement in scalability as a result, owing to reduced context-switching overhead and less per-thread state.
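The split-phase calling pattern can be sketched in plain Java: the call only enqueues work and returns, and the supplied upcall receives the result later. `SplitPhaseStore` is a hypothetical stand-in for a distributed data structure; it is not the vSpace or DDS API.

```java
import java.util.concurrent.*;
import java.util.function.Consumer;

// Hypothetical split-phase key/value store standing in for a distributed data
// structure. Each call enqueues the work and returns at once; the caller's
// upcall runs later with the result. This is not the vSpace or DDS API.
class SplitPhaseStore {
    private final ConcurrentMap<String, String> data = new ConcurrentHashMap<>();
    private final Executor executor;

    SplitPhaseStore(Executor executor) {
        this.executor = executor;
    }

    // Phase 1: issue the request and return immediately.
    // Phase 2: the upcall is invoked on the executor once the value is available.
    void get(String key, Consumer<String> upcall) {
        executor.execute(() -> upcall.accept(data.get(key)));
    }

    void put(String key, String value, Runnable upcall) {
        executor.execute(() -> {
            data.put(key, value);
            upcall.run();
        });
    }
}
```

A caller that needs a write followed by a read chains them inside the upcalls, e.g. `store.put("path-42", "instantiated", () -> store.get("path-42", v -> ...))`, so no thread ever blocks waiting for the store.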

The APC service is structured as illustrated in the figure above. It consists of four types of workers: Logical Path Creation Workers (LPC), Physical Path Creation Workers (PPC), Physical Path Execution Workers (PPE), and Connection Manager Workers (CMGR). LPC workers create logical paths from a path request (arrow 0) that contains the descriptions of the two endpoints (data format, IP address, and port number), input arguments to the service, a QoS metric, and any operators required in the data path. They then dispatch a task (arrow 1) to the PPC workers, which determine the physical locations at which to run individual operators. This task consists of the original path request together with the logical path description: the operators joined by connectors. Note that the path is not necessarily linear.
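The contents of a path request and of the logical path handed to the PPC workers can be modeled as plain data types. In the sketch below, all record names and field types are hypothetical, chosen only to mirror the fields listed above. Representing connectors as edges is what allows non-linear paths with fan-out or fan-in.

```java
import java.util.*;

// Hypothetical types mirroring the fields of a path request listed above.
record Endpoint(String dataFormat, String ipAddress, int port) {}

record PathRequest(Endpoint source, Endpoint sink, List<String> serviceArgs,
                   String qosMetric, List<String> requiredOperators) {}

// A logical connector joins two named operators over a given protocol.
record LogicalConnector(String from, String to, String protocol) {}

// Logical path: operators are nodes and connectors are edges, so fan-out
// (broadcast) and fan-in (aggregation) can be expressed, not only chains.
record LogicalPath(PathRequest request, List<String> operators,
                   List<LogicalConnector> connectors) {

    // True if no operator has more than one incoming or outgoing connector.
    boolean isLinear() {
        Map<String, Integer> fanIn = new HashMap<>();
        Map<String, Integer> fanOut = new HashMap<>();
        for (LogicalConnector c : connectors) {
            fanOut.merge(c.from(), 1, Integer::sum);
            fanIn.merge(c.to(), 1, Integer::sum);
        }
        return fanIn.values().stream().allMatch(n -> n <= 1)
                && fanOut.values().stream().allMatch(n -> n <= 1);
    }
}
```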

Subsequently, PPE workers receive a task (arrow 2) from the PPC workers to instantiate paths. This task contains the physical path description (i.e., the name of each operator and the location where it runs). The PPE worker then contacts the CMGR worker (arrow 3) running on each physical node of the path to create operator instances (if operators are to be created on the fly) and the connectors between operators. This task (arrow 3) contains the name of the operator, its input arguments, and the path identifier. Path execution has two stages: first, the CMGR workers instantiate the operators and create the necessary connectors; once the entire path has been instantiated, the CMGR workers start the operators, which begin computing and receiving and sending data through their connectors.
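The two-stage execution rule (start nothing until everything is instantiated) can be sketched as follows. `NodeManager` is a hypothetical stand-in for a per-node CMGR worker, connector creation is omitted for brevity, and the round-robin placement is an assumed policy: this document does not say how PPC workers choose nodes.

```java
import java.util.*;

// Hypothetical stand-in for the CMGR worker on one node.
interface NodeManager {
    boolean createOperator(String pathId, String operator);
    void startOperator(String pathId, String operator);
}

class PathExecution {
    // Placement policy (an assumption): assign operators to nodes round-robin.
    static Map<String, String> place(List<String> operators, List<String> nodes) {
        Map<String, String> placement = new LinkedHashMap<>();
        for (int i = 0; i < operators.size(); i++) {
            placement.put(operators.get(i), nodes.get(i % nodes.size()));
        }
        return placement;
    }

    // Two-stage execution: instantiate every operator first, and start them
    // only once the whole path has been instantiated.
    static boolean execute(String pathId, Map<String, String> placement,
                           Map<String, NodeManager> managers) {
        for (Map.Entry<String, String> e : placement.entrySet()) {
            if (!managers.get(e.getValue()).createOperator(pathId, e.getKey())) {
                return false; // stage 1 failed: no operator has been started
            }
        }
        for (Map.Entry<String, String> e : placement.entrySet()) {
            managers.get(e.getValue()).startOperator(pathId, e.getKey());
        }
        return true;
    }
}
```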

The path requestor finally receives a completion event indicating the success or failure of the path request, after a sequence of completion events flows from the CMGR workers back to the client (arrows 4-7). If any worker fails, the underlying vSpace cluster resubmits the task to another worker. Once the number of retries exceeds a specified limit, a failure completion event is propagated back to the client, indicating the failure and its reason. This decomposition of tasks within the APC service maximizes concurrency and minimizes blocking.
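The retry rule can be sketched synchronously: resubmit a failed task up to a limit, then return a failure completion carrying the reason. In vSpace the resubmission is done asynchronously by the cluster; `Completion` and `RetryingDispatcher` are hypothetical names used only for this illustration.

```java
import java.util.function.Supplier;

// Hypothetical completion event returned to the path requestor.
record Completion(boolean success, String reason) {}

class RetryingDispatcher {
    private final int maxRetries;

    RetryingDispatcher(int maxRetries) {
        this.maxRetries = maxRetries;
    }

    // Runs the task; on failure, resubmits it (standing in for vSpace handing
    // the task to another worker) up to maxRetries more times, then reports
    // failure together with the last error seen.
    Completion dispatch(Supplier<Boolean> task) {
        String lastError = "unknown";
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                if (task.get()) {
                    return new Completion(true, null);
                }
                lastError = "task reported failure";
            } catch (RuntimeException e) {
                lastError = e.getMessage();
            }
        }
        return new Completion(false, "gave up after " + (maxRetries + 1) + " attempts: " + lastError);
    }
}
```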

Each path has a unique identifier. Users of a path can send requests to APC to change it (e.g., to change the data output location). If any operator or connector fails, the PPE worker initiates repair when it notices the failure itself, when it receives a message from the corresponding CMGR worker, or when it receives a repair request from the CMGR workers running the neighbors of the failed operator. These multiple routes for failure notification are provided by the redundant control paths mechanism.
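Because redundant control paths can report the same failure several times, a PPE-side path table benefits from starting at most one repair per failed operator. The sketch below keys paths by a unique identifier and supports an output-location change request. `PathTable` is hypothetical, using `UUID` for identifiers is an assumption, and the deduplication of repair reports is an assumption of this sketch rather than documented APC behavior.

```java
import java.util.*;
import java.util.concurrent.ConcurrentHashMap;

class PathTable {
    // Hypothetical per-path record: current output location and the set of
    // operators currently under repair.
    static final class PathState {
        volatile String outputLocation;
        final Set<String> repairing = ConcurrentHashMap.newKeySet();

        PathState(String outputLocation) {
            this.outputLocation = outputLocation;
        }
    }

    private final Map<String, PathState> paths = new ConcurrentHashMap<>();

    // Assigns a unique identifier (a random UUID in this sketch) to a new path.
    String register(String outputLocation) {
        String id = UUID.randomUUID().toString();
        paths.put(id, new PathState(outputLocation));
        return id;
    }

    void changeOutput(String pathId, String newLocation) {
        paths.get(pathId).outputLocation = newLocation;
    }

    String outputOf(String pathId) {
        return paths.get(pathId).outputLocation;
    }

    // Returns true only for the first report of a failed operator, so reports
    // from the PPE itself, the operator's CMGR, and neighboring CMGRs trigger
    // a single repair.
    boolean reportFailure(String pathId, String operator) {
        return paths.get(pathId).repairing.add(operator);
    }

    void repairDone(String pathId, String operator) {
        paths.get(pathId).repairing.remove(operator);
    }
}
```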

Connection Manager:

This is also a vSpace service, which must run on every node on which operators will run. The Connection Manager has an API for loading, creating, repairing, and destroying operators. It is also responsible for creating and maintaining connections to and from the operators running on its node. A connector consists of two objects, each belonging to one of the two communicating operators: one is a writer object, the other a reader object.
For simplicity, the operators created by each connection manager follow a process-per-usage model: operators are not shared among different path instances. Operators are created on demand for each newly instantiated path and destroyed once teardown of the path is requested.
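The process-per-usage model amounts to keying every operator instance by its path, so two paths that use the same operator type never share an instance, and teardown destroys exactly the instances created for that path. The sketch covers only create, start, and teardown; `ConnectionManagerSketch` and `OperatorInstance` are hypothetical names, not the CMGR API.

```java
import java.util.*;

// Hypothetical running operator instance.
final class OperatorInstance {
    final String name;
    boolean running;

    OperatorInstance(String name) {
        this.name = name;
    }
}

class ConnectionManagerSketch {
    // Keyed by path ID first, so operators are never shared between paths.
    private final Map<String, Map<String, OperatorInstance>> byPath = new HashMap<>();

    OperatorInstance create(String pathId, String operator) {
        OperatorInstance inst = new OperatorInstance(operator);
        byPath.computeIfAbsent(pathId, k -> new HashMap<>()).put(operator, inst);
        return inst;
    }

    void start(String pathId, String operator) {
        byPath.get(pathId).get(operator).running = true;
    }

    // Tearing a path down destroys every operator created for it and returns
    // how many were destroyed.
    int teardown(String pathId) {
        Map<String, OperatorInstance> ops = byPath.remove(pathId);
        if (ops == null) {
            return 0;
        }
        ops.values().forEach(op -> op.running = false);
        return ops.size();
    }

    int liveCount() {
        return byPath.values().stream().mapToInt(Map::size).sum();
    }
}
```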

Partial Path Repair (PPR):

At any time, any operator in the data path can fail, the connection between two operators can break, or the processor on which operators execute can fail. In this release, we support partial path recovery: rather than tearing down and restarting the entire path, we restart the failing operators on new nodes, minimizing the disruption seen by end users.
A failure is detected when an operator catches an I/O exception while communicating with a neighboring operator. The APC service identifies the failing operator and restarts it on a different node.
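Since operators hold only soft state, partial repair reduces to moving the failed operator to another node and reconnecting its neighbors; no state transfer is needed. The sketch below computes the new placement and, for a linear chain, the neighbors whose connectors must be re-established. `PartialRepair` is a hypothetical name, and choosing the first node other than the failed one is an assumed policy.

```java
import java.util.*;

class PartialRepair {
    // Returns a new placement in which only the failed operator moves; all
    // other operators keep their nodes. Picking the first node other than the
    // failed operator's current node is an assumption of this sketch.
    static Map<String, String> relocate(Map<String, String> placement,
                                        String failedOperator, List<String> nodes) {
        String oldNode = placement.get(failedOperator);
        String newNode = nodes.stream()
                .filter(n -> !n.equals(oldNode))
                .findFirst()
                .orElseThrow(() -> new IllegalStateException("no alternative node for " + failedOperator));
        Map<String, String> repaired = new LinkedHashMap<>(placement);
        repaired.put(failedOperator, newNode);
        return repaired;
    }

    // For a linear chain, the operators adjacent to the failed one are the
    // ones whose connectors must be re-established to its new location.
    static List<String> neighbors(List<String> chain, String operator) {
        int i = chain.indexOf(operator);
        List<String> result = new ArrayList<>();
        if (i > 0) {
            result.add(chain.get(i - 1));
        }
        if (i >= 0 && i + 1 < chain.size()) {
            result.add(chain.get(i + 1));
        }
        return result;
    }
}
```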

Path types currently implemented:

from MP3 to GSM
from GSM/PCM to PCM/GSM
from text to GSM/PCM
from PCM/GSM to text
from MPEG-2 to REAL


API to APC explained:

To understand the API to the APC service, one must understand the vSpace task-dispatching interface for workers, because the APC service is a collection of vSpace workers. Please refer to the vSpace documentation for that interface.

As described above, the APC service consists of four types of workers: Logical Path Creation Workers (LPC), Physical Path Creation Workers (PPC), Physical Path Execution Workers (PPE), and Connection Manager Workers (CMGR). The front-end workers are the LPC workers, which we call APC Service Workers (ASW) here. ASWs accept tasks related to data path creation, maintenance, and teardown.


How to add operators and connectors to the APC package:


How to run the APC service:

In the vSpace configuration file (see an example), which is the file passed as an argument to ninja2.core.vspace.vSpace, the following entries must be specified to run the APC service:

[Service: APCService]
primary=APC_Worker
#front-end, logical path creation worker
[Worker: APCService:APC_Worker]
class = iceberg1.APCpath.workers.APC_Worker
mapFile = rusty.map
#Physical path creation worker
[Worker: APCService:APC_PPC_Worker]
class = iceberg1.APCpath.workers.PPC_Worker
nodeFile = hostnames
#Physical path execution worker
[Worker: APCService:APC_PPE_Worker]
class = iceberg1.APCpath.workers.PPE_Worker
#connection manager workers:
[Worker: APCService:APC_CMGR_Worker]
class = iceberg1.APCpath.workers.CMGR_Worker
[Worker: APCService:APC_CMGR_VAT_Worker]
class = iceberg1.APCpath.workers.CMGR_VAT_Worker

The nodeFile contains the names of the hosts available to run operators, one host name per line; an example can be found here.
A connection manager worker must run on each host listed in the nodeFile. Therefore, the vSpace configuration file on those machines must include the entries for the connection manager workers.
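Reading the nodeFile format described above takes only a few lines. The sketch below is hypothetical: `NodeFile` is not an APC class, and trimming whitespace and skipping blank lines are assumptions beyond the documented "one host name per line" rule.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class NodeFile {
    // Parses nodeFile lines: one host name per line. Trimming whitespace and
    // skipping blank lines are assumptions, not documented behavior.
    static List<String> parse(List<String> lines) {
        List<String> hosts = new ArrayList<>();
        for (String line : lines) {
            String host = line.trim();
            if (!host.isEmpty()) {
                hosts.add(host);
            }
        }
        return hosts;
    }

    // Reads and parses a nodeFile from disk (UTF-8).
    static List<String> read(Path file) throws IOException {
        return parse(Files.readAllLines(file));
    }
}
```

Each host returned by such a reader is one on which the connection manager worker entries must be configured.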


Known caveats/limitations of APC:

This section documents the known caveats and limitations of APC. Due to time constraints, they have not been fixed, but they do not affect the functionality of the other ICEBERG components.

Differences between Version 1.0 and Version 0.0:


Papers documenting APC:


Javadoc to APC:


Z. Morley Mao, zmao@cs.berkeley.edu
Last modified: Thu Jul 5 09:24:36 EDT 2001