Automatic Path Creation Service - APC
This document explains the architecture of the APC (Automatic Path
Creation Service), a cluster-based infrastructure service. APC is a component of ICEBERG that establishes
data flows between heterogeneous communication endpoints by composing
appropriate transcoding operators. Call agents of the ICEBERG
architecture request data path creation from the APC
service by providing the required endpoint
information. The APC service cleanly separates data
paths from control paths by encapsulating the data path creation,
instantiation, and maintenance process.
Architecture of APC:
This section describes the terminology used in the following sections.
Operators
An operator is a unit of computation on some data. Operators are
strongly-typed: they have a clear definition of the input and output
types. In addition, operators have various attributes such as
communication protocol, computational requirements, and static data input
(e.g., a database). These attributes are what make automatic
composition of services possible.
Operators can have multiple inputs
and outputs (e.g., aggregators, broadcasters).
Operators have only soft state: if an
operator fails and is restarted, there is no need to recover its state
information. During path execution, a failed operator can simply be
restarted without any explicit error
recovery mechanism. An application-level protocol is needed to provide
guarantees such as reliable, ordered, end-to-end data delivery.
Connectors
A connector is an abstraction of the Application Data Unit (ADU)
transport mechanism between two operators (e.g., an RTP connector or a UDP
connector). Each connector is characterized by a specific transport
protocol.
The concepts of operator and connector are not completely distinct from each
other. Certain operators can use only a limited set of
connectors, and some have built-in connectors. For example, an MPEG decoder assumes
its input is a file and uses standard I/O. To make such an operator send and receive
data over sockets, we use run-time linking techniques to replace standard I/O with
socket I/O.
Overview:
APC service is responsible for creating, maintaining, and
eventually freeing up data paths consisting of operators strung together
by connectors. The overall path construction process is
illustrated in the following diagram:
It is an iterative process of continuous optimization.
- Step 1: Logical Path Creation:
A logical path is defined as an ordered sequence of
operators. Operators are registered with the SDS (Service
Discovery Service). Each operator has an associated XML description
of its attributes, such as input type, output type, and
supported communication protocols. More importantly, each
operator has various associated cost metrics
(e.g., computational latency, memory usage, and other application-specific
QoS metrics). Users provide the goals for the
optimization process; these goals vary from application to
application. The logical path creation process should return a
list of possible paths, ordered by decreasing cost based on
the user's optimization parameters.
Currently, we have simplified the implementation by describing all
operators by their format types only. Furthermore,
the APC service does not currently use a bootstrapping SDS to dynamically
discover operators; instead, it is given this information
beforehand through a static configuration. Nor does it search for an
optimal path: a depth-first search
finds the first matching path. Since the paths
constructed right now usually consist of only
one to four operators, such optimizations are
unnecessary.
- Step 2: Physical Path Creation:
This step is tightly coupled with the first step. There are two
types of operators, those that can be created on the fly by
downloading the code, and those that cannot be dynamically
instantiated and only existing instances of them can be used.
The instances of running operators are also advertised and
discovered through the SDS service. Again, for simplicity,
this implementation only deploys operators that are created dynamically
and destroyed once the path has been torn down. Since
most operators in the constructed data paths are lightweight
transcoding operators, the instantiation overhead is low.
- Step 3: Path instantiation, execution, maintenance, querying:
Given the physical path descriptions, operator instances are
created and data flow is started from the source
endpoint. During the lifetime of the path, the APC service
actively monitors the liveness of the operators to make sure
that they are functional. Any operator can also report problems
with its neighboring operators to the APC service, so that the
path can be repaired when necessary. The control path plays an
important role here, since it is the means by which operators are deleted,
inserted, and repaired. The control path is used for
exception handling, controlling parameters of path components,
and monitoring and analyzing path performance; thus, it needs to be
independent of data paths and highly robust. A data path can be
"walked" using the control path. The APC service holds
handles to all the path components. In addition, the control
path can also overlap the data path: each path component (operator)
has handles to its two neighboring operators. The control
paths also allow the work status of the operators to be queried.
- Step 4: Path tear down:
Once the path has finished providing the service, or the end users
have sent a termination request, the path is torn down and its
resources are freed. For optimization, the path (both its logical
path description and its physical path description) can instead
be cached and reused in the future; in that case, the path
is kept rather than torn down.
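A minimal sketch of the caching idea in Step 4, written in Java since the APC workers are Java classes. PathCache is a hypothetical name, keying the cache by (source format, destination format) is an assumption, and only the logical path description (here, a list of operator names) is cached; a physical path description could be cached the same way.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiFunction;

/** Hypothetical cache of logical path descriptions, keyed by source and destination format. */
final class PathCache {
    private final Map<String, List<String>> logicalPaths = new ConcurrentHashMap<>();

    private static String key(String srcType, String destType) {
        return srcType + "->" + destType;
    }

    /** Returns the cached operator sequence, invoking creator only on a cache miss. */
    List<String> getOrCreate(String srcType, String destType,
                             BiFunction<String, String, List<String>> creator) {
        return logicalPaths.computeIfAbsent(key(srcType, destType),
                k -> creator.apply(srcType, destType));
    }

    /** Drops a cached entry, e.g., when an operator it uses is no longer configured. */
    void invalidate(String srcType, String destType) {
        logicalPaths.remove(key(srcType, destType));
    }
}
```

The creator function would be the logical path search; it runs once per distinct format pair until the entry is invalidated.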
Implementation:
This section describes some of the interesting implementation details
and functionalities provided by APC in the current release.
Simplifications of the design:
For the purpose of prototyping, we made a number of simplifications to
the above ideal design:
- Simplified operator descriptions: currently, each operator is
uniquely described by its input and output types. No XML
descriptions are used right now. Operators are
created dynamically and destroyed after use. The APC
service is statically configured with all the known operators
beforehand.
- Process per operator model: each operator instance is a process
that is not shared among other instances of the paths.
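The depth-first search used for logical path creation can be sketched as follows, using the simplified description above in which each operator is known only by its input and output format types. OperatorDesc and LogicalPathFinder are hypothetical names, and the operator list stands in for the static configuration; this illustrates the technique rather than reproducing the release's code.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

/** Hypothetical operator description: a name plus input and output format types. */
final class OperatorDesc {
    final String name;
    final String inputType;
    final String outputType;

    OperatorDesc(String name, String inputType, String outputType) {
        this.name = name;
        this.inputType = inputType;
        this.outputType = outputType;
    }
}

/** Depth-first search returning the first operator chain that converts srcType to destType. */
final class LogicalPathFinder {
    private final List<OperatorDesc> operators; // stands in for the static configuration

    LogicalPathFinder(List<OperatorDesc> operators) {
        this.operators = operators;
    }

    /** An empty list means no transcoding is needed; an empty Optional means no path exists. */
    Optional<List<OperatorDesc>> find(String srcType, String destType) {
        List<OperatorDesc> path = new ArrayList<>();
        boolean found = dfs(srcType, destType, path, new HashSet<>());
        return found ? Optional.of(path) : Optional.empty();
    }

    private boolean dfs(String current, String dest, List<OperatorDesc> path, Set<String> visited) {
        if (current.equals(dest)) {
            return true;
        }
        visited.add(current); // never expand the same format twice
        for (OperatorDesc op : operators) {
            if (op.inputType.equals(current) && !visited.contains(op.outputType)) {
                path.add(op);
                if (dfs(op.outputType, dest, path, visited)) {
                    return true; // first match wins; no cost-based ranking
                }
                path.remove(path.size() - 1); // backtrack
            }
        }
        return false;
    }
}
```

Because a format is never expanded twice, the search terminates even when the configured operators form cycles (e.g., PCM to GSM and GSM to PCM).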
APC service:
The APC service itself is a Ninja vSpace cluster service. The intent is
that multiple nodes will implement the APC service, providing
fault tolerance and load balancing. This will require
sharing data among the nodes using distributed data structures (DDS).
In this release, we only consider a single-node APC service.
This release uses Ninja vSpace, an event-driven cluster service
programming platform. vSpace workers, or service instances, are
asynchronous event handlers that accept tasks (i.e., units of work to
be processed) as well as completions (i.e., return
values). Interactions with the distributed data structures and other
modules are achieved through non-blocking, split-phase RPC calls,
which take an upcall as a parameter and return immediately. Later,
when the actual result is received, the upcall handler is
invoked. We see a substantial improvement in scalability as a result,
due to reduced context-switching overhead and less thread state.
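The split-phase style can be illustrated with plain java.util.concurrent classes. SplitPhaseTable is a hypothetical stand-in for a DDS or RPC module, not the vSpace API: each call returns immediately, and the upcall runs later on an executor thread once the result is available.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.function.Consumer;

/** Hypothetical split-phase store: calls return immediately and deliver results via upcalls. */
final class SplitPhaseTable {
    private final Map<String, String> data = new ConcurrentHashMap<>();
    private final Executor executor; // runs upcalls off the caller's thread

    SplitPhaseTable(Executor executor) {
        this.executor = executor;
    }

    /** Phase 1 of a put: schedule the write; the upcall later receives the previous value. */
    void put(String key, String value, Consumer<String> upcall) {
        executor.execute(() -> upcall.accept(data.put(key, value)));
    }

    /** Phase 1 of a get: schedule the lookup and return without blocking the caller. */
    void get(String key, Consumer<String> upcall) {
        executor.execute(() -> upcall.accept(data.get(key))); // phase 2: the upcall fires later
    }
}
```

With a single-threaded executor, upcalls run in submission order, so a get issued after a put observes that put.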
The APC service is structured as illustrated in
the figure above. It consists of four
types of workers: Logical Path Creation Worker (LPC), Physical Path
Creation Worker (PPC), Physical Path Execution Worker (PPE), and
Connection Manager Worker (CMGR). LPC workers create logical paths
from a path request (arrow 0) that contains the descriptions of the two endpoints
(data format, IP address, port number), input arguments
to the service, QoS metrics, and any operators required in the data
path. They then
dispatch a task (arrow 1) to the PPC workers, which determine
the physical locations at which to run individual operators. This task consists
of the original path request as well as the logical path description,
i.e., the sequence of operators joined by connectors. Note that the path need
not be linear.
Subsequently, PPE workers receive a task (arrow 2) from PPC workers to instantiate
paths. This task contains the physical path
description (i.e., the operator names as well as their running
locations). PPE then contacts the CMGR worker (arrow 3) that runs
on each physical node of the path to create the operator instance (if
the operator is to be created on the fly)
and the connectors between operators. This task (arrow 3) contains the
name of the operator, its input arguments, and the path
identifier. Path execution has two stages. First, a CMGR worker
instantiates the operators and creates the necessary connectors. Once
the entire path has been instantiated, a CMGR worker starts each operator,
which begins computation and starts receiving and sending data
through its connectors.
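A sketch of the two-stage execution described above, assuming a hypothetical ManagedOperator interface on the connection manager side. Rolling back partially instantiated operators when a later instantiation fails is an added assumption; the text does not specify what happens in that case.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical connection-manager view of one operator instance. */
interface ManagedOperator {
    void instantiate(); // stage 1: create the process and its connectors
    void start();       // stage 2: begin computation and data transfer
    void destroy();     // release the process and its connectors
}

/** Hypothetical two-stage executor: no operator starts until every operator exists. */
final class TwoStageExecutor {
    private TwoStageExecutor() {}

    /** Returns true if the whole path started, false if instantiation failed and was rolled back. */
    static boolean execute(List<? extends ManagedOperator> path) {
        List<ManagedOperator> created = new ArrayList<>();
        try {
            for (ManagedOperator op : path) {
                op.instantiate();
                created.add(op);
            }
        } catch (RuntimeException e) {
            for (ManagedOperator op : created) {
                op.destroy(); // assumption: undo partial instantiation
            }
            return false;
        }
        for (ManagedOperator op : path) {
            op.start(); // stage 2 begins only after stage 1 finished for the entire path
        }
        return true;
    }
}
```

Separating the stages means no operator sends data before its downstream neighbors and connectors exist.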
The path requestor finally receives a completion event indicating
the success or failure of the path request, after a sequence of
completion events flows from the CMGR workers back to the client (arrows
4-7). If any worker fails, the underlying vSpace cluster takes
care of resubmitting the task to another worker. Once the number of retries
exceeds a specified limit, a failure completion event is
propagated back to the client indicating the failure and its reason.
This decomposition of tasks within the APC service maximizes
concurrency and minimizes blocking.
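The resubmission policy can be sketched as a bounded retry loop. RetryingDispatcher and Completion are hypothetical names and the retry limit is a constructor parameter; the loop also runs synchronously for brevity, whereas vSpace resubmits tasks asynchronously, possibly to another worker.

```java
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical completion event: a result on success, a failure reason otherwise. */
final class Completion {
    final boolean success;
    final String detail;

    Completion(boolean success, String detail) {
        this.success = success;
        this.detail = detail;
    }
}

/** Hypothetical bounded resubmission: one attempt plus up to maxRetries retries. */
final class RetryingDispatcher {
    private final int maxRetries;

    RetryingDispatcher(int maxRetries) {
        if (maxRetries < 0) throw new IllegalArgumentException("maxRetries must be >= 0");
        this.maxRetries = maxRetries;
    }

    void dispatch(Supplier<String> task, Consumer<Completion> completionHandler) {
        RuntimeException lastFailure = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            String result;
            try {
                result = task.get(); // a thrown exception models a failed worker
            } catch (RuntimeException e) {
                lastFailure = e;
                continue; // resubmit
            }
            completionHandler.accept(new Completion(true, result));
            return;
        }
        // Retry limit exceeded: report the failure and its reason to the client.
        completionHandler.accept(new Completion(false, "retries exhausted: " + lastFailure.getMessage()));
    }
}
```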
Each path has a unique
identifier. Users of paths can send requests to APC to make changes to
paths (e.g., change the data output location). If any operator or
connector fails, PPE initiates repair when it notices the failure
itself, receives a message from the corresponding CMGR worker, or
receives a repair request from the CMGR workers running the neighboring
operators of the failed operator. These multiple sources of failure reports
are possible because of the redundant control paths.
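A minimal sketch of the redundant control path, assuming a linear path: the APC side keeps a handle to every component, and each component also holds handles to its upstream and downstream neighbors, so the path can be walked from the source and a neighbor can flag a failure. PathComponent and ControlPath are hypothetical names, and status is a plain string for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical path component with control handles to its neighbors. */
final class PathComponent {
    final String operatorName;
    PathComponent upstream;   // neighbor that sends data to this operator
    PathComponent downstream; // neighbor that receives data from this operator
    volatile String status = "RUNNING";

    PathComponent(String operatorName) {
        this.operatorName = operatorName;
    }
}

/** Hypothetical control-side view of a linear path: APC holds a handle to every component. */
final class ControlPath {
    private final List<PathComponent> components = new ArrayList<>();

    /** Adds an operator at the sink end and links it to the previous one. */
    PathComponent append(String operatorName) {
        PathComponent c = new PathComponent(operatorName);
        if (!components.isEmpty()) {
            PathComponent last = components.get(components.size() - 1);
            last.downstream = c;
            c.upstream = last;
        }
        components.add(c);
        return c;
    }

    /** Walks the path from source to sink using only neighbor handles. */
    List<String> walk() {
        List<String> report = new ArrayList<>();
        for (PathComponent c = components.isEmpty() ? null : components.get(0); c != null; c = c.downstream) {
            report.add(c.operatorName + ":" + c.status);
        }
        return report;
    }

    /** Uses the APC service's direct handles to find components that need repair. */
    List<String> failed() {
        List<String> names = new ArrayList<>();
        for (PathComponent c : components) {
            if ("FAILED".equals(c.status)) {
                names.add(c.operatorName);
            }
        }
        return names;
    }
}
```

Because a failure can be observed both through the APC service's own handles and through a neighbor's handle, either route is enough to trigger repair.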
Connection Manager:
This is also a vSpace service that must be implemented by all nodes
on which operators run. The Connection Manager has an API for
loading, creating, repairing, and destroying operators. It is
also responsible for creating and maintaining connections to and from
the operators running on the node where the connection manager
resides. A connector consists of two objects, one belonging to each of the two
communicating operators: a writer object on the sending side and
a reader object on the receiving side.
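A sketch of a UDP connector split into its writer and reader halves. The interface names StreamConnectorWriterIF and StreamConnectorReaderIF come from the APC package, but the method signatures below are assumptions. Each write() sends one ADU as one datagram; UDP gives no reliability or ordering guarantees, which matches the earlier remark that an application-level protocol must supply them.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.util.Arrays;

/** Assumed shape of the writer interface: push one ADU downstream. */
interface StreamConnectorWriterIF extends AutoCloseable {
    void write(byte[] adu) throws IOException;
    @Override void close(); // no checked exception, unlike AutoCloseable.close()
}

/** Assumed shape of the reader interface: block until the next ADU arrives. */
interface StreamConnectorReaderIF extends AutoCloseable {
    byte[] read() throws IOException;
    @Override void close();
}

/** Writer half: lives in the upstream operator and sends each ADU as one datagram. */
final class UdpConnectorWriter implements StreamConnectorWriterIF {
    private final DatagramSocket socket;
    private final InetAddress host;
    private final int port;

    UdpConnectorWriter(InetAddress host, int port) throws IOException {
        this.socket = new DatagramSocket(); // ephemeral local port
        this.host = host;
        this.port = port;
    }

    public void write(byte[] adu) throws IOException {
        socket.send(new DatagramPacket(adu, adu.length, host, port));
    }

    public void close() { socket.close(); }
}

/** Reader half: lives in the downstream operator and returns one ADU per datagram. */
final class UdpConnectorReader implements StreamConnectorReaderIF {
    private static final int MAX_ADU = 65507; // largest UDP payload over IPv4
    private final DatagramSocket socket;

    UdpConnectorReader(int port) throws IOException {
        this.socket = new DatagramSocket(port); // port 0 picks a free port
    }

    int localPort() { return socket.getLocalPort(); }

    void setReadTimeout(int millis) throws IOException { socket.setSoTimeout(millis); }

    public byte[] read() throws IOException {
        DatagramPacket packet = new DatagramPacket(new byte[MAX_ADU], MAX_ADU);
        socket.receive(packet);
        return Arrays.copyOf(packet.getData(), packet.getLength());
    }

    public void close() { socket.close(); }
}
```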
For simplicity, the operators created by each connection
manager follow a process-per-use model: operators are not shared among
different path instances. Operators are created on demand for each
newly instantiated path and destroyed when the path is
torn down.
Partial Path Repair (PPR):
At any time, an operator in the data path can fail, the connection between
two operators can break, or the processor on which operators execute can
fail. This release supports partial path
recovery: rather than tearing down and restarting the entire path,
the APC service restarts failed operators on new nodes to cause
minimal disturbance to the end users.
A failure is detected when an operator catches an I/O exception while
communicating with a neighboring operator. The APC service then identifies
the failed operator and restarts it on a different
node.
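A sketch of the APC-side bookkeeping for partial path repair, under stated assumptions: PathRepairer is hypothetical, the replacement node is simply the first listed node other than the failed one, and updating the placement map stands in for actually restarting the operator process. Because both neighbors of a failed operator may report the same failure, each report names the host where the reporter last knew the operator to be, so a stale duplicate report does not move the operator a second time.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical APC-side bookkeeping for partial path repair. */
final class PathRepairer {
    private final List<String> nodes;                              // hosts that can run operators
    private final Map<String, String> placement = new HashMap<>(); // operator -> current host

    PathRepairer(List<String> nodes) {
        this.nodes = nodes;
    }

    synchronized void place(String operator, String host) {
        placement.put(operator, host);
    }

    synchronized String hostOf(String operator) {
        return placement.get(operator);
    }

    /**
     * Called when an operator catches an IOException talking to a neighbor. suspectedHost is
     * where the reporter believed the neighbor was running. Returns the neighbor's new host.
     */
    synchronized String reportNeighborFailure(String neighbor, String suspectedHost, IOException cause) {
        String current = placement.get(neighbor);
        if (current == null) throw new IllegalArgumentException("unknown operator " + neighbor);
        if (!current.equals(suspectedHost)) {
            return current; // stale duplicate report: already moved elsewhere
        }
        for (String host : nodes) {
            if (!host.equals(current)) {
                // Only the failed operator moves; the rest of the path keeps running.
                // A real system would restart the operator process on this host here.
                placement.put(neighbor, host);
                return host;
            }
        }
        throw new IllegalStateException("no alternative node for " + neighbor, cause);
    }
}
```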
Types of paths currently implemented:
from MP3 to gsm
from gsm/pcm to pcm/gsm
from text to gsm/pcm
from pcm/gsm to text
from MPEG-2 to REAL
API to APC explained:
To understand the API to the APC service, one must understand the vSpace task dispatching
interface for workers, because the APC service is a collection of vSpace workers.
Please refer to the
vSpace documentation for the vSpace task dispatching interface.
As described above, the APC service consists of four
types of workers: Logical Path Creation Worker (LPC), Physical Path
Creation Worker (PPC), Physical Path Execution Worker (PPE), and
Connection Manager Worker (CMGR). The front-end workers are the LPC workers, which
we call APC Service Workers (ASW) here. ASW accepts the following tasks related
to data path creation, maintenance, and teardown:
- PathCreate_Task(EndPointInfo srcEndpt,
EndPointInfo destEndpt)
This task is dispatched to ASW to request that a
new data path be built from the source endpoint described by srcEndpt to the
destination endpoint described by destEndpt. EndPointInfo is a data
structure containing the information needed for path
construction (e.g., data format, IP address, and port number).
Details are explained in the IAP section.
Each constructed path is identified by a unique path ID, which is
returned to the caller. The path ID is required as an
input parameter to the other tasks.
- PathChangeEndPt_Task(int pathID,
EndPointInfo newSrcEndPtInfo)
This task changes the source endpoint of the constructed physical path
identified by pathID.
Ideally, only the first operator should be changed, without restarting the entire
path. For simplicity, the current implementation
tears down the entire path and restarts it.
- PathTearDown_Task(int pathID)
This task tears down the constructed path identified by pathID
to free up any resources it uses.
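The three tasks can be sketched as plain Java classes whose constructors mirror the signatures above, together with a hypothetical front-end handler that answers every task through a completion callback. The EndPointInfo fields, the APCTask base class, PathCompletion, and AsyncASW are assumptions for illustration; the handler only records endpoints and does not build real paths.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

/** Assumed fields; the real EndPointInfo is described in the IAP section. */
final class EndPointInfo {
    final String dataFormat;
    final String ipAddress;
    final int port;

    EndPointInfo(String dataFormat, String ipAddress, int port) {
        this.dataFormat = dataFormat;
        this.ipAddress = ipAddress;
        this.port = port;
    }
}

abstract class APCTask {}

final class PathCreate_Task extends APCTask {
    final EndPointInfo srcEndpt, destEndpt;
    PathCreate_Task(EndPointInfo srcEndpt, EndPointInfo destEndpt) {
        this.srcEndpt = srcEndpt;
        this.destEndpt = destEndpt;
    }
}

final class PathChangeEndPt_Task extends APCTask {
    final int pathID;
    final EndPointInfo newSrcEndPtInfo;
    PathChangeEndPt_Task(int pathID, EndPointInfo newSrcEndPtInfo) {
        this.pathID = pathID;
        this.newSrcEndPtInfo = newSrcEndPtInfo;
    }
}

final class PathTearDown_Task extends APCTask {
    final int pathID;
    PathTearDown_Task(int pathID) { this.pathID = pathID; }
}

/** Hypothetical completion carried back to the caller. */
final class PathCompletion {
    final boolean ok;
    final int pathID;
    final String reason;
    PathCompletion(boolean ok, int pathID, String reason) {
        this.ok = ok;
        this.pathID = pathID;
        this.reason = reason;
    }
}

/** Hypothetical front-end worker: every task is answered asynchronously via a completion. */
final class AsyncASW {
    private final AtomicInteger nextPathID = new AtomicInteger(1);
    private final Map<Integer, EndPointInfo[]> paths = new ConcurrentHashMap<>(); // pathID -> {src, dest}
    private final Executor executor;

    AsyncASW(Executor executor) { this.executor = executor; }

    void handle(APCTask task, Consumer<PathCompletion> completion) {
        executor.execute(() -> completion.accept(process(task))); // returns without waiting
    }

    private PathCompletion process(APCTask task) {
        if (task instanceof PathCreate_Task) {
            PathCreate_Task t = (PathCreate_Task) task;
            int id = nextPathID.getAndIncrement();
            paths.put(id, new EndPointInfo[] {t.srcEndpt, t.destEndpt});
            return new PathCompletion(true, id, "created");
        }
        if (task instanceof PathChangeEndPt_Task) {
            PathChangeEndPt_Task t = (PathChangeEndPt_Task) task;
            EndPointInfo[] ends = paths.get(t.pathID);
            if (ends == null) return new PathCompletion(false, t.pathID, "unknown path");
            // Record the new source; the current release tears down and rebuilds the path here.
            paths.put(t.pathID, new EndPointInfo[] {t.newSrcEndPtInfo, ends[1]});
            return new PathCompletion(true, t.pathID, "rebuilt");
        }
        if (task instanceof PathTearDown_Task) {
            int id = ((PathTearDown_Task) task).pathID;
            boolean existed = paths.remove(id) != null;
            return new PathCompletion(existed, id, existed ? "torn down" : "unknown path");
        }
        return new PathCompletion(false, -1, "unsupported task");
    }
}
```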
How to add operators and connectors to the APC package:
- To add operators: if the operator is a Unix process
command, one needs to extend the ProcessOperator class; otherwise,
one needs to extend the Operator class to specify the operator's connection
mechanism and its other properties. All operators
need to specify their natural input and output block sizes to help
reduce the overhead of sending small data over the wire.
- To add connectors: if the connector is a stream connector,
one needs to create a reader that implements the
StreamConnectorReaderIF interface and a writer that implements the
StreamConnectorWriterIF interface. Currently, only streaming connectors are
supported: UDP and RTP.
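A sketch of how an operator subclass might declare its types and block sizes. The Operator and ProcessOperator classes here are hypothetical stand-ins, since the real methods are not documented in this section, and pcm2gsm is a hypothetical executable. The block sizes assume 8 kHz, 16-bit PCM input and GSM 06.10 output, which encodes each 20 ms frame (160 samples) into 33 bytes.

```java
import java.util.List;

/** Hypothetical stand-in for the APC Operator base class. */
abstract class Operator {
    abstract String inputType();
    abstract String outputType();
    abstract int inputBlockSize();  // natural input unit, in bytes
    abstract int outputBlockSize(); // minimum output batch, in bytes, before sending
}

/** Hypothetical stand-in for ProcessOperator: a Unix command reading stdin, writing stdout. */
abstract class ProcessOperator extends Operator {
    abstract List<String> command();

    /** Builds (but does not start) the process; connectors would attach to its stdin/stdout. */
    ProcessBuilder processBuilder() {
        return new ProcessBuilder(command());
    }
}

/** Example: 8 kHz 16-bit PCM in, GSM 06.10 out, via a hypothetical pcm2gsm executable. */
final class PcmToGsmOperator extends ProcessOperator {
    @Override String inputType() { return "pcm"; }
    @Override String outputType() { return "gsm"; }
    @Override int inputBlockSize() { return 3200; } // 200 ms: 1600 samples x 2 bytes
    @Override int outputBlockSize() { return 330; } // ten 20 ms frames x 33 bytes
    @Override List<String> command() { return List.of("pcm2gsm"); }
}
```

Choosing both block sizes to cover the same 200 ms of audio keeps input and output batches aligned.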
How to run the APC service:
In the vSpace configuration file, which is the file passed as an argument
to ninja2.core.vspace.vSpace, one needs to specify the following entries to run the
APC service:
[Service: APCService]
primary=APC_Worker
#front-end, logical path creation worker
[Worker: APCService:APC_Worker]
class = iceberg1.APCpath.workers.APC_Worker
mapFile = rusty.map
#Physical path creation worker
[Worker: APCService:APC_PPC_Worker]
class = iceberg1.APCpath.workers.PPC_Worker
nodeFile = hostnames
#Physical path execution worker
[Worker: APCService:APC_PPE_Worker]
class = iceberg1.APCpath.workers.PPE_Worker
#connection manager workers:
[Worker: APCService:APC_CMGR_Worker]
class = iceberg1.APCpath.workers.CMGR_Worker
[Worker: APCService:APC_CMGR_VAT_Worker]
class = iceberg1.APCpath.workers.CMGR_VAT_Worker
The nodeFile contains the names of the hosts available to
run operators, one host name per line.
A connection manager worker needs to run on each host listed in the
nodeFile. Therefore, the vSpace configuration file on those machines
needs to include the entries for the connection
manager workers.
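A sketch of reading the nodeFile (one host name per line) and assigning operators to hosts. NodeFile is a hypothetical helper, and round-robin placement is an assumed policy; this section does not say how PPC workers choose among hosts.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for the nodeFile and a simple operator placement policy. */
final class NodeFile {
    private NodeFile() {}

    /** Reads one host name per line, ignoring blank lines and surrounding whitespace. */
    static List<String> read(Reader in) throws IOException {
        List<String> hosts = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(in)) {
            String line;
            while ((line = r.readLine()) != null) {
                line = line.trim();
                if (!line.isEmpty()) {
                    hosts.add(line);
                }
            }
        }
        return hosts;
    }

    /** Assumed policy: the i-th operator goes to host i mod (number of hosts). */
    static Map<String, String> placeRoundRobin(List<String> operators, List<String> hosts) {
        if (hosts.isEmpty()) {
            throw new IllegalArgumentException("nodeFile lists no hosts");
        }
        Map<String, String> placement = new LinkedHashMap<>(); // operator -> host, in path order
        for (int i = 0; i < operators.size(); i++) {
            placement.put(operators.get(i), hosts.get(i % hosts.size()));
        }
        return placement;
    }
}
```

In practice the reader could come from Files.newBufferedReader(Paths.get("hostnames")).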
Known caveats/limitations of APC:
This section documents the known caveats and limitations of APC. Due to time
constraints, they have not been fixed, but they do not affect the functionality
of the other ICEBERG components.
- Process per operator model: each operator instance is a
process. No multithreading support exists by default; operators are not
shared among different path instances. This
simplified model results in noticeable latency due to process execs in
real-time applications (text-to-speech, MP3-to-GSM paths) and limits
scalability in terms of the number of operators that can be
supported per node.
- Lack of flow control: real-time applications need
flow control to buffer or slow down data when
necessary. Currently, no generic flow control mechanism
exists. Each operator has a known input block size and output
block size. An operator's output block size specifies the minimum amount
of data it sends at a time, which limits the overhead of
sending data over the wire.
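The output block size rule can be sketched as a small batching buffer. BlockBatcher is a hypothetical name; it only enforces a minimum send size and is not flow control, since it never slows down or drops data.

```java
import java.io.ByteArrayOutputStream;
import java.util.function.Consumer;

/** Accumulates operator output and forwards it only in chunks of at least outputBlockSize bytes. */
final class BlockBatcher {
    private final int outputBlockSize;
    private final Consumer<byte[]> sink; // e.g., a connector writer
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();

    BlockBatcher(int outputBlockSize, Consumer<byte[]> sink) {
        if (outputBlockSize <= 0) throw new IllegalArgumentException("block size must be positive");
        this.outputBlockSize = outputBlockSize;
        this.sink = sink;
    }

    /** Buffers data; once at least outputBlockSize bytes are pending, sends them all as one ADU. */
    void write(byte[] data) {
        pending.write(data, 0, data.length);
        if (pending.size() >= outputBlockSize) {
            sink.accept(pending.toByteArray());
            pending.reset();
        }
    }

    /** Sends any remainder smaller than a block, e.g., at end of stream. */
    void flush() {
        if (pending.size() > 0) {
            sink.accept(pending.toByteArray());
            pending.reset();
        }
    }
}
```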
Differences between Version 1.0 and Version 0.0:
- The API to APC is completely asynchronous and nonblocking. A task is sent to an APC worker, and
a completion event is returned carrying the results or execution status.
This event-driven programming model greatly increases the scalability of APC and
provides graceful degradation.
- APC can now handle nonlinear paths. An operator in the data path can take
multiple input streams from more than one operator; the data path is a directed
acyclic graph (DAG).
- APC now supports wide-area task dispatching: clients requesting service
from APC do not have to be local.
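Since a data path is now a DAG, a path request can be checked for cycles before instantiation. A sketch using Kahn's topological sort follows; PathGraph is hypothetical, and this is not claimed to be how APC validates paths. Ties among ready operators are broken in HashMap iteration order, so any valid source-to-sink order may be returned.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical nonlinear path: operators are nodes, connectors are directed edges. */
final class PathGraph {
    private final Map<String, List<String>> downstream = new HashMap<>();
    private final Map<String, Integer> inDegree = new HashMap<>();

    void addOperator(String op) {
        downstream.putIfAbsent(op, new ArrayList<>());
        inDegree.putIfAbsent(op, 0);
    }

    /** Adds a connector carrying data from one operator to another. */
    void connect(String from, String to) {
        addOperator(from);
        addOperator(to);
        downstream.get(from).add(to);
        inDegree.merge(to, 1, Integer::sum);
    }

    /** Kahn's algorithm: a source-to-sink order, or empty if the graph has a cycle. */
    Optional<List<String>> topologicalOrder() {
        Map<String, Integer> remaining = new HashMap<>(inDegree);
        Deque<String> ready = new ArrayDeque<>();
        remaining.forEach((op, d) -> { if (d == 0) ready.add(op); });
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String op = ready.poll();
            order.add(op);
            for (String next : downstream.get(op)) {
                if (remaining.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next); // all of next's inputs are now placed
                }
            }
        }
        return order.size() == inDegree.size() ? Optional.of(order) : Optional.empty();
    }
}
```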
Papers documenting APC:
- Achieving Service Portability in ICEBERG, by Z. Morley Mao
and Randy H. Katz.
IEEE GlobeCom 2000, Workshop on Service Portability
(SerP-2000).
- Network Support for Mobile Multimedia using a Self-adaptive
Distributed Proxy, by Z. Morley Mao, H. Wilson So, ByungHoon
Kang, and Randy H. Katz.
11th International Workshop on Network and Operating
Systems Support for Digital Audio and Video (NOSSDAV-2001).
- Fault-tolerant, Scalable, Wide-Area Internet Service
Composition, by Z. Morley Mao, Eric A. Brewer, and Randy H. Katz.
U.C. Berkeley Technical Report UCB//CSD-01-1129, Jan 2001.
Z. Morley Mao,
zmao@cs.berkeley.edu
Last modified: Thu Jul 5 09:24:36 EDT 2001