Graph executor

Analysis apps on the VisionAppster platform are built using a data flow graph (DFG). A data flow graph consists of nodes that can send and receive messages. The edges of the graph describe dependencies between nodes.

Each node in the graph processes its input arguments and produces some output arguments as results. The output arguments can then be passed to other nodes in the graph.

Unlike in imperative programming, the execution order of graph nodes is not fully defined in advance. The system is free to choose any order as far as the dependencies are satisfied. This makes it possible to execute many nodes in the graph simultaneously.

As most cool things in computer science, the idea of using data flow graphs to build programs is nothing new. Since 1960's, many programming languages and APIs have implemented the paradigm in a form or another.

The VisionAppster platform neither introduces a new data flow programming language nor requires one to build programs using data flow primitives from the ground up. Instead, a hybrid approach is taken: the nodes – a.k.a tools – are relatively complex and can be written with a traditional programming language such as C, C++ or anything else that provides a C interface. This makes it possible to write algorithms in a traditional way but still benefit from parallelization on a higher level.

Consider the simple example below. The image sent by an image source is processed by two tools and the results are subtracted from each other.

A simple data flow graph

A simple data flow graph

If you were using a traditional image processing library, the code you write would be something like this (pseudo-code):

def run():
    while True:
        image = read_next_image()
        if image:
            filtered = median_filter(image)
            mapped = lookup_table(image, my_lookup_table)
            yield image_diff(filtered, mapped)

While simple, this approach has a severe drawback: even if the algorithms were internally parallelized, the program as a whole would run sequentially. I/O latencies while reading an image are not utilized, and the next image will only be read after the previous one has been completely processed. The same applies to all function calls: median filtering takes longer than lookup table and image difference combined, but the quick parts can start only after the filtering is completely done.

The VisionAppster platform makes use of the fact that the tools are only partially ordered. The filtered and mapped images must be both available before the subtraction tool can be run, but nothing prevents the system from running the median filter and the lookup table simultaneously. Furthermore, nothing prevents the system from making multiple calls to the median filtering tool simultaneously, provided that the outputs are ordered correctly. Meanwhile, the image source can start fetching a new image.

The graph executor is able to dynamically allocate processing threads. If you have four cores in your CPU, the likely outcome would be that median filtering tool would take three while the rest would be run on one core.

As a result, the system is able to maximally utilize all processing cores even if none of the algoritms are internally parallelized. An important thing to note is that the design optimizes throughput, not latency. If it is important to get a single result out as quickly as possible, it does not help that the next image is already being processed on another core. If minimizing latency is important, you need to pick tools that parallelize their execution internally.