Video is produced at large scale in a variety of contexts, such as social media, traffic monitoring, city planning, and autonomous driving. Its versatility makes it possible to answer a wide range of queries and to deliver valuable insights in many domains. However, analyzing video is challenging and relies on error-prone, expensive computer vision algorithms (e.g., neural networks). To make video analytics more practical, we are working on several projects that address the limitations of deploying computer vision in real-world applications.
SeeSaw: Ad-hoc video queries
SeeSaw targets the vexing scenario in which we want to search an image database for an ad-hoc concept. The goal is to let the user search for objects in the database and find some examples, even if these objects are rare and no object detector model is available for them.
Voodoo Indexing: Optimizing queries with opaque filters such as CNNs
An important class of queries in video analytics is the opaque filter query: a query whose selection predicate is implemented as a UDF with semantics that are unknown to the query optimizer. Typical examples include trained image classifiers such as CNNs. Because the optimizer does not know the predicate’s semantics, it cannot employ standard optimizations, which leads to long query times. We propose voodoo indexing, a two-phase indexing mechanism for optimizing opaque filter queries.
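To make the notion of an opaque filter concrete, here is a minimal Python sketch of the unoptimized baseline. It does not implement voodoo indexing. The names `Frame`, `contains_bus`, and `select_where` are hypothetical, and the label lookup merely stands in for an expensive classifier call.

```python
from typing import Callable, Dict, List

Frame = Dict[str, object]  # hypothetical frame record, not a real schema

calls = 0  # counts how often the expensive predicate runs


def contains_bus(frame: Frame) -> bool:
    """Stand-in for an expensive UDF such as a CNN classifier (assumption)."""
    global calls
    calls += 1
    return frame["label"] == "bus"


def select_where(frames: List[Frame], predicate: Callable[[Frame], bool]) -> List[Frame]:
    """Naive plan: the predicate is a black box, so every frame is evaluated."""
    return [f for f in frames if predicate(f)]


# Toy data: every fourth frame is labeled "bus".
frames = [{"id": i, "label": "bus" if i % 4 == 0 else "car"} for i in range(8)]
matches = select_where(frames, contains_bus)
print([f["id"] for f in matches], calls)
```

In this toy run the predicate is invoked once per frame (8 calls) to find 2 matches, which is the cost an index over opaque filters would aim to reduce.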
Video data warehouse
We propose to treat large-scale video analytics as a data warehousing problem: video is a format that is easy to produce but must be transformed into an application-specific format that is easy to query. By analogy with Extract-Transform-Load (ETL) in data warehousing, we define the problem of Video Extract-Transform-Load (V-ETL). V-ETL systems need to reduce the cost of running a user-defined V-ETL job while also giving throughput guarantees, so that processing keeps up with the rate at which data is produced. We find that no current system sufficiently fulfills both needs and therefore propose Skyscraper, a system tailored to V-ETL.
ExSample: Searching video through adaptive sampling of frames
Smokescreen: Controlled intentional degradation in analytical video systems
Video data can be used for a range of public-good analytical tasks, such as counting traffic, measuring commerce, or detecting accidents. Governments may have a range of policy goals (preserving privacy, meeting system requirements, and complying with the law) that can be achieved by degrading the video, at some potential cost to analytical accuracy. We propose Smokescreen, a video degradation-accuracy profiling system that offers administrators a profile illustrating the tradeoff between analytical accuracy and the amount of degradation.
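As a rough illustration of what a degradation-accuracy profile might look like, the sketch below degrades a toy video by keeping only every k-th frame and reports the relative error of an estimated total object count. This is not Smokescreen's method. The choice of frame subsampling as the degradation, the counting task, the function name `profile`, and the per-frame counts are all assumptions made for the example.

```python
from typing import List, Tuple


def profile(counts_per_frame: List[int], strides: List[int]) -> List[Tuple[int, float]]:
    """Return (stride, relative error) pairs for a total-count query.

    Degradation (assumed): keep every `stride`-th frame only.
    Estimate: scale the count on the kept frames up to the full video length.
    """
    true_total = sum(counts_per_frame)
    rows = []
    for stride in strides:
        kept = counts_per_frame[::stride]
        estimate = sum(kept) * len(counts_per_frame) / len(kept)
        rows.append((stride, abs(estimate - true_total) / true_total))
    return rows


# Made-up per-frame object counts, for illustration only.
counts = [3, 1, 4, 1, 5, 9, 2, 6]
rows = profile(counts, strides=[1, 2, 4])
for stride, rel_err in rows:
    print(f"keep every {stride} frame(s): relative error {rel_err:.3f}")
```

An administrator could read such a table to pick the strongest degradation whose error is still acceptable. Note that on a small sample like this one the error need not grow steadily with the amount of degradation.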