• David Slakter

Video Classification & Action Recognition Algorithms

Updated: Feb 8

A short guide to the new technology that is shaping our world.

What is Video Classification and why does it matter?

Video classification is an artificial intelligence technique in which a model is trained on sequences of images so that it can work out what event is taking place in a video. This differs from the widely used image classification algorithms. An image classifier analyzes a single image for patterns of pixels that it can recognize as a certain object, whereas a video classifier uses short-term memory to compare pixels across frames and derive a classification. For example, an image classifier might look at a video and report that there is an apple in one of the frames, while a video classifier would tell you that the video shows a man picking up an apple and eating it. The two models are looking at the same video, but the video classifier is able to give more contextual information about the object in the frame.

This key difference allows us to build machines that can understand complex situations and react accordingly. For example, a robot trained to go into a burning house and retrieve a trapped person needs to understand not only the obstacles that might be in its way, but also the situation and the risks it is taking by moving to a given location. Using video classification, it could detect that an overhead support beam is about to fall and move out of the way in time.

What is the type of model behind video classification?

Image classifiers typically rely on a popular architecture called a convolutional neural network (CNN). A video classifier builds on this by adding what is called a recurrent neural network (RNN). Recurrent neural networks differ in that they feed their previous outputs back in as inputs when making the next decision. This is the short-term memory that lets them base each decision on the outcome of the last one. If a single frame of the video is recognized as a human extending an arm, then piecing together the outputs of multiple frames could show that the person is waving. In this example the CNN and the RNN work together in a video classifier, identifying situations by combining the outputs of smaller detections.
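To make the idea of per-frame outputs feeding a recurrent memory more concrete, here is a toy sketch in plain Python. It is not how xIris or any production system is built. The frame_features function is only a stand-in for a CNN, the weights in TOY_PARAMS are hand-picked rather than trained, and every name in it (LABELS, TOY_PARAMS, frame_features, rnn_step, classify_video) is made up for this illustration.

```python
import math

# Hypothetical labels for this toy example only.
LABELS = ["right_to_left", "left_to_right"]

# Hand-picked (not trained) weights, chosen so the toy behaves sensibly.
TOY_PARAMS = {
    "w_in": [[1.0, 0.0], [0.0, 1.0]],     # features -> hidden
    "w_rec": [[0.5, 0.0], [0.0, 0.5]],    # previous hidden -> hidden
    "bias": [0.0, 0.0],
    "w_out": [[1.0, -1.0], [-1.0, 1.0]],  # hidden -> class scores
}


def frame_features(frame):
    """Stand-in for a CNN: reduce one frame to a small feature vector.

    A frame here is a flat list of pixel intensities; the features are
    the mean brightness of its left half and of its right half.
    """
    half = len(frame) // 2
    left = sum(frame[:half]) / half
    right = sum(frame[half:]) / (len(frame) - half)
    return [left, right]


def rnn_step(hidden, features, params):
    """One recurrent update: h_t = tanh(W_in x_t + W_rec h_(t-1) + b)."""
    new_hidden = []
    for i in range(len(hidden)):
        total = params["bias"][i]
        total += sum(w * x for w, x in zip(params["w_in"][i], features))
        total += sum(w * h for w, h in zip(params["w_rec"][i], hidden))
        new_hidden.append(math.tanh(total))
    return new_hidden


def classify_video(frames, params=TOY_PARAMS):
    """Run frames through the recurrent step in order, then apply softmax."""
    hidden = [0.0] * len(params["bias"])
    for frame in frames:
        hidden = rnn_step(hidden, frame_features(frame), params)
    scores = [sum(w * h for w, h in zip(row, hidden)) for row in params["w_out"]]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    return [e / sum(exps) for e in exps]


# Two tiny four-pixel "frames": a bright blob on the left, then on the right.
left_bright = [1.0, 1.0, 0.0, 0.0]
right_bright = [0.0, 0.0, 1.0, 1.0]

probs_lr = classify_video([left_bright, right_bright])
probs_rl = classify_video([right_bright, left_bright])
print(LABELS[probs_lr.index(max(probs_lr))])
print(LABELS[probs_rl.index(max(probs_rl))])
```

Both clips contain exactly the same two frames, so a classifier that looked at each frame in isolation could not tell them apart. The recurrent state is what carries the order of events from one frame to the next, which is the short-term memory described above.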

At xIris we're working to put this groundbreaking technology to use. Until now, it has largely been confined to research papers. By using state-of-the-art video classifiers, we're able to identify anomalies that are caught by security cameras but are never reported to authorities in time. By effectively giving a security camera the brains of artificial intelligence, we make it possible for authorities to be notified in real time when xIris detects a break-in, vandalism, or any other damaging act.

If you would like to learn more, don't hesitate to email us at


xIris CTO

#InventingTheFuture #ArtificialIntelligence