Computer Vision Explosion

We are about to see an explosion in the use of computer vision systems. If you thought Kinect was cool or you think Creepy Cameraman is scary, the technology right around the corner, and its impact on our lives, will blow you away.

We’ve all dreamt of the day when natural user interface (NUI) systems would be “real”. For example, in 1984 I built, as a high school project, a system that allowed my school to do a mock Presidential election…by voting via speech. I wish I could find the specs on the voice recognition card I used for the Apple ][ (or even the code I wrote <sad face>), but suffice it to say the promise was big, the results…not so much.

I sincerely believe (again?) that we are finally, really, truly, on the cusp of a NUI explosion. We’ve seen massive improvements in the real-world usage of touch (iPhone), voice (Siri), and computer vision (Kinect) in the last few years. I think this is just the beginning.

There will be huge strides made in voice- and touch-based input, but in my view, the area where our world will be rocked the most is computer vision. Cameras are everywhere. They are dirt cheap. They can see things we can’t. And as amazing as the tech in Kinect is at decoding all those signals, interpreting them, and figuring out your body’s intent, you haven’t seen anything yet.

I had the chance to visit Israel in 2011. I met with several companies in the computer vision space and visited several of the top Israeli university research groups working on computer vision. I was under NDA so I can’t discuss details, but I’m sure you are aware that Israel has been leading the way in computer vision technology.

I found it amusing that the Creepy Cameraman story and this story on a new Microsoft patent came across my feed at about the same time. I also recently upgraded the CCTV system in my house from analog cameras circa 2002 to modern IP-based digital cameras (I use a GeoVision-based CCTV DVR system that is functional but very haphazardly implemented).

These modern cameras all record 1080p in real time with audio. The software I have is just OK, but is nowhere near state of the art.

Another example: sports cameras such as GoPro and Contour. Next time you are at a bike event, out on the lake, or skiing, notice how many people are wearing these cams. The quality is fantastic and they are getting dirt cheap.

Remember that, thanks to networks, we can combine camera inputs from multiple sources, which means future computer vision systems will not have to be self-contained, integrated devices the way Kinect is today.

Some scenarios where I see breakthroughs coming:

  • Detecting and tracking people’s emotional state. Imagine your TV being able to sense whether you are happy, scared, sad, or mad and adjusting the content to either amplify that state or change it. This could be used for good (making a game even more immersive) or bad (adjusting advertising).
  • Predicting intent. By understanding ‘normal’ behavior, games, user interfaces, and other systems will be able to predict what you are going to do before you do it.
  • Tele-presence. Kinect shows how easy (ha!) it currently is to allow a computer to, in real time, build a 3D model of human bodies and do intelligent things (control a game). We also know it’s easy (ha!) to map photorealistic imagery onto 3D models, as Google/Bing/Apple Maps do. Combine these technologies and it’s not a stretch to see Princess Leia floating in front of R2-D2.
  • Augmented Reality. The work Google is doing on Glass is a great example. I can imagine combining the three other examples above with not only a head-mounted camera, but also a more direct input into the human vision system (a tiny monitor you wear like glasses is actually pretty lame; I’m much more excited about the research going on into directly inserting imagery into the brain).

Most important, I think, is the impact these breakthroughs will have on mobile. I joke that “Mobile is Dead”. What I really mean is that mobile is now ubiquitous and that it’s high time we stop thinking about it as some discrete ‘space’.

What do you think? What scenarios do you see coming? What are the risks to society and industry?

© Charlie Kindel. All Rights Reserved.