I am not often impressed by what people call new technology or “inventions”, as I often find that these things are simply built on existing knowledge, or are modifications or combinations of what was already in place. In other words, many of the things becoming popular today already existed in a somewhat different shape: cell phones were created before the iPhone, portable mp3 players before the iPod, search engines before WebCrawler and music before the CD (alternatively tape or LP if you’re not in the youngest generation, or mp3 if you feel the CD is outdated).
To continue the thought I started with, most “inventions” are not new, and most thoughts build on the ideas of others. This is one of the reasons why I am not often impressed. Whatever you do, someone has probably done it before (I am more innovative than most people, so I decided not to say “whatever we do”), but with the right marketing and contacts you will still have a great chance of succeeding with the idea.
Returning to the original thought yet again, what impressed me for the first time in a couple of years was a site from Stanford University, claiming to create 3D models from single photos - something anyone in the area of Computer Vision (or a related field) knows is close to impossible without at least some manual work. The site, however, claimed that you simply upload the image and get a 3D version back. The following is what I got back (you may need a plugin for viewing VRML content).
In the end, it took 1-3 days (actually I thought it had failed after I uploaded a photo from Dublin) before I got the results, making me wonder about their method of 3D extraction from the single image I provided. There are, after all, relatively simple algorithms that let you define a great number of points (i.e. thousands) in the image, building on estimates and probability (a simplification, I know). Since the result impressed me a bit, I decided to look through their publications, and will try interpreting them in “plain English” to see if they have found something which I believe could revolutionize the way computers interpret single images and video streams, or if this is simply yet another experiment on the way to greatness.
Thinking about what’s related, Microsoft is working on a tool (Photosynth) for creating 3D models from collections of photos. Although the creation of a model involves relatively advanced algorithms (I studied many approaches both in my thesis and in various projects), a glance at the work makes me think their algorithms need a lot of time and processing power, and too many photos for an average user. In other words, it’s not something for a startup to hope for without a budget. As another early impression, I find the way they created the feature recognition interesting, since many parts of the image analysis should be greatly helped by understanding the scene. On the BBC site:
It picks out distinctive features in each image and cross-references them against the other photographs, checking for similarities.
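To make the quoted idea a bit more concrete, here is a minimal sketch of cross-referencing features between two photos. This is not Photosynth’s actual method, which I have not seen. It is a generic nearest-neighbour match with a ratio test, and the function names (`distance`, `match_features`), the `ratio` value and the tiny three-number descriptors are all assumptions made up for illustration. Real descriptors are much longer and come from a feature detector.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_features(desc_a, desc_b, ratio=0.75):
    """Return (index_in_a, index_in_b) pairs that pass the ratio test.

    desc_a and desc_b are lists of descriptors, one per feature.
    The ratio threshold is an assumed value, not taken from any paper.
    """
    matches = []
    for i, d in enumerate(desc_a):
        # Rank every descriptor in the other photo by distance to d.
        ranked = sorted((distance(d, e), j) for j, e in enumerate(desc_b))
        if len(ranked) < 2:
            continue  # the ratio test needs a runner-up to compare against
        (best, j), (second, _) = ranked[0], ranked[1]
        # Keep the match only if the best candidate is clearly
        # closer than the second best, i.e. the feature is distinctive.
        if best < ratio * second:
            matches.append((i, j))
    return matches


# Hypothetical descriptors for three features in each photo.
photo_a = [(0.9, 0.1, 0.0), (0.0, 0.8, 0.2), (0.5, 0.5, 0.5)]
photo_b = [(0.0, 0.79, 0.21), (0.88, 0.12, 0.01), (0.1, 0.1, 0.9)]

matches = match_features(photo_a, photo_b)
print(matches)  # [(0, 1), (1, 0)]
```

The third feature in `photo_a` is about equally far from everything in `photo_b`, so it is dropped as ambiguous. That is the “distinctive” part of the quote: a feature that looks like many others tells you little about how two photos line up.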
As a final note (a bit off topic), I would like to apologize to one of my readers who asked about using the POSIT object with OpenCV. I have only looked into this a few times and never actually used it for more than a few quick test runs, but since your question made me think a bit, I decided to have a look as soon as I can after moving (I am actually in the process of packing and preparing for an interesting move to a new country, which is the main reason I haven’t been able to answer the question). Until then, I recommend the Yahoo OpenCV group. I would be interested in reading/hearing about any findings you make, and wish you the best of luck in finding the answer you seek.