I am not often impressed by what people call new technology or “inventions”, as I often find that these things are simply built on existing knowledge and/or modifications/combinations of what was already in place. In other words, many of the things becoming popular today already existed in a somewhat different shape: cell phones existed before the iPhone, portable mp3 players before the iPod, search engines before WebCrawler and music before the CD (alternatively tape or LP if you’re not in the youngest generation, or mp3 if you feel the CD is outdated).
To continue the thought I started with, most “inventions” are not new, and most thoughts build on the ideas of others. This is one of the reasons why I am not often impressed. Whatever you do, someone has probably done it before (I am more innovative than most people, so I decided not to say “whatever we do”), but with the right marketing and contacts you will still have a great chance of succeeding with the idea.
Returning to the original thought yet again, what impressed me for the first time in a couple of years was a site from Stanford University, claiming to create 3D models from single photos - something anyone in the area of Computer Vision (or related) knows is close to impossible without at least some manual work. The site however claimed that you simply upload the image and get a 3D version back. The following is what I got back (you may need a plugin for viewing VRML content).
In the end, it took 1-3 days (actually I thought it had failed after I uploaded a photo from Dublin) before I got the results, making me wonder about their method of 3D extraction from the single image I provided. There are after all relatively simple algorithms which let you define a great number of points (i.e. thousands of points) in the image, building on estimates and probability (a simplification, I know). Since the result impressed me a bit, I decided to look through their publications, and will try interpreting them in “plain English” to see if they have found something which I believe could revolutionize the way computers interpret single images and video streams, or if this is simply yet another experiment on the way to greatness.
Thinking about what’s related, Microsoft is working on a tool (Photosynth) for creating 3D models from collections of photos. Although the creation of a model involves relatively advanced algorithms (I studied many approaches both in my thesis and in various projects), a glance at the work makes me think their algorithms need a lot of time and processing power, and too many photos for an average user. In other words, it’s not something for a startup to hope for without a budget. As another early impression, I find the way they created the feature recognition interesting, since many parts of the image analysis should be greatly helped by understanding the scene. On the BBC site:
It picks out distinctive features in each image and cross-references them against the other photographs, checking for similarities.
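To give a feeling for what “cross-referencing” features can look like, here is a minimal Python sketch. It is not Photosynth’s actual method, which I have not seen; it assumes each feature has already been reduced to a small descriptor vector (the numbers below are made up), and it uses a nearest-neighbour search with a ratio test, a common heuristic for discarding ambiguous matches. The function name `match_features` and the `ratio` value are my own choices for illustration.

```python
import math


def match_features(descs_a, descs_b, ratio=0.75):
    """Return (i, j) pairs where descriptor i in photo A clearly matches descriptor j in photo B."""
    matches = []
    for i, a in enumerate(descs_a):
        # Distance from this descriptor to every descriptor in the other photo, closest first
        ranked = sorted((math.dist(a, b), j) for j, b in enumerate(descs_b))
        if len(ranked) < 2:
            continue  # the ratio test needs a runner-up to compare against
        (best, j), (second, _) = ranked[0], ranked[1]
        # Keep the match only if it is clearly better than the second-best candidate
        if best < ratio * second:
            matches.append((i, j))
    return matches


# Hypothetical descriptors for two photos of the same scene
photo_a = [(0.1, 0.9, 0.3), (0.8, 0.2, 0.5)]
photo_b = [(0.82, 0.18, 0.52), (0.5, 0.5, 0.5), (0.12, 0.88, 0.31)]
print(match_features(photo_a, photo_b))  # [(0, 2), (1, 0)]
```

Real systems use much longer descriptors and smarter search structures, but the idea of comparing every distinctive feature against the other photographs is the same.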
As a final note (a bit off topic), I would like to apologize to one of my readers who asked about using the POSIT object with OpenCV. I have only looked into this a few times and never used it for more than a few quick test runs, but since your question made me think a bit I decided to have a look as soon as I can after moving (I am actually in the process of packing and preparing for an interesting move to a new country, which is the main reason I haven’t been able to answer the question). Until then, I recommend the Yahoo OpenCV group. I would be interested in reading/hearing about any findings you make, and wish you the best of luck in finding the answer you seek.
A while back I wrote about Computer Vision with a short description of what it is and what we can do with it. Today I decided to create a list of what we can do with it, either using only Computer Vision techniques or in conjunction with other technologies.
There are of course many more applications of Computer Vision, but as you can see there is everyday use of it which seamlessly integrates with our everyday pastimes. I could also touch on for example robotics, but will wait with that for another time…
There’s a simple explanation to everything, at least if you don’t care about how sufficient the explanation is. The simple explanation for Computer Vision is that it is the reverse of 3D Computer Graphics. The medium explanation makes the simple one look ashamed, while the advanced could be seen as advanced physics warping the universe.
Let’s start with the simple explanation - the ‘opposite of 3D Computer Graphics’ - which tells us that since 3D CG maps locations in 3D space to a 2D screen, Computer Vision should do the opposite, meaning we have one or more images of a location and want to find the scene from these images. This translation from 2D back to three dimensions can be done in a number of ways, most of them needing more complex explanations than the simple version can offer.
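To make the simple explanation a bit more concrete, here is a minimal Python sketch of the two directions. It assumes an idealised pinhole camera sitting at the origin and looking along the z axis; the function names and the `focal_length` default are mine, chosen only for illustration.

```python
def project(point, focal_length=1.0):
    """3D CG direction: map a 3D point (x, y, z) to 2D image coordinates."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    return (focal_length * x / z, focal_length * y / z)


def back_project(pixel, depth, focal_length=1.0):
    """Computer Vision direction: recover the 3D point, given that its depth is known."""
    u, v = pixel
    return (u * depth / focal_length, v * depth / focal_length, depth)


near = (1.0, 2.0, 4.0)
far = (2.0, 4.0, 8.0)  # twice as far away along the same viewing ray

print(project(near))  # (0.25, 0.5)
print(project(far))   # (0.25, 0.5)
print(back_project(project(near), depth=4.0))  # (1.0, 2.0, 4.0)
```

Both points land on the same spot in the image, which is exactly why going backwards is hard: a single image does not tell you the depth, so it has to come from somewhere else, such as a second image or an assumption about the scene.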
Moving on to the medium difficulty explanation, we see that Computer Vision has close relatives in fields such as Photogrammetry, Signal Processing and Image Analysis, that it can be used in for example robotics or for creating 3D scenes to walk around in, and that it is one of the most advanced fields in Computer Science at the moment, with very few systems which are both simple/fast to use and completely robust. Much of the focus is on cameras, and on the external and internal parameters of these cameras at the moment the photos were taken.
The internal camera parameters include the zoom/focal length, the aspect ratio and the pixel size, while the external parameters are the position and rotation of the camera in three dimensions. Since the photos taken are always in two dimensions, recovering these parameters and the scene leads to major mathematical and computational difficulties. For some applications these difficulties have led to combining the area with preparatory methods. One example is to create an estimated ‘pre-model’ with constraints for the final result when trying to recreate a scene using Computer Vision methods; another is to add constraints by trying to predict the next step of a tracked object.
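As a rough sketch of how the two groups of parameters are used, the Python below first applies the external parameters (camera position and rotation) and then the internal ones (focal length, pixel size, aspect ratio). It is a simplification under my own assumptions: the rotation is limited to a single angle `yaw` about the vertical axis, lens distortion is ignored, and the image centre `centre` is an extra parameter not mentioned above. All names are hypothetical.

```python
import math


def world_to_camera(point, camera_position, yaw):
    """External parameters: express a world point in the camera's own frame."""
    # Shift the world so the camera sits at the origin
    x = point[0] - camera_position[0]
    y = point[1] - camera_position[1]
    z = point[2] - camera_position[2]
    # Undo the camera's rotation about the vertical (y) axis
    c, s = math.cos(yaw), math.sin(yaw)
    return (c * x - s * z, y, s * x + c * z)


def camera_to_pixel(point, focal_length, pixel_size, aspect_ratio, centre):
    """Internal parameters: project a camera-frame point to pixel coordinates."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point is behind the camera")
    fx = focal_length / pixel_size  # focal length measured in pixel widths
    fy = fx * aspect_ratio          # pixels need not be square
    return (fx * x / z + centre[0], fy * y / z + centre[1])


# Hypothetical camera: 8 mm lens, 0.01 mm square pixels, 640x480 image
in_camera_frame = world_to_camera((1.0, 0.5, 5.0), (0.0, 0.0, 0.0), yaw=0.0)
print(camera_to_pixel(in_camera_frame, 8.0, 0.01, 1.0, (320.0, 240.0)))
```

Going in this direction is easy once all the parameters are known. The difficulties mentioned above come from running it backwards, where the parameters and the 3D points are both unknown and only the pixels are given.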
Since this post is getting a bit long I’ll continue with the advanced explanation and some tips for books and online resources in another post…