The classical approach (think Antonio Criminisi’s seminal work at Microsoft Research in the late 1990s) relied on a clever hack: . If you can identify three orthogonal vanishing points in an image (say, the X, Y, and Z axes of a building), you can recover the camera’s intrinsic parameters and, crucially, set up a 3D coordinate system.
Here is how state-of-the-art systems (like those from Meta, Google Research, or academic labs at ETH Zurich) operate in the wild today: single view metrology in the wild
But the real world is neither clean nor obedient. So how does SVM cheat physics
So how does SVM cheat physics?
We are teaching machines to play architectural detective with a single piece of visual evidence. And it is changing everything from crime scene reconstruction to Ikea furniture assembly. Let’s start with the paradox. A single 2D image has lost an entire dimension. When you take a photo of a building, you collapse depth onto a plane. An infinite number of 3D worlds could have produced that exact 2D projection. Let’s start with the paradox