As for “can it identify the city from a photo”: it depends.
Google has no intrinsic awareness that “this building is in this city.” BUT it does have a huge database correlating pictures with the text on the webpages around them. If 30 different travel websites use the same photo of the Empire State Building, Google learns that that picture is associated with the building. When you upload that same photo, Google can tell you what it is.
But if you upload another shot of the Empire State Building, taken from a different location or angle, it is extremely unlikely that Google will know what that building is – unless it’s visually so similar to another preexisting shot of the building that its algorithm conflates the two.
In other words, Google knows about the pixels in images and the text on the webpages that surround them. It doesn’t yet have the human ability to discern an object in imaginary 3D space, mentally spinning it around until it matches some pre-existing concept of the Empire State Building. For example, if you rotate a face upside down, flip it left to right, or crop it just right, GIS probably won’t find it.
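To see why pixel-level matching is so brittle, here’s a minimal sketch of “average hashing,” one simple way to fingerprint an image for duplicate detection. Real search engines use far more sophisticated techniques; the 4×4 “image” and all the numbers below are made up purely for illustration. Notice that an exact copy hashes identically, while merely mirroring the picture produces a completely different fingerprint:

```python
def average_hash(pixels):
    """Hash a tiny grayscale image: one bit per pixel, set if above the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return "".join("1" if p > mean else "0" for p in flat)

def hamming(a, b):
    """Count differing bits between two equal-length hashes."""
    return sum(x != y for x, y in zip(a, b))

# A 4x4 stand-in for a downscaled grayscale photo (bright left, dark right).
photo = [
    [200, 180, 30, 10],
    [190, 170, 20, 15],
    [180, 160, 25, 5],
    [170, 150, 35, 12],
]

exact_copy = [row[:] for row in photo]  # the same file reused on another site
flipped = [row[::-1] for row in photo]  # the same subject, mirrored

h = average_hash(photo)
print(hamming(h, average_hash(exact_copy)))  # 0  -> identical fingerprint
print(hamming(h, average_hash(flipped)))     # 16 -> every bit differs
```

A human sees the same building in both shots; the hash sees two unrelated bit strings.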
On the other hand, Microsoft Research is working on technology that builds 3D models of a place from photo collections: it clusters similar photos, overlaps them, and calculates the perspective shift between each pair. So if you feed the software 3,000 tourist pictures of the Notre Dame cathedral, it’ll generate a photorealistic 3D model of the place from all of them.
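The geometric core of that idea is triangulation: once you know where two cameras stood and which direction each one saw the same feature (say, a gargoyle on the cathedral), their two sight lines pin down the feature’s position in 3D. Here’s a toy sketch under invented coordinates; the camera positions and direction vectors are hypothetical, and real systems first have to recover them from the photos themselves:

```python
def triangulate(c1, d1, c2, d2):
    """Closest point to the two rays c1 + t*d1 and c2 + s*d2 (3D tuples)."""
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    sub = lambda u, v: tuple(a - b for a, b in zip(u, v))
    w0 = sub(c1, c2)
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w0), dot(d2, w0)
    denom = a * c - b * b  # zero only if the sight lines are parallel
    t = (b * e - c * d) / denom
    s = (a * e - b * d) / denom
    p = tuple(ci + t * di for ci, di in zip(c1, d1))
    q = tuple(ci + s * di for ci, di in zip(c2, d2))
    # Midpoint of closest approach (the rays rarely meet exactly in practice).
    return tuple((pi + qi) / 2 for pi, qi in zip(p, q))

# Two hypothetical camera positions, 10 units apart, each looking at the
# same feature from its own angle.
cam1, cam2 = (0.0, 0.0, 0.0), (10.0, 0.0, 0.0)
point = triangulate(cam1, (4, 2, 8), cam2, (-6, 2, 8))
print(point)  # (4.0, 2.0, 8.0) -- the recovered 3D feature position
```

Do this for thousands of matched features across thousands of photos and a point cloud of the whole cathedral falls out, which is roughly what the clustering-and-overlapping step is computing at scale.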
This doesn’t quite extend to image searching yet, but once the information is there, it ought to be possible to match your uploaded photo to something already in their geometry-adjusted databases. I wouldn’t be surprised if that was their next project for Bing… so keep your fingers crossed.