I gave it a series of 11 images stripped of all metadata (a quick way to do this is sketched at the end of this comment). It performed quite well, only misidentifying the two taken in a small college town in the northeastern US. It got both photos taken in Korea correct (one with a fairly clear view of Haneul Park, the other a rather difficult-to-identify picture of Sunrise Peak that doesn't resemble anything on Google).
It got every other US location correct, ranging from a shot of under-construction Austin taken from the river to some fairly difficult shots in NYC (the upper halves of some buildings from the Rockefeller terrace, and the black wall of the MoMA).
While it wasn't perfect, I'm frankly shocked at how well it performed.
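For anyone wanting to reproduce the stripping step, here is a minimal sketch, assuming Pillow and a JPEG input with a placeholder filename; exiftool -all= photo.jpg does the same thing from the command line:

    # Sketch: drop EXIF/GPS metadata by rebuilding the image from pixel data only.
    # Assumes Pillow is installed; "photo.jpg" is a placeholder filename.
    from PIL import Image

    img = Image.open("photo.jpg")
    clean = Image.new(img.mode, img.size)
    clean.putdata(list(img.getdata()))  # copies pixels, not metadata
    clean.save("photo_stripped.jpg")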
The anthropomorphisation certainly is weird, but the technical aspect seems even weirder. Did OpenAI really build dedicated tools to have their models train on Google Street View? Or do they have generic technology for browsing complex sites like Street View?
I doubt the model was trained on Street View, but even if it was, LLMs don’t retain any “memory” of how/when they were trained, so any element of truthfulness would be coincidental.
If it's trained on Street View data, it's not unlikely that the model can associate a particular piece of context with Street View. For example, a picture can have telltale signs of Street View content, such as blurred faces and street signs, watermarks, etc.
Even if it wasn't directly trained on Street View data, it has probably encountered Street View content in its training dataset.
The training process doesn't preserve the information the LLM would need to infer that. Any answer it gives about its own training can't be anything other than plausible-sounding nonsense, which is what these models do best.
I think the test the OP performed (picking a random Street View location and letting it pinpoint it) would indicate that it has ingested some kind of information in this regard in a structured manner.
This is the most impressive ChatGPT chat I've seen yet. While I can theoretically accept how large-scale probabilistic text generation can lead to this chain of "reasoning", it really feels like actual intelligence.
It's been intelligence for a long time; the goalposts just shift, and people can't abstract the idea to an LLM. But language processing and large data processing itself IS a form of intelligence.
Maybe you're right, but I think it's more likely that it was trained on Street View photos and then invented a plausible justification for the guess afterwards (which is something I often see ChatGPT do: it easily arrives at the correct answer but gives bullshit explanations for it).
I played a round of Geoguessr against it and while it did a shockingly good job compared to what I was expecting, it still lags behind even novice human players.
The actual locations and its guesses were as follows (straight-line distances; a quick way to compute them is sketched after the list):
Bliss, Idaho - Burns, Oregon (273 miles away)
Quilleco, Biobio, Chile - Eugene, Oregon (6,411 miles away)
Dettighofen, Switzerland - Mühldorf, Germany (228 miles away)
Pretoria, South Africa - Johannesburg, South Africa (36 miles away)
Rockhampton, Australia - Gold Coast, Australia (437 miles away)
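The distances above are presumably great-circle figures; here is a quick haversine sketch in Python showing how they could be computed. The coordinates are rough city-centre approximations of my own, and Geoguessr measures from the exact pins, so the numbers won't match exactly:

    # Sketch: great-circle (haversine) distance in miles between two lat/lon points.
    from math import radians, sin, cos, asin, sqrt

    def haversine_miles(lat1, lon1, lat2, lon2):
        r = 3958.8  # mean Earth radius in miles
        dphi = radians(lat2 - lat1)
        dlam = radians(lon2 - lon1)
        a = sin(dphi / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlam / 2) ** 2
        return 2 * r * asin(sqrt(a))

    # e.g. Pretoria vs. Johannesburg, using rounded city-centre coordinates
    print(round(haversine_miles(-25.75, 28.19, -26.20, 28.05)))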
I had cause to try Google Lens today and found the location down to the exact address thanks to a veterinary clinic in the background of the image. ChatGPT got the country right but the wrong city.
I gave it some photos from Denmark and didn't even bother to strip the metadata. One it correctly said gave off "Scandinavian vibes"; every other photo was very wrong. I also gave it a photo of the French Alps, and it guessed Switzerland.
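As an aside, when the metadata isn't stripped, the location can often just be read out of the EXIF GPS tags. A minimal sketch, assuming Pillow 8.2 or newer and a placeholder filename:

    # Sketch: check whether a photo still carries GPS coordinates in its EXIF data.
    from PIL import Image, ExifTags

    exif = Image.open("photo.jpg").getexif()
    gps = exif.get_ifd(0x8825)  # 0x8825 is the GPSInfo IFD
    if gps:
        # Map numeric tag IDs to readable names (GPSLatitude, GPSLongitude, ...)
        print({ExifTags.GPSTAGS.get(k, k): v for k, v in gps.items()})
    else:
        print("no GPS metadata found")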
I gave o4-mini-high a cropped version of a photo I found on Facebook[0][1], and it quickly determined that this was in the UK from the road markings. It also decided that it was from a coastal city because it could see water on the horizon, which is the correct conclusion from incorrect data: there is no water; I think that's trees on a hill. It focused heavily on the spherical structure, which makes sense because it's distinctive, though it had a hard time placing it. It also decided that the building on the left was probably a shopping centre.
It eventually decided that the photo was taken outside the Scottish Exhibition and Conference Centre in Glasgow; in general, it considered Scottish locations more than others.
The picture was actually taken in Plymouth (so pretty much as far from Scotland as you can get in Britain), on Charles Street looking south-east[2]. The building on the right is Drake Circus, and the one on the left is the Arts University. It actually did consider Plymouth, but decided it didn't match.