it's just semantics
Via Bill de hÓra came a link to "Google Maps, the fool's gold of mashups" by Phil Wainewright: what makes mapping mashups so easy, and so untypical of the kind of mashup challenges people face in the real world: the critical data is already structured according to a specification that all of us internalize by the time we graduate from junior school. The same is true of names, dates, time of day, quantities and dimensions.
Actually, even things that feel like they should be trivial to map on Google Maps can turn out not to be. For example, there is no freely available source of data to allow geocoding of Australian street addresses, and even the data that is freely available doesn't all use the same geodetic system. So if (for example) you merge data from the Geographical Names Board of NSW with Australia Post's Postcode Data File and then plot the location of postcode 2041 (Balmain), it turns out to be an uninhabited island in Sydney Harbour.
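To make the geodetic problem concrete, here is a minimal sketch of the check that has to happen before two such datasets are merged. Everything in it is my own illustration: the Location record, the split_by_datum helper, the datum labels and the coordinates are placeholders, not the actual contents or reference systems of either dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    name: str
    lat: float
    lon: float
    datum: str  # geodetic system the coordinates refer to (labels here are illustrative)


def split_by_datum(records, target_datum):
    """Separate records that can be plotted as-is from those needing conversion first."""
    usable = [r for r in records if r.datum == target_datum]
    needs_conversion = [r for r in records if r.datum != target_datum]
    return usable, needs_conversion


# Placeholder records: the coordinates and datum labels are made up for the example.
records = [
    Location("placename record", -33.0, 151.0, "AGD66"),
    Location("postcode record", -33.0, 151.0, "GDA94"),
]

usable, needs_conversion = split_by_datum(records, "GDA94")
print([r.name for r in usable])
print([r.name for r in needs_conversion])
```

The point is only that a latitude/longitude pair is not self-describing. Unless each record carries its datum, the two records above look identical, and nothing warns you that one of them has to be converted before it is plotted.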
Or how about plotting a map showing all the places where Australia's two largest retailers compete head to head? Something like "Software Wars", but less amusing. As Stevey said, "It was easy to think, so it must be easy to do."
It turns out this is yet another area for which there are indeed publicly available sources of data, but those sources don't line up with a layperson's expectation that things that are considered competitors should be equivalent. For example, in the Australian Federal Government's "Australian Business Register", Woolworths Limited has a single ABN with multiple trading names, whereas a search for Coles Myer returns lots of different ABNs, none of which link to any of its wholly owned businesses (like Target or KMart). If you go to the Coles Myer website you can search for CML stores (which include the Shell service stations that are co-branded "Coles Express", as well as the Myer stores, even though Myer is no longer owned by Coles Myer Limited). The closest equivalent on the Woolworths Limited site is an incomplete list of brands. Amongst other things, there's no mention of the 100+ pubs that make up the Australian Leisure & Hospitality Group (which aren't actually on Woolworths' listing in the Australian Business Register either, since they are only 75% owned by Woolworths).
But one shouldn't be too surprised at how hard it is to find a complete and unambiguous list of all Coles Myer or Woolworths stores on the web - things aren't any clearer when you're inside the corporate firewalls either.
This issue is not really about the "format" of data: whether you use XML or CSV, or whether numbers are zero padded. It's about the "meaning" of data, i.e. the semantics. So would the semantic web help here at all? One could imagine a program trying to build a list of stores belonging to each retailer by yoking together a chain of ownership inferred from the Australian Business Register with store location data from corporate websites. But there are still lots of ad hoc judgement calls to be made. Should the list include entities that are partly owned? If so, do you include only majority owned stores? What about franchised outlets (where someone else owns the store and pays for the right to use a brand name)?
All of these decisions are things that a normal person would probably feel they know the answer to, but they could not define a general method to arrive at that answer before the specific question is asked ("I know it when I see it..."). Identifying and resolving all these ambiguities is what makes defining a data model really, really hard.
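Here is a rough sketch of what that imagined program might look like, mostly to show where the judgement calls end up. The Stake record, the controlled_entities function and its two policy parameters are names I have made up for illustration, and the franchised outlet is hypothetical. Only the 75% stake in the Australian Leisure & Hospitality Group and the wholly owned Target and KMart come from the discussion above. The walk also ignores how partial stakes compound down a chain, which is yet another decision someone would have to make.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stake:
    parent: str
    child: str
    fraction: float           # share of the child owned by the parent, 0.0 to 1.0
    franchised: bool = False  # brand is licensed rather than owned


def controlled_entities(root, stakes, min_fraction=0.5, include_franchises=False):
    """Walk the ownership graph from root, following only stakes that pass the policy."""
    found = []
    seen = {root}
    queue = [root]
    while queue:
        owner = queue.pop(0)
        for s in stakes:
            if s.parent != owner or s.child in seen:
                continue
            # The "ad hoc judgement calls" become explicit policy parameters here.
            if s.franchised:
                accepted = include_franchises
            else:
                accepted = s.fraction >= min_fraction
            if accepted:
                seen.add(s.child)
                found.append(s.child)
                queue.append(s.child)
    return found


stakes = [
    Stake("Woolworths Limited", "Australian Leisure & Hospitality Group", 0.75),
    Stake("Woolworths Limited", "Hypothetical Franchised Outlet", 0.0, franchised=True),
    Stake("Coles Myer", "Target", 1.0),
    Stake("Coles Myer", "KMart", 1.0),
]

wholly_owned = controlled_entities("Woolworths Limited", stakes, min_fraction=1.0)
majority = controlled_entities("Woolworths Limited", stakes, min_fraction=0.5)
broad = controlled_entities("Woolworths Limited", stakes, include_franchises=True)
print(wholly_owned, majority, broad)
```

The same data gives three different answers to "which entities belong to Woolworths?" depending on the policy chosen, and nothing in the data itself says which policy the person asking had in mind.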
And, as I previously wrote, people who are motivated to go to all the effort of creating a complete and unambiguous data model will have very different concerns from a casual consumer of that data model.