An appalling decision from the Canadian federal government today, reported by the Globe & Mail here: “Tories scrap mandatory long-form census”
The census is a vital data source for all sorts of transportation and land use planning. A voluntary census is nearly useless, since the sample will suffer from voluntary response bias: the people who choose to respond are not representative of the population as a whole. Scrapping the mandatory form will do nothing to reduce the number of analysts and bureaucrats, either. Provincial governments will be forced to step in and collect the same data themselves, and this will inevitably result in the loss of province-to-province comparisons.
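To make the voluntary response bias point concrete, here is a toy simulation. It is only a sketch under made-up assumptions: the log-normal income distribution, the 60%/30% response rates, and the function name `simulate` are all hypothetical and are not drawn from Statistics Canada data. It compares the average income estimated from self-selected respondents with the average from a random sample of the same size.

```python
import random
import statistics


def simulate(n=100_000, seed=0):
    """Compare a voluntary sample with a random sample of the same size."""
    rng = random.Random(seed)

    # Hypothetical population: log-normally distributed household incomes.
    incomes = [rng.lognormvariate(10.8, 0.6) for _ in range(n)]
    median = statistics.median(incomes)

    # ASSUMPTION (illustrative only): households above the median income
    # return a voluntary form 60% of the time, those below it 30% of the time.
    volunteers = [
        x for x in incomes
        if rng.random() < (0.6 if x > median else 0.3)
    ]

    # A random sample of the same size, standing in for a mandatory survey
    # where who answers does not depend on income.
    mandatory = rng.sample(incomes, len(volunteers))

    return (
        statistics.mean(incomes),
        statistics.mean(volunteers),
        statistics.mean(mandatory),
    )


if __name__ == "__main__":
    true_mean, voluntary_mean, mandatory_mean = simulate()
    print(f"population mean:       {true_mean:,.0f}")
    print(f"voluntary sample mean: {voluntary_mean:,.0f}")
    print(f"random sample mean:    {mandatory_mean:,.0f}")
```

Under these assumptions the voluntary estimate lands well above the true average no matter how many forms come back, while the random sample stays close to it. A bigger voluntary sample does not fix the problem, because the error comes from who responds, not from how many.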
As for privacy, the alleged basis for this decision: Statistics Canada jumps through all sorts of hoops to ensure the privacy of respondents. It would be difficult, if not impossible, to connect any of the published census data back to an individual. Yes, the questions are detailed and probing, but the anonymization process used by Statistics Canada is tougher than any other I’ve seen anywhere in the world.
I’ve been writing a short report on open source transportation software, and I ran across an interesting website along the way. Apparently, the Creative Commons people are trying to kickstart a new Science Commons for factual information. Unlike creative content, facts are not covered by copyright protection, but collections of facts (i.e., databases) fall into a grey area and are generally covered.
If you’re intrigued, start with this brief article on the subject. I think it motivates the idea of a Science Commons quite well, particularly the need for machine-readable metadata and broad searchable databases. Back when I worked in computer science, I was really spoiled by the excellent CiteSeer article database – there’s no equivalent for transportation/urban planning. While I’m still in university I have good access to databases, but some journals still don’t even have their tables of contents online, let alone the articles themselves. (I’m talking about you, Transportation Research Record.)
I’m not naïve enough to think that “data wants to be free!” There are clearly many datasets that will not be collected or maintained without commercial incentives. But there is also a lot of data that is locked up only because of historical quirks in the publishing industry, or because of a political trend in academia that favours commercialisation and patents over the tradition of open science. Bring on the Neurocommons… but dear god, please find a better name for it.