Talk:OpenDataPrinciples/machine processable

From OpenGovData


Each record in the data should include an identifier. This identifier should persist across revisions to the data set, so that external references to individual records still point to the right record after updates. For instance, the identifier can be a globally unique URI that follows Semantic Web best practices.
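One way to keep identifiers persistent is to derive each record's URI only from values that never change between revisions, such as a contract number. The sketch below assumes such a stable natural key exists; the base URI `http://example.org/id`, the `record_uri` helper, and the sample records are hypothetical.

```python
from urllib.parse import quote

# Hypothetical base URI; a real publisher would use a domain it controls.
BASE_URI = "http://example.org/id"


def record_uri(dataset: str, natural_key: str) -> str:
    """Build a record URI from the data set name and a stable natural key.

    The URI depends only on values that do not change between revisions,
    so the same record receives the same URI in every release.
    """
    return f"{BASE_URI}/{quote(dataset, safe='')}/{quote(natural_key, safe='')}"


# Two revisions of the same hypothetical data set: the amount changed,
# but the natural key did not, so external links keep resolving.
rev1 = [{"contract_no": "GS-35F 0001", "amount": 1000}]
rev2 = [{"contract_no": "GS-35F 0001", "amount": 1250}]

ids1 = {record_uri("contracts", r["contract_no"]) for r in rev1}
ids2 = {record_uri("contracts", r["contract_no"]) for r in rev2}
print(ids1 == ids2)
```

The URI deliberately excludes mutable fields such as `amount`; including them would change the identifier whenever the record is corrected, which would break external references.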

The data format should be documented so that people familiar with the domain of the data set can understand it. All columns, tags, and abbreviations should be described. However, a formal XML Schema or similar schema is not necessary.
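Plain-language documentation can still be checked mechanically. The following is a minimal sketch that assumes the documentation is kept as a simple data dictionary; the column names, abbreviations, and helper functions are all hypothetical. It reports any column or abbreviation that appears in the data but is not described.

```python
import csv
import io

# Hypothetical data dictionary: every column and abbreviation gets a
# plain-language description aimed at readers who know the domain.
DATA_DICTIONARY = {
    "columns": {
        "contract_no": "Contract number assigned by the awarding agency",
        "agency": "Awarding agency, as an abbreviation (see below)",
        "amount_usd": "Obligated amount in US dollars",
    },
    "abbreviations": {
        "GSA": "General Services Administration",
        "DOD": "Department of Defense",
    },
}


def undocumented_columns(csv_text: str, dictionary: dict) -> list:
    """Return header columns that have no entry in the data dictionary."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return [col for col in header if col not in dictionary["columns"]]


def undocumented_abbreviations(csv_text: str, column: str, dictionary: dict) -> list:
    """Return values in `column` that are not listed as documented abbreviations."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return sorted({row[column] for row in rows
                   if row[column] not in dictionary["abbreviations"]})


sample = (
    "contract_no,agency,amount_usd,fy\n"
    "GS-1,GSA,1000,2008\n"
    "DE-7,DOE,500,2008\n"
)
print(undocumented_columns(sample, DATA_DICTIONARY))                  # ['fy']
print(undocumented_abbreviations(sample, "agency", DATA_DICTIONARY))  # ['DOE']
```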

P Language Rule

You know you have a truly open format if you can build a parser for it in Perl, Python or PHP in an afternoon. That parser should be able to crawl through the data set and dump the results into a SQL database. That doesn't necessarily mean the data is best handled in a SQL database (although most of this data will be), only that it can be easily imported into one.
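As an illustration of the afternoon-sized parser this rule describes, here is a minimal Python sketch. It assumes a hypothetical plain-text format with one `key: value` pair per line and a blank line between records, and it loads the result into an in-memory SQLite database using only the standard library. The format, the table name `records`, and the helper names are assumptions made for the example.

```python
import sqlite3


def parse_records(text: str):
    """Yield one dict per record from a hypothetical 'key: value' format
    in which records are separated by blank lines."""
    record = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            if record:
                yield record
                record = {}
            continue
        key, _, value = line.partition(":")
        record[key.strip()] = value.strip()
    if record:  # last record may not be followed by a blank line
        yield record


def quote_ident(name: str) -> str:
    """Quote a SQL identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def load_into_sqlite(records, conn: sqlite3.Connection, table: str = "records") -> None:
    """Create a table with one TEXT column per key seen and insert every record."""
    records = list(records)
    columns = sorted({key for r in records for key in r})
    if not columns:
        return
    col_sql = ", ".join(f"{quote_ident(c)} TEXT" for c in columns)
    conn.execute(f"CREATE TABLE {quote_ident(table)} ({col_sql})")
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f"INSERT INTO {quote_ident(table)} VALUES ({placeholders})",
        [tuple(r.get(c) for c in columns) for r in records],  # missing keys become NULL
    )
    conn.commit()


sample = """id: 1
agency: GSA
amount: 1000

id: 2
agency: DOE
amount: 500
"""

conn = sqlite3.connect(":memory:")
load_into_sqlite(parse_records(sample), conn)
rows = conn.execute("SELECT agency, amount FROM records ORDER BY id").fetchall()
print(rows)  # [('GSA', '1000'), ('DOE', '500')]
```

Every column is stored as TEXT to keep the sketch format-agnostic; a real importer would map known fields to proper numeric or date types.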


How to Answer Questions Using the Data

Making the data available will be a significant step, but it leaves a big gap between having the data and using it to answer questions.

Suppose I want to ask a new question, such as "what government departments and people work on possible long term health problems resulting from Katrina?".

There's no way to anticipate every such question in advance and write a conventional program to answer each one.

On the other hand, some new technology supports an evolutionary, social-network approach: a free wiki where people write, in executable English, the knowledge needed to link questions to the data that answers them.

To find it, search Google for "executable English".
