Google Refine lets you fix and handle huge, messy sets of data
Google has just introduced a new product, and this time it's a PC application (with a browser-based UI). It's called Google Refine, and it solves a problem that is enormous for some people: it lets you take massive sets of "messy data" and massage them into shape so that they're uniform, make sense, and can be statistically analyzed.
The video after the jump shows a very good example, which is based on a CSV file exported from a publicly available data source (a government contract system, in this case). The data is very realistic – descriptions are inconsistent (Firm Fixed Price on some rows and FFP on other rows), and even the number formats are inconsistent (you get 0.78 on one row and a number in the millions on another row).
Google Refine lets you very easily home in on those inconsistencies and fix them in a myriad of ways. This is an important data tool because those heaps of messy data are often public records, which are available but not transparent; being able to quickly analyze them could expose some very interesting patterns and anomalies in the way that public institutions and governments behave.
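To make the idea concrete, here is a minimal Python sketch of one common way to spot variant spellings of the same label: reduce each value to a normalized "fingerprint" and group values that share one. This illustrates the general technique only and is not taken from Google Refine's code; the function names (`fingerprint`, `cluster`, `normalize`) and the alias table are hypothetical, made up for this example.

```python
import string
from collections import defaultdict

# Translation table that turns every ASCII punctuation character into a space,
# so "fixed-price" and "fixed price" end up with the same tokens.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def fingerprint(value: str) -> str:
    """Build a normalized key: lowercase, drop punctuation, sort unique tokens."""
    cleaned = value.strip().lower().translate(_PUNCT_TO_SPACE)
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)


def cluster(values):
    """Group values by fingerprint; keep only groups with more than one spelling."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return {key: vals for key, vals in groups.items() if len(set(vals)) > 1}


def normalize(value: str, aliases: dict) -> str:
    """Map a value to its canonical form, expanding known abbreviations.

    `aliases` is a hand-maintained table (an assumption of this sketch);
    fingerprinting alone cannot tell that "FFP" means "Firm Fixed Price".
    """
    key = fingerprint(value)
    return aliases.get(key, key)


if __name__ == "__main__":
    rows = ["Firm Fixed Price", "firm fixed-price", "FFP", "Cost Plus"]
    print(cluster(rows))
    print([normalize(r, {"ffp": "firm fixed price"}) for r in rows])
```

Note that the abbreviation still needs a human decision: the grouping step finds spellings that differ only in case, punctuation, or word order, while mapping FFP to Firm Fixed Price has to be supplied by hand.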
[Thanks, Yanksy, for the tip!]

Comments (8)
Juan R. Perez, Nov 17th 2010 11:25AM
OK, how is this different from doing it in Excel? Should this be inside Google Spreadsheets?
bangonkali, Nov 19th 2010 3:45AM
I think this was once what was called Freebase. And I should say it can do a lot more amazing things than what Excel or Google Spreadsheets can. I should say this is great software. I think its purpose is not data input but refining seriously unfiltered data.
NyaR, Nov 17th 2010 1:29PM
Why wouldn't you just fire up REPLACE and replace FFP with firm fixed price? Boom, done in 5 seconds, and that's what people want - not some confusing bs.
Shane McCracken, Nov 18th 2010 9:24AM
Finding those 20 variations on Firm fixed-priec would take a while in Excel when you're dealing with 30,000 rows. It's seconds in Refine or Freebase Gridworks as it once was...
Seth, Nov 17th 2010 3:35PM
As someone who has actually had to work with large amounts of data for statistical analysis, I can say that this is seriously cool. Yes, you can use "Find and Replace", but to do what they just did in under 5 minutes is seriously cool.
hazard, Nov 17th 2010 10:30PM
looks v.cool
might be worth pointing out that this is another app assimilated into google not created by google.
m0r1arty, Nov 18th 2010 2:39AM
Is that a 'Wolfram Alpha killer' Dagger which I see before me?
bangonkali, Nov 19th 2010 3:48AM
I think this is far from being Wolfram Alpha. The original product, I believe, was called Freebase, and to be honest I was quite surprised it was bought by Google.