Big data is nothing without structure. A crucial element in the future will be finding consistent, structured data in a sea of formats.

A sea of structured data formats
Meta-data helps to define structured data

‘Meta-data’ is data about data, and it’s fundamental to the usefulness of the data itself. It describes the structure and lets users of the information understand what they are looking at and what to expect. Meta-data often acts like a contract or protocol – ensuring the data meets certain minimum specifications. If we take data about a restaurant, we may require that certain key facts are always present – for example an address, a telephone number and perhaps a type of cuisine.
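To make the ‘contract’ idea concrete, here is a minimal sketch in Python of how the restaurant example might be checked. The field names, the `RESTAURANT_SPEC` dictionary and the `find_problems` helper are all hypothetical, chosen only for illustration, and the sample records are made up.

```python
# Hypothetical meta-data for a "restaurant" record: field name -> expected type.
# These field names are illustrative assumptions, not an established standard.
RESTAURANT_SPEC = {
    "address": str,
    "telephone": str,
    "cuisine": str,
}


def find_problems(record, spec):
    """Return a list of the ways in which `record` fails to meet `spec`."""
    problems = []
    for field, expected_type in spec.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(f"wrong type for field: {field}")
    return problems


# Made-up records: one that meets the contract and one that does not.
complete = {
    "address": "1 Example Street",
    "telephone": "555-0100",
    "cuisine": "French",
}
incomplete = {"telephone": 5550101}

print(find_problems(complete, RESTAURANT_SPEC))
print(find_problems(incomplete, RESTAURANT_SPEC))
```

The first record passes with an empty list of problems, while the second is reported as missing its address and cuisine and as holding a telephone number of the wrong type. Anyone receiving data that has passed such a check knows what to expect from it, which is the point of the contract.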

When a set of data conforms to a particular specification, we often say that it represents an ‘entity’ – a sort of real-world object – like a restaurant, a type of car, a person’s health record or anything you care to imagine, physical or conceptual.

Unfortunately, although the Internet is full of data, most of it is plain text or otherwise unstructured, and it lacks the kind of meta-data we would like. While a human can often read a company website, understand the nature of the business and find the address details, it’s a different story for a computer. To a computer the page is simply text, and finding the business’s core activity or its location in the world would be a challenge.

It’s for this reason that web pages have meta-data encoded behind the scenes, letting search engines like Google understand more about the main subject of the page. For web pages, the members of the World Wide Web Consortium (W3C) came together to decide on the format of this meta-data as part of the HTML standards (as early as 1994). The format was adopted by companies all over the world, including Google, Yahoo and Microsoft, so that computers would understand web pages in the same way.

Whilst the meta-data standards for web pages have been well covered by the W3C, there is no consortium to decide on global data formats for the countless other topics – such as real estate, chefs’ recipes, movies or men’s socks. This doesn’t mean, of course, that people haven’t tried to model some of these things in data in the past – but when they do, these data formats often remain proprietary and limited to a single company’s database rather than being adopted as a standard across the world.
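For the web-page case, where a shared format does exist, the following is a minimal sketch of how a program might read that behind-the-scenes meta-data. It uses the `html.parser` module from Python’s standard library; the `MetaTagCollector` class and the sample page are made up for illustration, and a real search engine does far more than this.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collect name/content pairs from the <meta> tags of an HTML page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # The parser lower-cases tag and attribute names before calling us.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.meta[name.lower()] = content


# A made-up page, for illustration only.
PAGE = """
<html>
  <head>
    <title>Example Bistro</title>
    <meta name="description" content="A French restaurant in the city centre.">
    <meta name="keywords" content="restaurant, French, bistro">
  </head>
  <body><p>Welcome to Example Bistro.</p></body>
</html>
"""

collector = MetaTagCollector()
collector.feed(PAGE)
print(collector.meta)
```

Because the format of these tags is agreed, the same few lines work on any page that follows it. For a proprietary restaurant or sock database there is no such shared format, so each one would need its own reader.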