Deepak writes on Money, Markets and Trading at Capital Mind. He co-founded MarketVision, a financial knowledge startup, and has traded the Indian Markets for nearly a decade. Deepak lives in Gurgaon and fears using long words. He is @deepakshenoy on Twitter. In this guest column, he writes about Extensible Business Reporting Language, a structured and uniform way to describe financial data.

Extensible Business Reporting Language (XBRL) is the new game in town. It’s just a structured way to describe business data, by which I mean this:

a) People release their financial data in multiple formats

b) For example: what some call revenue, others call income, and yet others call total sales.

c) But they all mean the same thing.

d) In such a circumstance, it’s useful to define a single term for whatever means the same thing. If I hear the word revenue I know it means revenue, and not something else.

It also helps to have the data structured in a way a computer can decode. Today, no one can easily scan through thousands and thousands of PDF documents. To do any serious analysis, whether comparing a company with others or with its own past, you need a computer. And for a computer, it's altogether too painful to read:

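Revenue: 100.10 (Crores)

in one report, and

Revenue: Rs. 10,010.00 (Lakhs)

in another, and then work out that both say the same thing (100.10 crore is 10,010 lakh).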
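To make that pain concrete, here is a minimal Python sketch of the conversion a program has to do before it can compare the two revenue lines above. It assumes the only units in play are crores (10^7 rupees) and lakhs (10^5 rupees) and that every line looks like the two examples; the function name `to_rupees` and the regular expression are made up for illustration, not taken from any real tool.

```python
import re
from decimal import Decimal

# Rupees per unit. Assumption: only these two Indian units ever appear.
UNIT_MULTIPLIER = {
    "crores": Decimal(10) ** 7,
    "lakhs": Decimal(10) ** 5,
}

# Matches lines like "Revenue: 100.10 (Crores)" or "Revenue: Rs. 10,010.00 (Lakhs)".
LINE_PATTERN = re.compile(
    r"^(?P<label>[^:]+):\s*(?:Rs\.\s*)?(?P<amount>[\d,]+(?:\.\d+)?)\s*\((?P<unit>\w+)\)$"
)


def to_rupees(line: str) -> tuple[str, Decimal]:
    """Parse one report line and return (label, amount in plain rupees)."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised line: {line!r}")
    # Drop the thousands separators before converting to a Decimal.
    amount = Decimal(match["amount"].replace(",", ""))
    multiplier = UNIT_MULTIPLIER[match["unit"].lower()]
    return match["label"].strip(), amount * multiplier


a = to_rupees("Revenue: 100.10 (Crores)")
b = to_rupees("Revenue: Rs. 10,010.00 (Lakhs)")
print(a[1], b[1], a == b)
```

Decimal is used instead of float so that the comparison is exact; every new format a company invents means another pattern and another special case, which is exactly the problem a common structure avoids.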

An Introduction to Markup

One way to solve this comes from the language of the world wide web: HTML. It's called “HyperText Markup Language”, and if you view the source of any web page, you'll see that it is always plain text, with strange things inside < and > brackets. The strange things are the “markup”: essentially, stuff that doesn't get displayed but describes some other stuff that does. Okay, that's too confusing. Say I wanted to put something in bold. I would write:

<b>This is my boldness.</b> And this is not.

which, when a browser sees it, translates to:

This is my boldness. And this is not. (with only the first sentence shown in bold)

The <b> and </b> are tags that define the boundaries of whatever is to be bold. So italics would be <i> and </i>, and so on. HTML has a predefined set of tags, but you could describe anything with such tagging. A book, for instance, could be:

<BOOK Name="The Lost World">
<Author>Michael Crichton</Author>
<ISBN> Something </ISBN>
</BOOK>

The structure is thus hierarchical: the Author and ISBN tags come INSIDE the BOOK tag. This is fairly open markup, where we can describe just about anything. XML (Extensible Markup Language) is the mother of all such markups built on < and > delimiters.
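Because the structure is hierarchical, a program can walk it without knowing anything about books. A minimal sketch using Python's standard `xml.etree.ElementTree` module on the book example (the ISBN is left as the placeholder “Something”):

```python
import xml.etree.ElementTree as ET

# The book example as a well-formed XML string (straight quotes, closing tag).
BOOK_XML = """
<BOOK Name="The Lost World">
  <Author>Michael Crichton</Author>
  <ISBN> Something </ISBN>
</BOOK>
"""

book = ET.fromstring(BOOK_XML.strip())

# Attributes live on the element itself; nested tags become child elements.
print("Book:", book.get("Name"))
for child in book:
    # .text keeps any surrounding whitespace, so strip it for display.
    print(f"  {child.tag}: {child.text.strip()}")
```

The parser only knows about < and > and nesting; what “Author” or “ISBN” means is up to whoever agrees on the tags.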

There are other markup languages that use XML-style tags, and XBRL is one of them, to the extent that you will hear XBRL called a dialect of XML. Don't worry about the definitions. XML historians will cringe, but I don't think it's important, and I've worked on XML for a long time (I wrote a paper for a conference in 2000).

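So, if we had a balance sheet and wanted to describe it, we could make up a tag for each line item, such as fixed assets or debt, and nest them all inside a balance sheet tag, the way Author sits inside BOOK above.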
Or something of that sort. But what sort?

Enter the XBRL Taxonomy

If you tell people what goes in which tag, you effectively have a dictionary of terms. For XBRL, this is a taxonomy. Nothing to do with tax. Or the economy. It's just a word, machan, as they say in Bangalore.
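In code, the core of the “dictionary” idea is mapping each company's own label onto one agreed term. The sketch below is hypothetical: the label list and the term `Revenue` are made up for illustration and are not taken from the actual XBRL taxonomy, which defines each element far more formally than a lookup table.

```python
# Hypothetical mapping from labels companies use to one agreed term.
# A real XBRL taxonomy defines its elements formally; this only shows the idea.
LABEL_TO_TERM = {
    "revenue": "Revenue",
    "income": "Revenue",
    "total sales": "Revenue",
}


def standard_term(label: str) -> str:
    """Return the agreed term for a company's label, ignoring case and spacing."""
    key = label.strip().lower()
    if key not in LABEL_TO_TERM:
        raise KeyError(f"no agreed term for label {label!r}")
    return LABEL_TO_TERM[key]


# Three made-up companies using three different labels for the same thing.
reports = {"Company A": "Revenue", "Company B": "Income", "Company C": "Total Sales"}
for company, label in reports.items():
    print(company, label, "->", standard_term(label))
```

With XBRL the mapping happens at filing time instead: the company itself tags the figure with the taxonomy's term, so the reader never has to guess.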

The full XBRL taxonomy for Commercial and Industrial Companies (essentially, non-banks) is publicly available. It sounds very complex, and it is, probably unnecessarily so. But who cares: eventually there will be tools to decode it (I could write one in a couple of days; it's that easy).

So the sample instance document (this is technical jargon for “example”) for M&M has gloriously unreadable lines like this:

<in-gaap:FixedAssets id="TAB320" decimals="-5" contextRef="I2009" unitRef="INR">91418100000</in-gaap:FixedAssets>

<in-gaap:FixedAssets id="TAB330" decimals="-5" contextRef="I2008" unitRef="INR">76255100000</in-gaap:FixedAssets>

The in-gaap is a namespace prefix, which tells a program which taxonomy (here, the Indian GAAP one) the FixedAssets tag belongs to. In other words, you can mostly ignore it. What you should care about is FixedAssets, which is a term you understand.

There are two FixedAssets tags. What’s the difference? The contextRef (case sensitive) is an attribute that is different for the two rows; one is I2009 and one is I2008.

That simply means different years. Strictly, contextRef points to a context defined elsewhere in the same XML document, and that context tells you I2009 refers to 31/3/2009, the end of that financial year. The rest is self-explanatory except the decimals="-5" attribute, which here means the figure is accurate to the nearest 100,000 rupees: it was rounded to lakhs and then written out in full rupees (INR).
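To show how a program follows contextRef to a date, here is a minimal Python sketch using the standard `xml.etree.ElementTree` module. It wraps the two M&M facts above in a hand-trimmed instance. The `xbrli` namespace is the real XBRL 2.1 instance namespace, but the `in-gaap` namespace URI, the entity identifier and the 31/3/2008 date for I2008 are assumptions for illustration, since the full instance document isn't reproduced here.

```python
import xml.etree.ElementTree as ET

NS = {
    "xbrli": "http://www.xbrl.org/2003/instance",  # real XBRL 2.1 instance namespace
    "in-gaap": "http://example.com/in-gaap",       # hypothetical placeholder URI
}

# Trimmed-down instance: two contexts, one unit, and the two facts quoted above.
INSTANCE = """<xbrli:xbrl xmlns:xbrli="http://www.xbrl.org/2003/instance"
    xmlns:in-gaap="http://example.com/in-gaap"
    xmlns:iso4217="http://www.xbrl.org/2003/iso4217">
  <xbrli:context id="I2009">
    <xbrli:entity><xbrli:identifier scheme="http://example.com/id">EXAMPLE</xbrli:identifier></xbrli:entity>
    <xbrli:period><xbrli:instant>2009-03-31</xbrli:instant></xbrli:period>
  </xbrli:context>
  <xbrli:context id="I2008">
    <xbrli:entity><xbrli:identifier scheme="http://example.com/id">EXAMPLE</xbrli:identifier></xbrli:entity>
    <xbrli:period><xbrli:instant>2008-03-31</xbrli:instant></xbrli:period>
  </xbrli:context>
  <xbrli:unit id="INR"><xbrli:measure>iso4217:INR</xbrli:measure></xbrli:unit>
  <in-gaap:FixedAssets id="TAB320" decimals="-5" contextRef="I2009" unitRef="INR">91418100000</in-gaap:FixedAssets>
  <in-gaap:FixedAssets id="TAB330" decimals="-5" contextRef="I2008" unitRef="INR">76255100000</in-gaap:FixedAssets>
</xbrli:xbrl>"""

root = ET.fromstring(INSTANCE)

# Map each context id (e.g. "I2009") to the date its period refers to.
period_of = {
    ctx.get("id"): ctx.find("xbrli:period/xbrli:instant", NS).text
    for ctx in root.findall("xbrli:context", NS)
}

fixed_assets = {}
for fact in root.findall("in-gaap:FixedAssets", NS):
    value = int(fact.text)
    decimals = int(fact.get("decimals"))
    if decimals < 0:
        # decimals="-5" means accurate to 10^5 rupees, so the last 5 digits are zero.
        assert value % 10 ** -decimals == 0
    fixed_assets[period_of[fact.get("contextRef")]] = value

for date, rupees in sorted(fixed_assets.items()):
    print(f"{date}: Rs. {rupees / 10**7:,.2f} crore")

growth = fixed_assets["2009-03-31"] / fixed_assets["2008-03-31"] - 1
print(f"Growth in fixed assets: {growth:.1%}")
```

A real tool would also check unitRef against the unit definitions and handle duration contexts (for profit and loss items), which this sketch skips.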


This kind of structure means we can get each company's data into Excel sheets and compare companies once we have the XBRL. It also means I could aggregate across sectors and compare easily: find out the average debt/equity ratio of the auto sector, which companies are debt free, and so on. The point is not that this data is unavailable today (it is available, though much of it is just wrong); it's that XBRL lowers the bar for good analysis.
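Once figures sit in uniform rows, the sector questions above take a few lines. This is a minimal Python sketch; the three companies and their numbers are entirely made up, and the row layout is an assumption about how extracted XBRL facts might be stored.

```python
from statistics import mean

# Hypothetical figures (Rs. crore) for three made-up auto companies, as they
# might look after each company's XBRL facts have been pulled into rows.
companies = [
    {"name": "AutoCo A", "sector": "Auto", "debt": 1200.0, "equity": 4000.0},
    {"name": "AutoCo B", "sector": "Auto", "debt": 0.0, "equity": 2500.0},
    {"name": "AutoCo C", "sector": "Auto", "debt": 900.0, "equity": 1500.0},
]


def sector_summary(rows, sector):
    """Return (simple average debt/equity ratio, debt-free company names) for a sector."""
    in_sector = [r for r in rows if r["sector"] == sector]
    ratios = [r["debt"] / r["equity"] for r in in_sector]
    debt_free = [r["name"] for r in in_sector if r["debt"] == 0]
    return mean(ratios), debt_free


avg_de, debt_free = sector_summary(companies, "Auto")
print(f"Average debt/equity: {avg_de:.2f}")
print("Debt-free:", ", ".join(debt_free))
```

This takes a simple average of each company's ratio; dividing the sector's total debt by its total equity instead would weight big companies more, and which one you want depends on the question.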

On the other hand, companies have to work harder to file using XBRL. It involves a lot of cross-checking, tools and training. There is a fairly good business model in the compliance area as well, since both accountants and company finance officials need to learn the format and specification.

Mandatory Filing

The Government of India has made it mandatory for companies (except banks and FIs) to file their financial statements using XBRL this year, starting with listed companies and larger unlisted ones.

I presume that once results are filed in XBRL, we will be able to get the XBRL files as well, and therefore easily move the data into appropriately structured databases.
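As a sketch of what “appropriately structured” could mean, here is one possible layout using Python's built-in `sqlite3` module: one row per fact, keyed by company, taxonomy term and date. The table design is an assumption, not something the filing rules prescribe; the two rows are the M&M FixedAssets figures quoted earlier, with the 31/3/2008 date for I2008 assumed.

```python
import sqlite3

# In-memory database for illustration; a real one would live on disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE facts (
        company  TEXT NOT NULL,
        concept  TEXT NOT NULL,    -- taxonomy term, e.g. FixedAssets
        period   TEXT NOT NULL,    -- ISO date the fact's context refers to
        value    INTEGER NOT NULL, -- amount in rupees
        PRIMARY KEY (company, concept, period)
    )"""
)

# (company, concept, period, value) rows extracted from an instance document.
rows = [
    ("M&M", "FixedAssets", "2009-03-31", 91418100000),
    ("M&M", "FixedAssets", "2008-03-31", 76255100000),
]
conn.executemany("INSERT INTO facts VALUES (?, ?, ?, ?)", rows)

# Once every filing lands in the same table, comparisons become plain SQL.
query = """SELECT period, value FROM facts
           WHERE company = ? AND concept = ? ORDER BY period"""
for period, value in conn.execute(query, ("M&M", "FixedAssets")):
    print(period, value)
```

Because every company uses the same taxonomy terms, the same query works across all filers just by changing the company parameter.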

(c) 2011 Deepak Shenoy. Reproduced with permission from Capital Mind. The views expressed above are those of the author, and not necessarily representative of the views of