Data Starvation

21 Mar 2019

When my dad retired and started applying for his pension benefits, I realised how analogue the Indian bureaucratic system is. Everything is still done on paper and in files. Even though banks are very digital, a paper copy is still maintained and considered more trustworthy.

For the past 3 years, while in the US, I studied and worked with numerous freely available open data sets. Most of them were about some aspect of the US. The availability of structured data about employment, GDP, waste disposal in rivers and the like is taken for granted there. In fact, the city of Boston, where I was living, has a data portal dedicated to making structured data openly available. Even then, the people working with these data sets often wanted more data to draw inferences from.

Being unaware, I presumed that in this era of the fourth industrial revolution, data was similarly available everywhere in the world. But then I came back to India, my home country. I spent some time in a meditation retreat, and a little more exploring parts of the country that I hadn’t visited before. Then it was time to start working again, on a project I had been wanting to do. It involved collecting the political affiliation history of the country's MPs (members of parliament). I think it is valuable for citizens to know that the candidates of each political party were not always part of that party. It could be even more valuable to know the few outliers who change their affiliations often.
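If such data were available in a structured form, finding those outliers could be sketched roughly as follows. This is only a minimal Python illustration, assuming a hypothetical table of (MP, election year, party) rows; the names, years and parties are placeholders, not real data, and the function names are my own.

```python
from collections import defaultdict

# Hypothetical rows: (mp_name, election_year, party).
# Placeholder values only; no real MPs or parties are represented.
records = [
    ("MP A", 2004, "Party X"),
    ("MP A", 2009, "Party Y"),
    ("MP A", 2014, "Party X"),
    ("MP B", 2004, "Party X"),
    ("MP B", 2009, "Party X"),
    ("MP B", 2014, "Party X"),
    ("MP C", 2009, "Party Z"),
    ("MP C", 2014, "Party Y"),
]


def count_party_switches(rows):
    """Return {mp_name: number of times the party changed between consecutive entries}."""
    history = defaultdict(list)
    for name, year, party in rows:
        history[name].append((year, party))

    switches = {}
    for name, entries in history.items():
        entries.sort()  # order each MP's entries chronologically
        parties = [party for _, party in entries]
        # Count adjacent pairs where the party differs
        switches[name] = sum(1 for a, b in zip(parties, parties[1:]) if a != b)
    return switches


def frequent_switchers(rows, min_switches=2):
    """Return the sorted names of MPs with at least `min_switches` party changes."""
    counts = count_party_switches(rows)
    return sorted(name for name, n in counts.items() if n >= min_switches)


switches = count_party_switches(records)
outliers = frequent_switchers(records)
```

The analysis itself is a few lines; the threshold `min_switches` is an arbitrary choice for illustration.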

Collecting all that data is incredibly hard. In fact, I couldn’t even find a list of current MPs and their respective constituencies as a CSV file. I looked through many government websites, websites of so-called “data agencies”, journalism websites and much more. In the end, I manually entered each row into an Excel sheet from the Wikipedia list, then googled each name on the list for their history and added it to the Excel file. I kept at it, losing motivation as I went, and finally I gave up.

Alongside that data collection effort, I also started looking for other data sets that I thought might be interesting. For instance, unemployment is a major issue in the upcoming election, so I tried to look for data sets on it. I did find OECD and World Bank data on the topic, but several other organisations, e.g. NASSCOM and CII, publish reports that are cited by news channels. I wanted to include them as well.

All of the data released by the Indian government, NSSO, CII and similar agencies is hidden deep inside reports published online as PDFs. One can choose to read through 100 to 500 pages of each of them (on another note: they are also herculean-ly difficult to find on the websites mentioned). This made me realise how offices operate in India and why a bureaucratic process like getting pension benefits takes so long.

India might have come a long way in terms of people's standard of living. In fact, the standard of living of a person with a white-collar job in the cities is virtually identical to that of someone living in the so-called “first world”. But my observation of how the government works makes me realise how far behind we are. If the government, people, academics, journalists and researchers don’t have access to readily available data about pressing questions, how can we expect to progress? It is true that private companies are much better at this in any country, and government offices will probably always have some level of formality associated with them. But if 2016 data is published in 2017 and then takes the general public another year to understand, something is very wrong with our democracy.

It would not be fair to end without mentioning the respect I have for the people who study government data in India.