Industrial profile and growth in Karnataka (1880-2015)

I explored the company registration data from the Registrar of Companies for the state of Karnataka (KA), India. The data used here is from the Open Government Data Platform of India. The KA data covers company registrations from 1880 to date, which gives a good view of the trends and patterns in the industrial profile of KA as well as a peek at its industrial growth. I have included my Jupyter notebook (Python 3 kernel) here too.

To begin with, here is a little boilerplate Python code.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.ticker as tkr
import re
import math

# Show the plots inline
%matplotlib inline
plt.style.use('ggplot')

#Show floats with two decimals (no scientific notation) when displaying DataFrames
pd.set_option('display.float_format', lambda x: '%.2f' % x)

As is often the case, the data available on the Open Government Data Platform of India requires a bit of sanitizing and preparation.

In [2]:
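#Fields are wrapped in double quotes, so split on the "," between them;
#a multi-character separator is treated as a regex and needs the python engine
df = pd.read_csv("company_master_data_KA.csv", encoding='ISO-8859-1', sep='","', engine='python')

#Strip the leftover quotes from the first and last column names and values
df = df.rename(columns={"\"CORPORATE_IDENTIFICATION_NUMBER":"CORPORATE_IDENTIFICATION_NUMBER","SUB_CATEGORY\"":"SUB_CATEGORY"})
df['CORPORATE_IDENTIFICATION_NUMBER'] = df.CORPORATE_IDENTIFICATION_NUMBER.apply(lambda x: str(x).replace('"',''))
df['SUB_CATEGORY'] = df.SUB_CATEGORY.apply(lambda x: str(x).replace('"',''))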
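Splitting on `","` leaves a stray quote only at the start of the first field and the end of the last one, which is why just those two columns need cleaning. A minimal sketch of the same cleanup on a tiny in-memory sample, assuming every field in the file is wrapped in double quotes; the sample rows, the `COMPANY_NAME` column and all values are invented for illustration.

```python
import io
import pandas as pd

# Hypothetical two-line sample in the same fully quoted layout (values invented)
sample = (
    '"CORPORATE_IDENTIFICATION_NUMBER","COMPANY_NAME","SUB_CATEGORY"\n'
    '"U12345KA2000PTC000001","ACME PVT LTD","Indian Non-Government Company"\n'
)

# Split on the "," between fields; multi-character separators need the python engine
raw = pd.read_csv(io.StringIO(sample), sep='","', engine='python')

# Only the outermost quotes survive: on the first/last column names and values
raw.columns = [c.strip('"') for c in raw.columns]
for col in (raw.columns[0], raw.columns[-1]):
    raw[col] = raw[col].astype(str).str.strip('"')

print(raw.columns.tolist())
print(raw.iloc[0].tolist())
```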
In [3]:
#Sanitize, prepare data
def activity_subclass(x):
    #Return the text in parentheses for manufacturing activities, else ""
    if str(x).startswith('Manufacturing'):
        match = re.search(r'\((.*)\)', str(x))
        return match.group(1) if match else ""
    return ""

#registration date/year
df["regdate"] = pd.to_datetime(df["DATE_OF_REGISTRATION"])
df['year'] = df['regdate'].dt.year

#principal business activity (all manufacturing subcategories collapsed into one)
df["business_activity"] = df['PRINCIPAL_BUSINESS_ACTIVITY'].apply(lambda x: "Manufacturing" if str(x).startswith('Manufacturing') else str(x))

#manufacturing subcategory
df["activity_sub"] = df['PRINCIPAL_BUSINESS_ACTIVITY'].apply(activity_subclass)

#paid-up capital as a number (strip thousands separators)
df["PAIDUP_CAPITAL"] = df["PAIDUP_CAPITAL"].apply(lambda x: float(str(x).replace(',','')))
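The `activity_subclass` helper pulls out the text inside the parentheses of a manufacturing activity. A vectorized sketch of the same idea uses pandas' `Series.str.extract`; the sample activity strings below are made up and only assume the `Manufacturing (...)` pattern the helper relies on.

```python
import numpy as np
import pandas as pd

# Hypothetical PRINCIPAL_BUSINESS_ACTIVITY values (invented), including a missing one
activities = pd.Series([
    'Manufacturing (Machinery & Equipment)',
    'Manufacturing (Food stuffs)',
    'Business Services',
    np.nan,
])
as_text = activities.astype(str)  # NaN becomes 'nan', as with str(x) in the notebook

# Capture everything between the first "(" and the last ")" for manufacturing rows only
is_mfg = as_text.str.startswith('Manufacturing')
sub = as_text.str.extract(r'\((.*)\)', expand=False)
activity_sub = sub.where(is_mfg, '').fillna('')

# Collapse every manufacturing subcategory into a single "Manufacturing" label
business_activity = as_text.where(~is_mfg, 'Manufacturing')

print(activity_sub.tolist())
print(business_activity.tolist())
```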
In [4]:
fig = plt.figure()
ax = fig.add_subplot(111)
#Partial data available for 2015
grouped = df[df.year<2015].groupby('year')['CORPORATE_IDENTIFICATION_NUMBER'].agg('count')
grouped.plot(figsize=(12,6),color='#4B0082')
plt.title('Companies registered in KA by year (1880-2014)',fontsize='12')
plt.xlabel('Year of registration',fontsize='8')
plt.ylabel('No. of companies',fontsize='8')

ax.annotate('Indian Independence (1947)',xy=(1947,200),xytext=(1930,1000),arrowprops=dict(facecolor='black'))
ax.annotate('Dotcom bubble (2000)',xy=(2000,2300),xytext=(1975,3500),arrowprops=dict(facecolor='black'))
ax.annotate('Financial crisis of 2007-2008',xy=(2008,4000),xytext=(1975,5000),arrowprops=dict(facecolor='black'))
ax.annotate('Economic liberalisation of India (1991)',xy=(1991,1200),xytext=(1950,2400),arrowprops=dict(facecolor='black'))
ax.annotate('Kargil war\n(1999)',xy=(1999,1200),xytext=(2000,300),arrowprops=dict(facecolor='black'))

This chart shows the number of companies registered each year with the Registrar of Companies for KA, and the trends in it appear to coincide with world events such as economic crises and wars. It is also evident that there was barely any industrial growth before the last quarter of the 20th century. The lack of industrial growth and opportunity under the British occupation and in the early decades of independence, before economic liberalization, is staggering.
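To put a number on the dips and jumps the annotations point at, one option is the year-over-year percentage change in registrations. A minimal sketch, assuming a Series of yearly counts indexed by year like `grouped` above; the counts here are invented placeholders, not values from the KA data.

```python
import pandas as pd

# Placeholder yearly registration counts (invented for illustration)
counts = pd.Series({1997: 100, 1998: 120, 1999: 90, 2000: 135, 2001: 120})

# Percentage change from the previous year; the first year has no predecessor (NaN)
yoy = counts.sort_index().pct_change() * 100

# Year with the steepest drop (idxmin skips the leading NaN)
worst_year = yoy.idxmin()
print(yoy.round(1))
print(worst_year)
```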

In [5]:
#Keep companies registered 1980-2014 that have a known business activity
df = df[(df['year']>=1980)&(df['year']<2015)&(df['business_activity']!='nan')]
In [6]:
grouped = df.groupby('business_activity')['CORPORATE_IDENTIFICATION_NUMBER'].agg('count')

grouped.sort_values(ascending=False).plot(kind='barh',color='#9370DB')
plt.title('Companies registered in KA by principal business activity (1980-2014)',fontsize='10')
plt.xlabel('No. of companies',fontsize='8')
plt.ylabel('')

This chart breaks down the number of companies registered between 1980 and 2014 by principal business activity.

In [7]:
active_statuses = ['ACTIVE','ACTIVE IN PROGRESS', 'AMALGAMATED', 'CONVERTED TO LLP']
#Flag companies whose current status is anything other than active
is_inactive = ~df.COMPANY_STATUS.isin(active_statuses)

#Total and inactive companies per business activity
ndf = pd.DataFrame({
    'total_companies': df.groupby('business_activity').size(),
    'inactive': is_inactive.groupby(df['business_activity']).sum(),
}).reset_index()

ndf['failure_rate'] = ndf['inactive']/ndf['total_companies']*100
ndf.sort_values(by='failure_rate',ascending=False).plot(y='failure_rate',x='business_activity',label='Failure Rate (%)',kind='barh',stacked=False,figsize=(12,6))

plt.title('Company failure rates by principal business activity (1980 - 2014)',fontsize='12')
plt.xlabel('Failure Rate (%)',fontsize='10')
plt.ylabel('')

Based on the number of companies that are currently Dissolved, Dormant, Liquidated, Struck Off, in the process of dissolution or striking off, etc., I computed the failure rate of companies by business activity. It shows that sectors such as agriculture, finance and mining/quarrying are considerably riskier ventures than construction and business services, whose failure rates are as low as 20%.

Dependence on the monsoons for irrigation is a huge risk for agriculture and allied activities. The massive corruption in mining is perhaps one cause of the high failure rate of mining start-ups.
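The failure rate is the share of a sector's companies whose current status is not one of the four statuses the notebook treats as active. A small sketch on invented rows (the sectors and statuses below are placeholders) shows the same calculation with a boolean flag and `groupby().mean()`.

```python
import pandas as pd

# Statuses the notebook treats as still operating
ACTIVE = ['ACTIVE', 'ACTIVE IN PROGRESS', 'AMALGAMATED', 'CONVERTED TO LLP']

# Invented sample rows: two sectors, five companies
companies = pd.DataFrame({
    'business_activity': ['Agriculture', 'Agriculture', 'Construction', 'Construction', 'Construction'],
    'COMPANY_STATUS': ['STRUCK OFF', 'ACTIVE', 'ACTIVE', 'ACTIVE', 'DORMANT'],
})

# True where the company is no longer active
companies['inactive'] = ~companies['COMPANY_STATUS'].isin(ACTIVE)

# The mean of a boolean column is the share of True values; x100 gives a percentage
failure_rate = companies.groupby('business_activity')['inactive'].mean() * 100
print(failure_rate.round(1).to_dict())
```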

In [8]:
#Filter out inactive companies (Dissolved, Dormant, Liquidated, Struck Off etc.)
df = df[df.COMPANY_STATUS.isin(['ACTIVE','ACTIVE IN PROGRESS', 'AMALGAMATED', 'CONVERTED TO LLP']) ]
In [9]:
#Group by a single key (not a one-element list) so that labels are plain strings
groups = df.groupby('business_activity')
colors = iter(cm.rainbow(np.linspace(0,1,len(groups))))

plt.figure()

for i,g in groups:
    a = g.groupby('year')['CORPORATE_IDENTIFICATION_NUMBER'].agg('count')
    c=next(colors)
    plt.plot(a,label=i,c=c)
    
plt.legend(bbox_to_anchor=(1.05,1),loc=2,fontsize="x-small")
plt.title('Principal Business Activity of Companies (1980-2014)',fontsize='10')
plt.xlabel('Year of registration',fontsize='8')
plt.ylabel('No. of companies',fontsize='8')

As is evident here, business and IT services have taken off at a very healthy rate in KA over the past three decades. There is also a visible spike in 1995 in community, personal and social services, which include massive undertakings by the Government of Karnataka (GoK) such as Karnataka Neeravari Nigama Ltd.

In [10]:
grouped = df.groupby('business_activity')
start_year, end_year = 1980, 2014
n_years = end_year - start_year
rows = []

for i,group in grouped:
    #start value: companies registered in the start year
    old = (group.year == start_year).sum()
    #end value: cumulative companies registered from start_year through end_year
    new = group.year.between(start_year, end_year).sum()
    #CAGR = (end/start)^(1/n) - 1; undefined when nothing was registered in the start year
    cagr = ((new/old)**(1/n_years)-1)*100 if old > 0 else np.nan
    rows.append((i, cagr))

ndf = pd.DataFrame(rows, columns=['business_activity','growth_rate'])
ndf.sort_values(by='growth_rate',ascending=False).plot(y='growth_rate',color='#3c643c',label='Growth Rate',x='business_activity',kind='barh',stacked=False,figsize=(12,6))

plt.title('Company growth rates (CAGR) by principal business activity (1980 - 2014)\n',fontsize='12')
plt.xlabel('Growth Rate (%)',fontsize='10')
plt.ylabel('')

I used the compound annual growth rate (CAGR) to compare growth across sectors. It shows low growth in sectors such as insurance, which could be due to heavy regulation and monopolies in the market.
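For reference, the compound annual growth rate between a starting value and an ending value n years later is (end / start)^(1/n) - 1. A small helper, hypothetical and not part of the notebook, makes the arithmetic concrete; it returns NaN when the starting value is zero, since growth from a zero base is undefined.

```python
import math

def cagr(start, end, years):
    """Compound annual growth rate in percent: ((end / start) ** (1 / years) - 1) * 100."""
    if start <= 0 or years <= 0:
        return math.nan  # growth from a zero (or negative) base is undefined
    return ((end / start) ** (1 / years) - 1) * 100

# A quantity that doubles every year for 3 years grows 8x, i.e. a CAGR of 100%
print(round(cagr(1, 8, 3), 6))
# 10 companies growing to 1000 over the 34 years from 1980 to 2014
print(round(cagr(10, 1000, 34), 2))
```

Because CAGR is built on the ratio end / start, sectors of very different sizes can be compared on the same scale.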

In [11]:
groups = df[df['business_activity']=='Manufacturing'].groupby('activity_sub')
colors = iter(cm.rainbow(np.linspace(0,1,len(groups))))

plt.figure()

for i,g in groups:
    a = g.groupby('year')['CORPORATE_IDENTIFICATION_NUMBER'].agg('count')
    c=next(colors)
    plt.plot(a,label=i,c=c)
    
plt.legend(bbox_to_anchor=(1.05,1),loc=2,fontsize="x-small")
plt.title('Manufacturing Industries',fontsize='12')
plt.xlabel('Year of registration',fontsize='10')
plt.ylabel('No. of companies',fontsize='10')

This is a deeper dive into the various subcategories of manufacturing industries. One standout feature is how the financial crisis of 2007-2008 affected manufacturing, especially machinery, metals and chemicals.
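One way to make "hit the most" concrete is to compare each subcategory's average yearly registrations just before the crisis with the crisis years themselves. A sketch on an invented year-by-subcategory table; the windows (2005-2007 versus 2008-2009) and the subcategory names are assumptions chosen for illustration, not what the notebook computes. In the notebook, such a table could be built from the manufacturing rows with `groupby(['year', 'activity_sub']).size().unstack(fill_value=0)`.

```python
import pandas as pd

# Invented yearly registration counts (rows: year, columns: subcategory)
yearly = pd.DataFrame(
    {'Machinery & Equipment': [40, 45, 50, 30, 28],
     'Food stuffs': [20, 22, 21, 20, 19]},
    index=[2005, 2006, 2007, 2008, 2009],
)

# Assumed windows: pre-crisis 2005-2007, crisis 2008-2009 (.loc label slices are inclusive)
before = yearly.loc[2005:2007].mean()
during = yearly.loc[2008:2009].mean()

# Percentage change in average yearly registrations; most negative = hardest hit
change = ((during - before) / before * 100).sort_values()
print(change.round(1).to_dict())
```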

In [12]:
fig, ax = plt.subplots()
groups = df[(df.SUB_CATEGORY == 'Indian Non-Government Company')].groupby('COMPANY_CLASS')

#One line per company class
for i,g in groups:
    a = g.groupby('year')['CORPORATE_IDENTIFICATION_NUMBER'].agg('count')
    plt.plot(a,label=i)
plt.legend(bbox_to_anchor=(1.05,1),loc=2,fontsize="x-small")
plt.axvspan(2013, 2015, facecolor='b', alpha=0.15)
plt.axvspan(2007, 2013, facecolor='#ffa500', alpha=0.35)
plt.axvspan(2006, 2007, facecolor='g', alpha=0.15)
plt.axvspan(1999, 2006, facecolor='b', alpha=0.15)
plt.axvspan(1994, 1999, facecolor='g', alpha=0.15)
plt.axvspan(1989, 1994, facecolor='b', alpha=0.15)
plt.axvspan(1983, 1989, facecolor='#000080', alpha=0.35)
plt.axvspan(1980, 1983, facecolor='b', alpha=0.15)
ax.ticklabel_format(style='plain')
ax.yaxis.set_major_formatter(tkr.FuncFormatter(lambda x,p: format(int(x),',')))
plt.title('Indian Non-Government Companies (1980-2014)\n',fontsize='10')
plt.xlabel('Year of registration',fontsize='8')
plt.ylabel('No. of companies',fontsize='8')

props = dict(alpha=0.05,facecolor='#ffffff')
ax.text(1.05,0.3,'State government\n(background color)\n---------------------------------------',transform=ax.transAxes,verticalalignment='bottom',bbox=props,fontsize=8)
ax.text(1.05,0.25,'Bharatiya Janata Party (BJP)',transform=ax.transAxes,verticalalignment='bottom',bbox=props,fontsize=8,color='#ffa500')
ax.text(1.05,0.2,'Indian National Congress (INC)',transform=ax.transAxes,verticalalignment='bottom',bbox=props,fontsize=8,color='b')
ax.text(1.05,0.15,'Janata Dal',transform=ax.transAxes,verticalalignment='bottom',bbox=props,fontsize=8,color='g')
ax.text(1.05,0.1,'Janata Party',transform=ax.transAxes,verticalalignment='bottom',bbox=props,fontsize=8,color='#000080')

To get a sense of how the Government of KA has facilitated the ease of doing business for non-government companies, both publicly and privately owned, I plotted registrations of Indian non-government companies by company class against the party in power. There is a sudden increase in the growth of private businesses starting in the early 2000s, most notably under the S. M. Krishna government (Indian National Congress), and the trend continued under the subsequent Bharatiya Janata Party (BJP) government, with setbacks due to the financial crisis of 2007-2008.
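The background shading encodes which party was in power in each period. To compare periods numerically rather than visually, one could map each registration year to a party using the same boundaries as the `axvspan` calls above and average the yearly counts. A sketch, where `PERIODS` and `party_in_power` are hypothetical names, the yearly counts are invented placeholders, and treating each span as half-open [start, end) is an assumption about how shared boundary years are assigned.

```python
import pandas as pd

# Period boundaries and parties, copied from the axvspan calls in the chart
PERIODS = [
    (1980, 1983, 'INC'), (1983, 1989, 'Janata Party'), (1989, 1994, 'INC'),
    (1994, 1999, 'Janata Dal'), (1999, 2006, 'INC'), (2006, 2007, 'Janata Dal'),
    (2007, 2013, 'BJP'), (2013, 2015, 'INC'),
]

def party_in_power(year):
    """Return the party shaded for `year`, treating each span as [start, end)."""
    for start, end, party in PERIODS:
        if start <= year < end:
            return party
    return None

# Invented yearly registration counts, for illustration only
counts = pd.Series({2004: 100, 2005: 120, 2006: 130, 2007: 90, 2008: 110})

# Average yearly registrations per party over the years present in `counts`
by_party = counts.groupby(counts.index.map(party_in_power)).mean()
print(by_party.to_dict())
```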

Insights:

  • The steep rise in the number of companies registered after the economic liberalization of India (1991) makes the effect of free markets on growth evident.
  • The effect of war on industrial growth is visible in the drop in the number of companies registered during the Kargil war (1999).
  • In this data, business services in KA has the highest growth rate and is the least likely to fail: the failure rate of a business services company is about 50% lower than in sectors such as agriculture or finance.
  • Machinery & equipment, metals and chemicals businesses grew the most rapidly among manufacturing industries in the past decade, but they were also the manufacturing sectors hit hardest by the financial crisis of 2007-2008.
  • The previous two state governments (INC and BJP) have been conducive to business growth. This trend began most notably after S. M. Krishna became the chief minister of Karnataka.
  • Agriculture and allied industries are more likely to fail than other sectors.
