Categories
Articles General

The mission of PGDBA

By Anurag Malakar, PGDBA Batch-5 (2019-21)

The year 2005 | A regular morning in a middle-class Indian’s life

The middle-aged mom heaves a sigh of relief, having just sent her kids off on the school bus. It has been a busy morning for her; she woke up early to cook food for her husband and the two darling-turned-devils of her eyes. She returns upstairs and spends the rest of the morning preparing lunch for her kids who’ll soon be home, arguing with her maid, and reminding the security guard for the umpteenth time to call a plumber to repair the leaky faucet in her kitchen. She watches a repeat of her favorite TV show on Star Plus for a while before going to the market to buy fresh vegetables and bread for the next day.

The year 2019 | A technology-enabled morning in an aspirational Indian’s life

The father sits down on the sofa in the living room, exhausted after an early morning run. He checks his Fitbit and grins as the tiny screen shows that his mean speed was 25% higher than last month’s average. After freshening up, he wakes up the kids and walks over to the living room. While pouring milk and cereal into three bowls, he notices that the pack of cereal is almost empty. His kids enter the living room and call out to Alexa to play their customized playlist of morning songs, and she dutifully obliges. He whips out his phone, opens WhatsApp and video calls his better half, who’s away for a business meeting. The happy family reunion is interrupted by the sound of the doorbell, which suggests that Amazon has delivered, right on time, the new toy the younger kid had been demanding for the past week. The dad checks the time, gets dressed quickly, and books a cab while tying his shoes. He drops his kids off at school and orders their favorite pack of cereal on Grofers. He calculates that he has just about enough time to finish the Netflix special that he had dozed off while watching last night. The app’s proprietary algorithm, which incessantly analyzes his viewing habits, dutifully suggests a new range of TV shows for him to binge-watch later.

The world is generating a more diverse range of data than ever before. 90% of all the data in the world has been generated over the last couple of years. More than 3.7 billion people use the Internet today, and that number is only increasing. Google processes more than forty thousand searches every second, while 41,000 new pictures are uploaded to Instagram every minute. By the time you’re done reading this paragraph, Uber riders will have taken more than forty-five thousand trips!

In this scenario, businesses have an unprecedented opportunity to capitalize on the vast amount of information they are gathering on their current as well as prospective customers. Social media platforms are analyzing the behavioral patterns of individual users and predicting their affinities with great accuracy. The entire world is moving towards a connected networking platform with the advent of the Internet of Things and Machine Learning techniques. Data has become the new oil, and Data Scientists and Analysts are expected to monetize this oil to the greatest extent possible. But this burgeoning sector is experiencing a shortage of skilled workers. The lack of data scientists has become a major constraint, and industries are clamoring for individuals who can make sense of the data they are amassing and provide tangible business advantages. There is a widening gap between demand and supply in the market, and five years ago, IIM Calcutta, in conjunction with ISI Kolkata and IIT Kharagpur, sought to capitalize on the mismatch.

IIM CALCUTTA

That’s how PGDBA was conceptualized, in accordance with industry demands and with due heed paid to the industry’s suggestions. As the fourth batch prepares to venture beyond the gates of Joka, the last one and a half years of training imparted at these stellar institutions are about to be put to use. The industry has steep expectations of this bunch of talented individuals. They’re expected to draw meaningful insights out of data that benefit their businesses. From risk management and healthcare analytics to behavioral analytics and digital marketing, the needs that PGDBA graduates are expected to cater to are diverse.

As the world keeps amassing ever larger amounts of data, it’ll become increasingly difficult to make sense of it. It’ll be akin to finding the one precisely manufactured red needle among multiple piles of slightly deformed red needles; surgical precision is critical to ensure it does not become a bloodbath. That’s where this course comes in, perfectly positioned to meet business requirements by capitalizing on the technological advancements of the 21st century.

Categories
Articles Technical

A Record, a Code and Twitter

How my first machine learning model was validated by a goal scored a continent away.

   Ritwik Moghe

It was the twenty-eighth day of November 2015. As the strangely balmy day yawned, stretched and gratefully gave way to dusk, several eyes were glued to the actions of one man. The man was slight, had strange spiky hair and a face that might remind many of all those ‘dawgs’ or ‘dealers’ from Breaking Bad. Only a year back, hardly anyone knew of his existence. And today, he was about to etch his name in the annals of footballing history.

As he latched on to a pitch-perfect through-ball that split the Manchester United defense in half and slotted it in past the oncoming goalkeeper, several things exploded. One of those was the voice of the legendary Martin Tyler as he shouted “Vardy! It’s eleven, it’s heaven for Jamie Vardy” (The goal as it unfolded). Jamie Vardy, a name most of you might still be unfamiliar with, had broken the English Premier League record for scoring in the most consecutive matches. He had scored in each of the past eleven games. In the grand scheme of things, the record in itself might not be of much significance. What mattered more was Vardy’s story. From an amateur player with no ‘proper’ training or facilities and very humble beginnings, he had risen to be the most prolific striker in one of the most competitive leagues in the world. It was a classic fairy-tale. For several amateurs trudging every evening into those muddy football fields and trying to curl it like Carlos, Vardy was hope.

So as he was being engulfed by his team-mates after he had scored that crucial record-breaking goal, Vardy was causing another explosion around the world. It was an explosion of hope, of greetings and of admiration. And sitting in our dorm rooms overlooking the ponderous Barrackpore Trunk Road in the quaint campus of ISI Calcutta, a bunch of us fledgling data-scientists of PGDBA captured this joyous explosion. We captured it using Twitter.

Messi_Vardy A graph encapsulating the positive Twitter sentiment about Vardy right after THE GOAL!

The problem that we were working on was Opinion Mining through tweets. Hundreds of millions of tweets are posted every day. These tweets reflect the opinions or sentiments of the users about various topics. For instance, a tweet like “I love Apple #Iphone6” might reflect the user’s positive sentiment about the company Apple. A study of several such tweets about a particular subject or company can give the company valuable insights into what the general public thinks of it.

We were analyzing the Twitter sentiment about various current and upcoming football stars. Our aim was to identify the next big star, the one who would eclipse Messi and attain the ultimate pinnacle of fame by someday being the brand ambassador of Tata Motors! Our observation about Vardy and his big day was a mere microcosm of a bigger project where we analyzed over one million tweets about 10 players obtained over a period of one month.

The analysis began with mining tweets about the particular players. The tweets were obtained from an API using Python. Relevant metadata, such as the location of the user and the timestamp of the tweet, were extracted along with the text of the tweet.
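To make the extraction step a little more concrete, here is a minimal sketch of pulling those three fields out of a single tweet. It is only an illustration under some assumptions: the post does not say which endpoint or client library we used, so the sketch skips the network call entirely and assumes the tweet arrives as JSON in the layout of Twitter's v1.1 tweet object (`text`, `created_at`, `user.location`). The sample payload and the function name `extract_fields` are made up for the example.

```python
import json
from datetime import datetime

# Hypothetical sample payload, invented for illustration. The field names
# follow the Twitter v1.1 tweet object; the content is not real data.
SAMPLE_PAYLOAD = """
{
  "created_at": "Sat Nov 28 17:55:00 +0000 2015",
  "text": "What a goal!",
  "user": {"location": "Leicester, England"}
}
"""

# Timestamp layout used by v1.1 tweet objects, e.g. "Sat Nov 28 17:55:00 +0000 2015".
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def extract_fields(raw_json):
    """Return the text, user location and parsed timestamp of one tweet."""
    tweet = json.loads(raw_json)
    return {
        "text": tweet["text"],
        # The location is free text typed by the user and may be missing.
        "location": tweet.get("user", {}).get("location"),
        "timestamp": datetime.strptime(tweet["created_at"], TWITTER_TIME_FORMAT),
    }


record = extract_fields(SAMPLE_PAYLOAD)
print(record["text"], "|", record["location"], "|", record["timestamp"].isoformat())
```

Because the location is whatever the user typed into their profile, a real pipeline would need an extra step to map it to a country before any per-country comparison.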

Data_Extraction

The text of the tweet was then converted into a Term-Document Frequency Matrix (TDF). Now only a year ago, all that I could have thought of on hearing ‘Term-Document Frequency Matrix’ would have been Neo in his slick glasses staring into some green numbers floating Chinese-style on an antique nineties monitor! But TDF is way simpler than that. All it does is create a table. Each row is a tweet. All the words observed in all the tweets that we are studying make up the columns. Each cell records how many times that word occurs in that tweet.

Consider this example for clarity:

TDF
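The same kind of table can be built in a few lines of Python. This is a minimal sketch rather than the code we actually ran: the three tweets are invented, tokenization is a plain lowercase split on whitespace, and the helper name `build_term_document_matrix` is hypothetical.

```python
from collections import Counter


def build_term_document_matrix(tweets):
    """Return (vocabulary, matrix): one row per tweet, one column per word."""
    tokenized = [tweet.lower().split() for tweet in tweets]
    # Every distinct word seen in any tweet becomes a column.
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # A Counter returns 0 for words that do not occur in this tweet.
        matrix.append([counts[word] for word in vocabulary])
    return vocabulary, matrix


# Invented example tweets, for illustration only.
tweets = ["vardy scores again", "what a goal by vardy", "goal goal goal"]
vocabulary, matrix = build_term_document_matrix(tweets)

print(vocabulary)
for row in matrix:
    print(row)
```

Most cells end up being zero, since any single tweet uses only a handful of the words in the full vocabulary.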

Thus each word is now a feature and each tweet a data-point. We then used a Machine Learning technique called a Maximum Entropy (MaxEnt) classifier in R to classify each tweet or data-point into one of three categories: Positive, Neutral or Negative. (I could get into the details of the work: why we went for a supervised classification approach, why MaxEnt works best for text classification, and so on. But since I’m trying to keep this article tractable for someone with no prior analytics experience, I’ll stop here and provide a link to another blog post about our detailed work: A detailed report of our project.)
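For the curious, here is a toy illustration of the idea. A MaxEnt classifier is the same model as multinomial logistic regression, so the sketch below trains one with plain stochastic gradient descent on word counts. Please treat it as an assumption-laden sketch and not our actual pipeline: our model was built in R, while this is pure Python; the six hand-labelled tweets are invented; and the learning rate, epoch count and all function names are arbitrary choices made for the example.

```python
import math
from collections import Counter

LABELS = ["positive", "neutral", "negative"]

# Invented, hand-labelled training tweets (illustration only).
TRAINING_TWEETS = [
    ("what a brilliant goal by vardy", "positive"),
    ("love this brilliant finish", "positive"),
    ("vardy starts for leicester today", "neutral"),
    ("leicester play united at home", "neutral"),
    ("vardy was terrible in that game", "negative"),
    ("terrible miss and an awful result", "negative"),
]


def build_vocabulary(texts):
    return sorted({word for text in texts for word in text.lower().split()})


def featurize(text, vocabulary):
    """Turn a tweet into a row of word counts; unknown words are ignored."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


def softmax(scores):
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_scores(x, weights, biases):
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def train_maxent(X, y, n_classes, epochs=300, learning_rate=0.1):
    """Fit one weight vector per class by gradient descent on the log-loss."""
    weights = [[0.0] * len(X[0]) for _ in range(n_classes)]
    biases = [0.0] * n_classes
    for _ in range(epochs):
        for x, target in zip(X, y):
            probs = softmax(class_scores(x, weights, biases))
            for c in range(n_classes):
                # Gradient of the log-loss: predicted probability minus truth.
                error = probs[c] - (1.0 if c == target else 0.0)
                biases[c] -= learning_rate * error
                for j, value in enumerate(x):
                    if value:
                        weights[c][j] -= learning_rate * error * value
    return weights, biases


def predict(text, vocabulary, weights, biases):
    scores = class_scores(featurize(text, vocabulary), weights, biases)
    return LABELS[scores.index(max(scores))]


vocabulary = build_vocabulary(text for text, _ in TRAINING_TWEETS)
X = [featurize(text, vocabulary) for text, _ in TRAINING_TWEETS]
y = [LABELS.index(label) for _, label in TRAINING_TWEETS]
weights, biases = train_maxent(X, y, len(LABELS))

for text, label in TRAINING_TWEETS:
    print(f"{label:>8} -> {predict(text, vocabulary, weights, biases):<8} | {text}")
```

With only six tweets the model simply memorizes its training set, which is why a real project needs a large labelled corpus and a held-out test set before its predictions can be trusted.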

Now this process was carried out for all the tweets about all the different players. The prevalent sentiment about a particular player was given by the difference between the number of positive and negative tweets (which was also normalized). Doing this helped us observe several interesting trends in the data. Consider the comparative study of sentiments about Neymar, Ronaldo and Harry Kane over November 2015. Also, have a look at how the sentiment about Harry Kane varied across countries.
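As a rough sketch, the score for one player could be computed as follows. The post does not spell out how we normalized, so dividing by the total number of tweets is an assumption made for this example, as are the function name `net_sentiment` and the sample labels.

```python
def net_sentiment(labels):
    """Return (positive - negative) / total for a list of tweet labels.

    The result lies between -1 (all negative) and +1 (all positive).
    Dividing by the total tweet count is an assumed normalization.
    """
    if not labels:
        return 0.0
    positive = labels.count("positive")
    negative = labels.count("negative")
    return (positive - negative) / len(labels)


# Invented classifier output for one player on one day.
labels = ["positive", "positive", "negative", "neutral"]
print(net_sentiment(labels))  # (2 - 1) / 4 = 0.25
```

Normalizing in some such way matters because a player who is tweeted about ten times as often would otherwise dominate every comparison on volume alone.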

Ronaldo-Kane

Kane-Sentiment

Such analysis has huge potential applications. Imagine how Tottenham Hotspur, the club which Harry Kane plays for, could maximize their profits by opening more ‘Spurs Stores’ in South Africa, where Kane is way more popular (green), as compared to, say, Australia, where he is clearly notorious (red). Are you an executive at EA Sports and want to decide whom to have on the cover of FIFA 16? Just mine sentiment on Twitter and voilà, you’ll see that Neymar would be a way better choice than Kane.

So this was all about our project on Twitter sentiment about football superstars. This project was a part of our course called Computing for Data Sciences at ISI Kolkata. All of our batchmates from PGDBA have also worked on several such (hopefully :P) interesting projects. Some of them will share their stories with you on this blog as well.

Vardy and his record hold a special place in our hearts. He was the perfect muse for demonstrating the effectiveness of our model. When you’ve come up with your first ‘Real’ model, the true test of the model happens when you see it work in real life on a completely unexpected scale. That meteoric rise in Vardy’s sentiment at 5.55 pm GMT, right after he had scored the crucial goal, proved to us beyond doubt that our model worked! So, I sign off with a link to that moment when Vardy smashed a record, the moment when people around the world celebrated the dawn of a new star, and the moment when our model was validated! Cheers!

 

Categories
Experiences

The ISI Chapter

The first semester started on July 20, 2015. The classes were held initially at the Kolmogorov Bhavan. Within a month we were given our own classroom in the Satyendra Nath Bose Bhavan.

Kolmogorov

Kolmogorov Bhavan, ISI

Our curriculum consisted of 5 subjects. The subjects and the professors involved were:

Statistical inference – Amitava Banerjee

Stochastic processes – Dr. Bimal Roy & Dr. Kishan Chand Gupta

Computing for data sciences – Dr. Sourav Sengupta

Statistical structures in data – Debashish Sengupta

Database Management Systems – Dr. Pinakpani Pal & Amiya Das

By far the stars of the course were the faculty members. It was an honour to interact with a Padma Shri awardee in Dr. Bimal Roy. To be taught on a regular basis by such an esteemed personality was slightly overwhelming and hugely enriching. The sheer brilliance of the man and his way of looking at probability and its applications made for an experience that is difficult to pen down. Dr. Kishan Chand Gupta shared the course and taught Markov Chains.

SN Bhawan

Satyendra Nath Bose Bhavan, ISI Campus

Diligent and sincere, Prof. Debashish Sengupta was the ideal teacher. He covered every topic rigorously, starting right from the basics of statistics and going all the way to highly complex multivariate analysis. What seemed an easy course initially became heavily loaded and among the toughest by the time it came to its completion. Tutorials were held every week to discuss exercise problems.

The jovial Prof. Amitava Banerjee taught us the habit of drawing meaningful inferences out of large volumes of data. Drawing from his vast pool of consultancy experience, he inculcated in us the ability to convert real-life business problems into statistical problems. His assignments involved working on datasets and testing hypotheses in the correct way.

Dr. Pinakpani Pal was interactive, and worked hard to ensure that our stay was a comfortable one. His course had 2 parts: the theoretical knowledge of databases, and a hands-on SQL application. He shared the course with Amiya Das, a seasoned professional at Oracle.

The friendly and ever enthusiastic Dr. Sourav Sengupta was always approachable and motivated the entire batch in getting accustomed to highly complex ideas. His passion for teaching shone through as he went through the concepts of linear algebra and machine learning algorithms. He organised the course superbly and the web page for his course was among the best resource repositories we could have hoped for.

The invited lectures were top-drawer, with experienced professionals coming in to share their insights and recommendations about the field of analytics. Overall, the first semester was a learning experience beyond compare and laid a solid foundation on which we can build in our journey towards becoming well-rounded data scientists.

Deshmukh

   Our hostel, Deshmukh Bhavan

Categories
About Us

What is PGDBA?

PGDBA stands for Post Graduate Diploma in Business Analytics (PGDBA website), probably the first two-year full-time course of its kind in India, jointly offered by Indian Institute of Management, Calcutta; Indian Statistical Institute, Kolkata and Indian Institute of Technology, Kharagpur. PGDBA was started with the philosophy that data is the new oil of this century. With an abundance of data, driving a business successfully and effectively is becoming trickier. Recent surveys suggest that big data could create $300 billion in value in healthcare alone each year; clever use of location data across industries could capture $600 billion in consumer surplus. Conversely, poor data management can cost up to 35% of a business’s operating revenue. While the ability to capture and store this ocean of data has grown to overwhelming levels, the use of the right techniques to extract ‘information’ from these data sets has not kept pace with the demands of the industry, and there continues to be a worrying skill shortage across all sectors. More specifically, crunching data to generate the necessary business insights requires a strong hold on Statistics, Technology and Business simultaneously. This combination is so rare that the industry hardly ever sees individuals who possess all three of these crucial skills in the domain of Business Analytics.

To cater to this need, PGDBA has been built on three pillars: Math and Statistics, Technology, and Business, as is clear from the expertise of the parent institutes. The course consists of four semesters along with an introductory pre-semester. It is true that two years are not enough to produce data scientists, and one can hardly scratch the surface of Machine Learning and Data Mining, but the unique combination in this course gives it a distinct identity that opens up endless opportunities for the participants: Financial analysis, Consultancy, PhD in Machine Learning, R&D, entrepreneurship to build data-driven startups…you name it!