Categories
Articles General

The mission of PGDBA

By Anurag Malakar, PGDBA Batch-5 (2019-21)

The year 2005 | A regular morning in a middle-class Indian’s life

The middle-aged mom heaves a sigh of relief, having just sent her kids off on the school bus. It has been a busy morning for her; she woke up early to cook food for her husband and the two darlings-turned-devils who are the apples of her eye. She returns upstairs and spends the rest of the morning preparing lunch for her kids who’ll soon be home, arguing with her maid, and reminding the security guard for the umpteenth time to call a plumber to repair the leaky faucet in her kitchen. She watches a repeat of her favorite TV show on Star Plus for a while before going to the market to buy fresh vegetables and bread for the next day.

The year 2019 | A technology-enabled morning in an aspirational Indian’s life

The father sits down on the sofa in the living room, exhausted after an early morning run. He checks his Fitbit and grins as the tiny screen shows that his mean speed was 25% higher than last month’s average. After freshening up, he wakes up the kids and walks over to the living room. While pouring milk and cereal into three bowls, he observes that the pack of cereal is almost empty. His kids enter the living room and call out to Alexa to play their customized playlist of morning songs, and she dutifully obliges. He whips out his phone, opens WhatsApp and video calls his better half, who’s away at a business meeting. The happy family reunion is interrupted by the sound of the doorbell, which signals that Amazon has delivered, right on time, the new toy the younger kid has been demanding for the past week. The dad checks the time, gets dressed quickly, and books a cab while tying his shoes. He drops his kids off at school and orders their favorite pack of cereal on Grofers. He calculates that he has just about enough time to finish the Netflix special that he dozed off while watching last night. The app’s proprietary algorithm, which incessantly analyzes his viewing habits, dutifully suggests a new range of TV shows for him to binge-watch later.

The world is generating a more diverse range of data than ever before. About 90% of all the data in the world has been generated over the last couple of years. More than 3.7 billion people use the Internet today, and that number is only increasing. Google processes more than forty thousand searches every second, while 41,000 new pictures are uploaded to Instagram every minute. By the time you’re done reading this paragraph, Uber riders will have taken more than forty-five thousand trips!

In this scenario, businesses have an unprecedented opportunity to capitalize on the vast amount of information they are gathering on their current as well as prospective customers. Social media platforms are analyzing the behavioral patterns of individual users and predicting their affinities with great accuracy. The entire world is moving towards a connected networking platform with the advent of the Internet of Things and Machine Learning techniques. Data has become the new oil, and Data Scientists and Analysts are expected to monetize this oil to the greatest extent possible. But this burgeoning sector is experiencing a shortage of skilled workers. The lack of data scientists has become a major constraint, and industries are clamoring for individuals who can make sense of the data they are amassing and provide tangible business advantages. There is a widening gap between demand and supply in the market, and five years ago, IIM Calcutta, in conjunction with ISI Kolkata and IIT Kharagpur, sought to capitalize on the mismatch.

IIM CALCUTTA

That’s how PGDBA was conceptualized: in accordance with the industry’s demands, and with due heed paid to its suggestions. As the fourth batch prepares to venture beyond the gates of Joka, the last one and a half years of training imparted at these stellar institutions is about to be put to use. The industry has steep expectations of this bunch of talented individuals. They’re expected to draw meaningful insights out of data that benefit their businesses. From risk management and healthcare analytics to behavioral analytics and digital marketing, the needs that PGDBA graduates are expected to cater to are diverse.

As the world keeps amassing ever larger amounts of data, it’ll become increasingly difficult to make sense of it. It’ll be akin to finding the one precisely manufactured red needle among multiple piles of slightly deformed red needles; surgical precision is critical to ensure it does not become a bloodbath. That’s where this course comes in, perfectly positioned to meet business requirements by capitalizing on the technological advancements of the 21st century.

Categories
Articles Technical

Analytics in Healthcare – Xerox Challenge

A brief overview of how I approached a real-world healthcare problem via Analytics.

Robin Singh

Disclaimer: I have tried to keep technicalities to a minimum in this blog, so as to cater to a wider segment of readers. However, a little awareness of machine learning will make the rest of the post more comprehensible and hopefully more exciting.

The healthcare industry is not only huge but also has tremendous potential for the use of technology and data science. In this post, I will share one of the numerous instances of the use of unconventional analytics to engineer solutions in response to challenges in the field of healthcare. This seemingly complex idea can be structured and implemented by a combination of machine learning tools and data crunching techniques.

The solution proposed here is designed to work with critical patient data in hospitals and raise an alarm when the state of the patient degrades in a way that could eventually lead to a fatal outcome. Now for the obvious questions: How will such a system help? Can doctors not monitor the patient’s state in person? Well, a doctor can physically monitor only a small number of patients. What if the number of patients is large? Also, the decision to provide intensive care to a patient after an alarm has been raised has both monetary and human-life implications. If the alarm is raised in time, intensive care, although expensive, can be provided to the patient.

At the backend, the model can be seen as a typical machine learning classification problem. Classification, as the name suggests, is a method to categorize data points into predetermined target groups. Numerous algorithms can perform classification, such as Bayesian models, decision trees, random forests and regression. We used a Random Forest model for the current data set because of its simplicity and ease of implementation.
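To make the idea of classification concrete, here is a minimal sketch in Python. It is not the model from the challenge: the post does not show its code or data, so this uses a toy stand-in for a random forest (bagged one-feature decision stumps that vote) on made-up vitals. The feature names, numbers and function names are all hypothetical.

```python
import random

def fit_stump(rows, labels, feature):
    """Find the threshold on one feature that misclassifies the fewest rows."""
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        for high_label in (0, 1):  # label assigned when value >= threshold
            errors = sum(
                (high_label if row[feature] >= threshold else 1 - high_label) != y
                for row, y in zip(rows, labels)
            )
            if best is None or errors < best[0]:
                best = (errors, feature, threshold, high_label)
    return best[1:]

def predict_stump(stump, row):
    feature, threshold, high_label = stump
    return high_label if row[feature] >= threshold else 1 - high_label

def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample and a randomly chosen feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        sample = [rows[i] for i in idx]
        sample_labels = [labels[i] for i in idx]
        feature = rng.randrange(len(rows[0]))
        forest.append(fit_stump(sample, sample_labels, feature))
    return forest

def predict_forest(forest, row):
    """Majority vote over all stumps."""
    votes = sum(predict_stump(stump, row) for stump in forest)
    return 1 if votes * 2 > len(forest) else 0

# Hypothetical data: (heart rate, respiratory rate); 1 = deteriorating, 0 = stable
rows = [(72, 14), (80, 16), (68, 12), (75, 15), (85, 18),
        (118, 28), (125, 30), (110, 26), (130, 32), (121, 27)]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

forest = fit_forest(rows, labels)
print(predict_forest(forest, (135, 35)))  # a clearly abnormal patient
print(predict_forest(forest, (70, 13)))   # a clearly stable patient
```

A real random forest grows full decision trees rather than single-split stumps, but the two ingredients shown here, bootstrap sampling and majority voting, are the same.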

However, the classification method happened to be only the tip of the iceberg. There were many unforeseen challenges, primarily because the data came from the healthcare domain and because of computational resource constraints. First, healthcare data is highly erratic and the severity of a measurement varies from person to person. For example, a certain value of a respiratory measurement can be life-threatening for a typical person but normal for a smoker. This poses a fundamental challenge to the accuracy of models built on healthcare data. Second, the state to be predicted is different from the state for which training data is available. This is slightly difficult to grasp, but let’s try. We want to raise an alarm when the patient’s condition is worsening from normal and approaching mortality, but while the patient still has time. However, the training dataset only has information on actual mortality/no-mortality. Using the training data to learn therefore implies making an approximation. The third challenge comes from implementation aspects. The prediction of no-mortality must be more reliable than the prediction of mortality: the system should be able to predict no-mortality situations with an accuracy of 99% or above. There is a trade-off between accuracy on no-mortality and accuracy on mortality, so if we tune the model for high accuracy on no-mortality, the accuracy on mortality is low.
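The trade-off in the third challenge can be sketched with a few lines of Python. The risk scores, outcomes and thresholds below are invented for illustration; the only idea taken from the post is that moving the alarm threshold raises accuracy on one class at the expense of the other.

```python
def rates_at_threshold(risk_scores, died, threshold):
    """Return (no-mortality accuracy, mortality accuracy) when an alarm
    is raised for every patient whose risk score is >= threshold."""
    alarms = [score >= threshold for score in risk_scores]
    survivor_alarms = [a for a, d in zip(alarms, died) if not d]
    death_alarms = [a for a, d in zip(alarms, died) if d]
    # A survivor is classified correctly when no alarm was raised.
    no_mortality_acc = sum(not a for a in survivor_alarms) / len(survivor_alarms)
    # A death is classified correctly when an alarm was raised.
    mortality_acc = sum(death_alarms) / len(death_alarms)
    return no_mortality_acc, mortality_acc

# Hypothetical model outputs: six survivors followed by four deaths
risk_scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.60, 0.40, 0.55, 0.70, 0.90]
died = [False] * 6 + [True] * 4

for threshold in (0.38, 0.65):
    print(threshold, rates_at_threshold(risk_scores, died, threshold))
```

With these made-up numbers, the lower threshold catches every death but raises a false alarm for one of the six survivors, while the higher threshold gets every survivor right but misses half the deaths.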

Let’s take a moment to think about the methodology again. What can we observe? The predictive model seems to be replicating logic similar to that of a real doctor. In fact, the very idea of machine learning is to train machines to apply logic the way human beings do: for example, using past data to learn and make decisions in future cases, weighing the trade-offs that arise in the decision-making process, and using the concept of information value to reach a decision.

Discussed above is one example of the use of analytics and artificial intelligence in healthcare. There are many unexplored applications in the domain, huge scope for improvement in the existing models and an unquantifiable amount of data to process. In the coming years, devices based on such models will be a reality, and the industry will require many more analysts to cater to the demand.

Categories
Articles Technical

A Record, a Code and Twitter

How my first machine learning model was validated by a goal scored a continent away.

   Ritwik Moghe

It was the twenty-eighth day of November 2015. As the strangely balmy day yawned, stretched and gratefully gave way to dusk, several eyes were glued to the actions of one man. The man was slight, had strange spiky hair and a face that might remind many of all those ‘dawgs’ or ‘dealers’ from Breaking Bad. Only a year back, hardly anyone knew of his existence. And today, he was about to etch his name in the annals of footballing history.

As he latched on to a pitch-perfect through-ball that split the Manchester United defense in half and slotted it in past the oncoming goalkeeper, several things exploded. One of those was the voice of the legendary Martin Tyler as he shouted “Vardy! It’s Eleven, it’s Heaven for Jamie Vardy” (The goal as it unfolded). Jamie Vardy, a name most of you might still be unfamiliar with, had broken the English Premier League record for scoring in the most consecutive matches. He had scored in each of the past eleven games. In the grand scheme of things, the record in itself might not be of much significance. What mattered more was Vardy’s story. From an amateur player with no ‘proper’ training or facilities and very humble beginnings, he had risen to be the most prolific striker in one of the most competitive leagues in the world. It was a classic fairy-tale. For several amateurs trudging every evening into those muddy football fields and trying to curl it like Carlos, Vardy was hope.

So as he was being engulfed by his team-mates after he had scored that crucial record-breaking goal, Vardy was causing another explosion around the world. It was an explosion of hope, of greetings and of admiration. And sitting in our dorm rooms overlooking the ponderous Barrackpore Trunk Road in the quaint campus of ISI Calcutta, a bunch of us fledgling data-scientists of PGDBA captured this joyous explosion. We captured it using Twitter.

Messi_Vardy: A graph encapsulating the positive Twitter sentiment about Vardy right after THE GOAL!

The problem that we were working on was Opinion Mining through tweets. Hundreds of millions of tweets are posted every day. These tweets reflect the opinions or sentiments of the users about various topics. For instance, a tweet like “I love Apple #Iphone6” might reflect the user’s positive sentiment about the company Apple. A study of several such tweets about a particular subject or company can provide the company with valuable insights into the general public opinion about it.

We were analyzing the Twitter sentiment about various current and upcoming football stars. Our aim was to identify the next big star, the one who would eclipse Messi and attain the ultimate pinnacle of fame by someday being the brand ambassador of Tata Motors! Our observation about Vardy and his big day was a mere microcosm of a bigger project where we analyzed over one million tweets about 10 players obtained over a period of one month.

The analysis began with mining tweets about the particular players. The tweets were obtained from an API using Python. Relevant metadata, like the location of the user and the timestamp of the tweet, were extracted along with the text of the tweet.

Data_Extraction

The text of the tweet was then converted into a Term-Document Frequency Matrix (TDF). Now only a year ago, all that I could have thought of on hearing ‘Term-Document Frequency Matrix’ would have been Neo in his slick glasses staring into some green numbers floating Chinese-style on an antique nineties monitor! But TDF is way simpler than that. All it does is create a table. Each row is a tweet. All the words observed in all the tweets that we are studying make up the columns, and each cell records how often that word occurs in that tweet.

Consider this example for clarity:

TDF
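The same table can be sketched in a few lines of Python. This is only an illustration: the tweets are made up, the function name is hypothetical, and the tokenization (lowercasing and splitting on whitespace) is an assumption, since the post does not describe the preprocessing that was actually used.

```python
from collections import Counter

def term_document_matrix(tweets):
    """Build a table with one row per tweet and one column per distinct word."""
    tokenized = [tweet.lower().split() for tweet in tweets]
    # Columns: every word seen in any tweet, in alphabetical order
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        rows.append([counts[word] for word in vocabulary])
    return vocabulary, rows

tweets = ["vardy scores again", "what a goal vardy", "goal goal goal"]
vocabulary, matrix = term_document_matrix(tweets)

print(vocabulary)
for tweet, row in zip(tweets, matrix):
    print(row, tweet)
```

For these three tweets the columns are `a, again, goal, scores, vardy, what`, and the last tweet becomes the row `[0, 0, 3, 0, 0, 0]`.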

Thus each word is now a feature and each tweet a data-point. We then used a Machine Learning technique called the Maximum Entropy Classifier in R to classify each tweet or data-point into one of three categories: Positive, Neutral or Negative. (I could get into the details of the work, about why we went for a supervised classification approach, about why MaxEnt works best for Text Classification etc. But since I’m trying to keep this article tractable for someone with no prior analytics experience, I’ll stop here and provide a link to another blog about our detailed work: A detailed report of our project.)

Now this process was carried out for all the tweets about all the different players. The prevalent sentiment about a particular player was given by the normalized difference between the number of positive and negative tweets. Doing this helped us observe several interesting trends in the data. Consider the comparative study of sentiments about Neymar, Ronaldo and Harry Kane over November 2015. Also, have a look at how the sentiment about Harry Kane varied across countries.

Ronaldo-Kane

Kane-Sentiment
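The sentiment score behind these charts, positive tweets minus negative tweets, then normalized, can be sketched as below. The post does not say how the normalization was done, so dividing by the total number of tweets is an assumption here, and the function name and labels are hypothetical.

```python
from collections import Counter

def net_sentiment(labels):
    """Return (positive - negative) / total, a score between -1 and 1.

    Assumption: normalization divides by the total number of tweets,
    neutral ones included.
    """
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return (counts["positive"] - counts["negative"]) / total

# Hypothetical classifier output for ten tweets about one player
labels = ["positive"] * 6 + ["neutral"] * 2 + ["negative"] * 2
print(net_sentiment(labels))  # (6 - 2) / 10
```

Normalizing this way keeps players with very different tweet volumes comparable, since a heavily discussed star would otherwise dominate on raw counts alone.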

Such analysis has huge potential applications. Imagine how Tottenham Hotspur, the club that Harry Kane plays for, could maximize its profits by opening more ‘Spurs Stores’ in South Africa, where Kane is way more popular (green), than in, say, Australia, where he is clearly notorious (red). Are you an executive at EA Sports who wants to decide whom to have on the cover of FIFA 16? Just mine sentiment on Twitter and voilà, you’ll see that Neymar would be a way better choice than Kane.

So this was all about our project on Twitter sentiment about football superstars. This project was a part of our course called Computing for Data Sciences at ISI Kolkata. Our batchmates from PGDBA have also worked on several such (hopefully :P) interesting projects. Some of them will share their stories with you on this blog as well.

Vardy and his record hold a special place in our hearts. He was the perfect muse for demonstrating the effectiveness of our model. When you’ve come up with your first ‘Real’ model, the true test of the model happens when you see it work in real life on a completely unexpected scale. That meteoric rise in Vardy’s sentiment at 5:55 pm GMT, right after he had scored the crucial goal, proved to us beyond doubt that our model worked! So, I sign off with a link to that moment when Vardy smashed a record, the moment when people around the world celebrated the dawn of a new star, and the moment when our model was validated! Cheers!

Categories
General

Prospects of PGDBA – The Million Dollar Question

One of the most common questions that we have come across in the past few days is – How would placements/internships be?

Well, to be very honest, even we don’t know. We can only anticipate and hope that it turns out to be better than our expectations. I will share my experience so far, which makes me believe that placements and internships are going to be no less than those of PGDM (of IIM C) or any other Masters course at any of the three institutes. The companies that are expected to recruit PGDBA students are going to be the same bunch of companies that recruit MBA students. There are lots of companies that hire MBAs for analytics roles. Now, since the PGDBA program aims at bridging this gap, I feel that the packages offered would be similar to the ones offered to PGDM students.

So far, the companies that have interacted with us are Microsoft, SAS, SBI, Deloitte, TCS, IBM, Flipkart, Reliance, American Express, BPCL, Latent View and a few other start-ups. (I might be missing a few names.) All the companies mentioned here have shown interest in hiring students for internships. Moreover, as per my discussion with the Chairman of this program, there are a few other companies which have shown keen interest in hiring students (not disclosing the names, but these are among the biggest e-commerce, I-Banking and Consulting firms). It is expected that a few of these companies might teach us some courses in the upcoming semesters.
Considering the uptake of analytics in companies and PGDBA being the only full-time residential program (of such stature), demand is going to be very high. So, the hopes are very high and I am quite confident that these are going to be met.

P.S. – All the views expressed on this blog are made by students and have nothing to do with any faculty, college or any official involved in the program.

Categories
About Us

What is PGDBA?

PGDBA stands for Post Graduate Diploma in Business Analytics (PGDBA website), probably the first two-year, full-time course in business analytics in India, jointly offered by Indian Institute of Management, Calcutta; Indian Statistical Institute, Kolkata; and Indian Institute of Technology, Kharagpur. PGDBA was started with the philosophy that data is the new oil of this century. With an abundance of data, running a business successfully and effectively is becoming tricky. Recent surveys suggest that big data could create $300 billion in value in healthcare alone each year; clever use of location data across industries could capture $600 billion in consumer surplus. Conversely, poor data management can cost up to 35% of a business’s operating revenue. While the ability to capture and store this ocean of data has grown to overwhelming levels, the use of the right techniques to extract ‘information’ from these data sets has not kept pace with the demands of the industry, and there continues to be a worrying skill shortage across all sectors. More specifically, crunching data to generate the necessary business insights requires a strong hold on Statistics, Technology and Business simultaneously. This combination is so rare that the industry hardly sees individuals who bring together all three of these crucial skills in the domain of Business Analytics.

To cater to this need, PGDBA has been built on three pillars: Math and Statistics, Technology, and Business, as is clear from the expertise of the parent institutes. The course offers four semesters along with an introductory pre-semester. It is true that two years is not enough to produce data scientists and that one can hardly scratch the surface of Machine Learning and Data Mining, but the unique combination this course offers gives it a distinct identity, which opens up endless opportunities for the participants: Financial analysis, Consultancy, PhD in Machine Learning, R&D, entrepreneurship to build data-driven startups…you name it!