Categories
Articles General

The mission of PGDBA

By Anurag Malakar, PGDBA Batch-5 (2019-21)

The year 2005 | A regular morning in a middle-class Indian’s life

The middle-aged mom heaved a sigh of relief, having just sent off her kids on the school bus. It’s been a busy morning for her; she had woken up early to cook food for her husband and the two darling-turned-devils of her eyes. She returns upstairs and spends the rest of the morning preparing lunch for her kids who’ll soon be home, arguing with her maid, and reminding the security guard for the umpteenth time to call a plumber to repair the leaky faucet in her kitchen. She watches a repeat of her favorite TV show on Star Plus for a while before going to the market and buying fresh vegetables and bread for the next day.

The year 2019 | A technology-enabled morning in an aspirational Indian’s life

The father sits down on the sofa in the living room, exhausted after an early morning run. He checks his Fitbit and grins as the tiny screen shows that his mean speed was 25% higher than last month’s average. After freshening up, he wakes up the kids and walks over to the living room. While pouring milk and cereal into three bowls, he notices that the pack of cereal is almost empty. His kids enter the living room and call out to Alexa to play their customized playlist of morning songs, and she dutifully obliges. He whips out his phone, opens WhatsApp and video calls his better half, who’s away on a business meeting. The happy family reunion is interrupted by the sound of the doorbell, which announces that Amazon has delivered, right on time, the new toy the younger kid had been demanding for the past week. The dad checks the time, gets dressed quickly, and books a cab while tying his shoes. He drops his kids off at school and orders their favorite pack of cereal on Grofers. He calculates that he has just about enough time to finish the Netflix special that he had dozed off while watching last night. The app’s proprietary algorithm, which incessantly analyzes his viewing habits, dutifully suggests a new range of TV shows for him to binge-watch later.

The world is generating a diverse range of data like never before. About 90% of all the data in the world has been generated in the last couple of years alone. More than 3.7 billion people use the Internet today, and that number is only increasing. Google is processing more than forty thousand searches every second, while 41,000 new pictures are being uploaded to Instagram every minute. By the time you’re done reading this paragraph, Uber riders would’ve taken more than forty-five thousand trips!

In this scenario, businesses have an unprecedented opportunity to capitalize on the vast amount of information they are gathering on their current as well as prospective customers. Social media platforms are analyzing the behavioral patterns of individual users and predicting their affinities with great accuracy. The entire world is moving towards a connected networking platform with the advent of the Internet of Things and Machine Learning techniques. Data has become the new oil, and Data Scientists and Analysts are expected to monetize this oil to the greatest extent possible. But this burgeoning sector is experiencing a shortage of skilled workers. The lack of data scientists has become a major constraint, and industries are clamoring for individuals who can make sense of the data they are amassing and provide tangible business advantages. There is a widening gap between demand and supply in the market, and five years ago, IIM Calcutta, in conjunction with ISI Kolkata and IIT Kharagpur, sought to capitalize on the mismatch.


That’s how PGDBA was conceptualized, in accordance with the industry’s demands and with due heed to its suggestions. As the fourth batch prepares to venture beyond the gates of Joka, the last one and a half years of training imparted at these stellar institutions are about to be put to use. The industry has steep expectations of this bunch of talented individuals. They’re expected to draw meaningful insights out of data that benefit their businesses. From risk management and healthcare analytics to behavioral analytics and digital marketing, the needs that PGDBA graduates are expected to cater to are diverse.

As the world keeps amassing higher amounts of data, it’ll become increasingly difficult to make sense out of it. It’ll be akin to finding the precisely manufactured red coloured needle out of multiple piles of slightly deformed red coloured needles; surgical precision is critical to ensure it does not become a bloodbath. That’s where this course comes in, perfectly positioned to meet the business requirements by capitalizing on the technological advancements of the 21st century.

Categories
Articles Chronicles General

FINTECH Hackathon by NPCI at IIMC

A sneak peek of Day 1 of the FINTECH Hackathon organized by the National Payments Corporation of India at IIM Calcutta.

Categories
About Us Articles General

What is PGDBA?

“Without big data analytics, companies are blind and deaf, wandering out onto the web like deer on a freeway.”  – Geoffrey Moore, Organizational Theorist & Author

Analytics: Need of the Hour

The amount of data in our world is exploding. As per a McKinsey report, big data may well become a new type of corporate asset that will cut across business units and function the same way as a powerful brand does, representing a key basis for competition. The increasing volume and detail of information captured by enterprises, the rise of multimedia, social media and the Internet of Things will fuel exponential growth in data in the foreseeable future.

The rapid pace at which data is being generated across various domains and fields will require people with deep analytical skills to provide insights, enabling companies to gain a competitive edge and offer a better consumer experience. The insights from the data are used for various purposes such as segmentation of customers for better targeting, prediction of churn rate, product recommendation, analysis of customer opinion from media posts, preventive maintenance, management of portfolio risk, etc.

Untapped potential in India

In India, the analytics market is expected to double between 2013 and 2018, reaching a figure of US$ 2.3 billion by 2018, according to a report published by NASSCOM and Blueocean Market Intelligence. This will result in a shortage of about 200,000 data scientists in India over the next few years, according to sources in the Analytics Special Interest Group set up by NASSCOM. Hence, the creation of trained, industry-ready business analytics professionals is the need of the day.

Globally, the demand for data scientists is projected to exceed supply by more than 50% by 2018. With a trained workforce so scarce in the analytics domain, the onus is on educational institutes to address the needs of the industry.

The three institutes, IIM Calcutta, IIT Kharagpur and ISI Kolkata, are aware of the demand for well-rounded analytics professionals. In order to understand the needs of the industry, a day-long conclave was held in January 2015 at IIM Calcutta. Representatives of leading companies such as KPMG, EXL, Deloitte, LatentView, TCS, Reliance Communication, Deutsche Bank, E&Y, SBI, IBM, Google, Microsoft, HSBC and Cognizant took part in the conclave. Based on their inputs, the idea of a specialized course emerged, a course which would produce leaders of the future, skilled in the essential areas of business, statistics and computer science. Through this collaboration between industry and academia, PGDBA was born.

Introduction to PGDBA

PGDBA is a two-year, full-time residential diploma programme aimed at creating business analytics professionals employable by leading Indian and foreign firms. This programme is designed for those who have an analytical mindset, are interested in tackling challenging business problems, and possess an inclination towards Mathematics. Some of the salient features of the programme include courses taught by reputed faculty members at the campuses of three globally renowned institutes, hands-on business analytics training at a related company, continuous interaction with industry leaders throughout the duration of the programme and the availability of placement opportunities at all three institutes.

The three institutes complement each other to give students the most comprehensive learning of business analytics. The course finds the optimal balance between the theoretical concepts of data science and their applications to business problems. Students get to regularly interact with industry experts through invited talks and guest lectures, where experts talk about the latest trends and challenges in the industry. The course leverages the best of the three institutes in their respective fields of expertise, i.e., Statistics, Technology and Business.

In the first semester at ISI, students form a strong base of Mathematical and Statistical concepts. This is followed by a semester at IIT Kharagpur, where students work on data science projects and get to test the statistical concepts they built at ISI. This semester is meant to build students’ technical expertise. To complement the first two semesters, students spend their third semester at IIM Calcutta, where they are exposed to business problems through case studies and industry interactions. To cap it all, the students get to apply their knowledge in the industry through a six-month internship as their fourth semester.

Placements

The students from the first two batches of PGDBA experienced highly successful placement processes. Companies from various industries such as BFSI, Consulting, Healthcare, Research, Retail, Gaming, Technology, etc. participated and handpicked candidates suitable for both data-driven and business-driven analyst roles. The placement processes witnessed the involvement of Fortune 500 companies such as American Express, Walmart Labs, UnitedHealth Group, JPMC and Amazon. Some of the job profiles offered were Senior Data Scientist, Analytics Manager, Statistical Analyst, Lead Business Analyst, Solution Analyst, Experienced Associate and Assistant Project Manager.

With continuously increasing demand for business analytics professionals, the future of PGDBA looks promising. The opportunity to learn from three top-notch institutes of the country is unmatched by any other Data Analytics course. Recently, PGDBA was ranked No. 1 by Analytics India Magazine among the data science courses offered in the country.

Stay tuned for more updates


 

Categories
Articles Chronicles Experiences

Foreign Internships: Ambassadors of PGDBA

As the founding batch prepares for their final placements, we bring to you a sneak peek into the persona of a unique group of people who are going to cross the borders for their 6-month long internships.

Madhur Modi 

It was June 2015 when a set of 51 students set foot on the doorsteps of IIM Calcutta, ready to embark upon a journey filled with chaos! They had no idea how their decision to join this new course, the Postgraduate Diploma in Business Analytics (PGDBA), hosted by three prestigious institutes in India that are also the oldest ones in this field, would shape their future careers. Even though they were all convinced of the strength of data and how it could be used to disrupt entire markets, they weren’t sure what a roller coaster ride they were in for. Now, as the internship semester draws near, six students have received foreign internship offers, and four of them are traveling to various countries to start their internships. These six students from the founding batch are a blend of people with work experience and freshers, and they come from diverse backgrounds. Here’s a showcase of their profiles:

Alok Mani Singh – The civil engineer from BHEL, the “stud” with four years’ worth of experience at a Maharatna PSU and IIT Guwahati attached to his name, was the first person to secure a foreign internship, at the prestigious Dunia Finance in Dubai. He is the “Perfect Statistician” as well as the “Management expert” who can solve any problem that comes his way “within a night or two”. He also likes to play the guitar and has entertained us for the past one and a half years with his pleasant voice. Now he will be working closely with the head of the strategic analytics function at Dunia during his 6-month internship in Dubai.

Siddhant Sanjeev – “Coder”, as everyone in the batch knows him, was the only other person to impress Dunia and accompany Alok to Dubai. He is a fresher with a computer science background from NSIT and a national-level coder. He earned himself an interview offer from Google owing to his coding skills and won case competitions from PwC and Deloitte. He also developed an e-commerce website using intricate concepts of natural language processing, recommendation systems and information retrieval.

Ankitkumar Sonthalia – Better known as “Sonthu”, this guy has 3 years of work experience at Cognizant and is now going to work for Rocket Internet in Myanmar. He has worked on marketing strategy, predictions and recommendation systems. He is a travel enthusiast and has organized many unforgettable trips for our batch. He has also organized a cricket fantasy league and the freshers’ party.

Rachit Tripathi – A mechanical engineer from IIT Kanpur by mind and a quant trader by heart, Rachit bagged an internship in France at QuantCube Technology, a niche fintech company. He’s also part of the team selected for the Data Science Game, an international inter-university competition held in Paris. In his pursuit of becoming a quant trader, he obtained the CFA Level 1 certification and solved various cases, from those on Walmart to those on Yahoo data. He is also known as the “data scientist” of the batch and has recently been selected as one of the finalists in the Goldman Sachs Quantify competition.

Avinash Kumar – He is a mechanical engineer from NIT Jamshedpur and has worked at L&T for 3 years. He was offered an internship at Rocket Internet, which he did not take as he aspired to work in India. He has written an international research paper and has been part of the Data Science Game team with Rachit during this course. He also won a NASA systems engineering award while he was a B.Tech. student. Besides, he is a skilled TT player as well as a chess enthusiast.

Bharathi Ramaraj – Bharathi got an internship offer from Rocket Internet but decided to stay in India. The only girl to get a foreign internship offer, she is a fresher with a B.Tech. in Electronics and Communications Engineering. She has worked on uplift modeling, high-frequency data and news analytics and has taken part in various Kaggle competitions. She is also an international chess player and has traveled to as many as 16 countries representing India. She envisions becoming a financial analyst and using advanced analytics techniques at a financial credit company to build her career.

Despite such varying backgrounds and profiles, students of PGDBA were able to spread the word about the program and secure internships in places like Dubai, France and Myanmar. A lot of us were not sure what we would be able to achieve when we started this journey, but now, seeing the accomplishments showcased by the examples above, we have grown a lot more confident in our abilities and can envisage a clearer picture of our future in the analytics world.

Best wishes to the entire batch for their internship semester! We are sure that all of you will imprint the brand PGDBA in the world of data analytics.

Categories
Articles Experiences ISI Chapter

Chances of Consequence

Grown-ups love figures. When you tell them that you’ve made a new friend, they never ask you any questions about essential matters. They never say to you, “What does his voice sound like? What games does he love best? Does he collect butterflies?” Instead, they demand: “How old is he? How many brothers has he? How much does he weigh? How much money does his father make?” Only from these figures do they think they have learned anything about him.

-The Little Prince

Our first exams in this first semester at ISI have finally arrived. One of the subjects of the course, and one of the more interesting ones I might add, is ‘Probability and Stochastic Processes’. This was also the subject of our first exam. I have been poring over notes and hundreds of pages of text in preparation for this exam. The intricacies in some of the ideas reminded me of these lines taken from ‘The Little Prince’ and quoted by David Freedman in his book on Statistics. Are such matters as figures and charts always dry and boring? In our first class, the professor of the same subject also remarked, “Maybe our insistence on numbers is the limitation of man”.

‘The Little Prince’ is a famous classic written by Antoine de Saint-Exupéry. Although it was first written as a children’s book, it has been enjoyed over the years by adults as well, as a heart-warming story on friendship, growing up and the curious fascinations of man. One of the oft-used coinages in the book is the phrase ‘matters of consequence’. The Little Prince finds, at one point, a man aimlessly counting stars to the extent that he reaches ‘five hundred and one million’. This man then ironically refers to his counting as a matter of consequence.

In these days leading up to our exams, I began to wonder whether factuality could be made interesting in the context of some intricacies I came across in problems of Probability Theory. Are there questions that I can ask that are not as dry as the character’s occupation of counting in The Little Prince?

Consider the following questions. How many random people would I need to collect in a room, such that there would be half a chance that at least two of them share their birthdays? The answer to this is a surprisingly low number. Let’s look at a different question. How many people would I need to catch hold of and ask birthdays of to have half a chance of finding someone who shares my birthday? It is also interesting to note that this number is different from the first one. Louis CK, the famous stand-up comedian, made an interesting remark in one of his live shows. He said that there were enough people attending his show for there to be a fair chance that a few of them would die in the coming year. Ouch! Let’s pose that as a question on chance. Assume a population with a certain death rate. What is the number of people I would have to randomly collect in a room, such that there would be half a chance that someone would die tomorrow?
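For readers who want to check the numbers, here is a minimal sketch of the three questions above. It assumes 365 equally likely birthdays, independence between people, and, for the last question, a purely hypothetical death rate of 7 per 1,000 people per year spread evenly over the days of the year; the function names are illustrative, not from any course material.

```python
import math

def min_group_for_shared_birthday(target=0.5, days=365):
    """Smallest group size where P(at least two share a birthday) >= target."""
    p_all_distinct = 1.0  # probability that n people all have distinct birthdays
    n = 0
    while 1.0 - p_all_distinct < target:
        # Person number n+1 must avoid the n birthdays already taken.
        p_all_distinct *= (days - n) / days
        n += 1
    return n

def min_people_for_event(p_single, target=0.5):
    """Smallest n with 1 - (1 - p_single)**n >= target.

    p_single is the chance that one given person 'hits'
    (shares my birthday, or dies tomorrow).
    """
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p_single))

if __name__ == "__main__":
    print(min_group_for_shared_birthday())   # any two people share a birthday
    print(min_people_for_event(1 / 365))     # someone shares MY birthday
    # Hypothetical rate: 7 deaths per 1,000 people per year (an assumption).
    print(min_people_for_event(0.007 / 365)) # someone dies tomorrow
```

Under these assumptions the first answer is 23 and the second is 253. The gap comes from counting pairs: in the first question any of the many possible pairs can match, while in the second only pairs that include me count.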

Enough with birthdays and death days. Consider coin tosses. Say, I keep tossing a coin and keep getting heads for a hundred tosses. I only know that it is equally likely that my coin is anything between a perfectly rigged coin to a perfectly fair coin. What is the chance that I win if I bet on the next toss to be heads? Let’s stretch this slightly further. Let us assume that when the universe was born, the probability that the sun would rise over planet earth each day was made to depend on a coin toss. We only know that this coin was likely to be anything between a perfectly fair to a perfectly unfair coin (God is playing a cruel game). What is the probability that the sun would rise tomorrow, given that it has been rising each day over all these years since the birth of the universe?
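One standard way to answer both questions is Laplace's rule of succession. The sketch below assumes that the unknown probability of heads, p, is uniformly distributed over [0, 1] before any toss, and that tosses are independent given p; a prior restricted to the range between fair and fully rigged would change the numbers slightly, but not the spirit of the answer.

```latex
% Assumption: p ~ Uniform[0,1]; n heads observed in n independent tosses.
\[
P(\text{next toss is heads} \mid n \text{ heads})
  = \frac{\int_0^1 p^{\,n+1}\,dp}{\int_0^1 p^{\,n}\,dp}
  = \frac{1/(n+2)}{1/(n+1)}
  = \frac{n+1}{n+2}.
\]
% For the coin: n = 100 gives 101/102, roughly 0.990.
% For the sunrise: n is the number of days observed so far,
% so the probability is extremely close to, but never exactly, 1.
```

So after a hundred heads the bet wins about 99% of the time, and the same formula with n equal to the number of sunrises so far gives the chance that the sun rises tomorrow.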

Let’s give some rest to our poor coin and move on to other questions. The Monty Hall Problem is a famous example of how the idea of chance can sometimes stump intuition. Here’s a different example in a similar vein. Suppose there are three suspects in jail who have been cleared of any wrongdoing and are all going to be released soon. It is announced that two of the three will be released the next day, but the three don’t know exactly which two. One of them decides to go ask the guard which one of the other two prisoners is going to be released. But then he thinks the following to himself, “After I have asked, I have only a one in two chance that I’m the second guy to be released. But before I ask, there is a two in three chance that I will be one of the guys to be released. So should I rather not ask?”

As one of our professors quips so often, ‘Remember that there’s a chance model somewhere in the background.’ Somebody somewhere is flipping a coin. Maybe someday a story will be written on a travelling mathematician who visits magical planets to ask questions on chances of consequence. It’s a long shot that it would be anything as beautiful as the masterpiece that is ‘The Little Prince’. But I wager it wouldn’t be as bad as dry pointless counting.
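The prisoner's worry can be checked by listing every case. The sketch below assumes that each pair of prisoners is equally likely to be released and that, when both of the other two are to be released, the guard names one of them at random; the labels A, B and C are made up for illustration, with A as the prisoner who asks.

```python
from fractions import Fraction
from itertools import combinations

PRISONERS = ("A", "B", "C")  # "A" is the prisoner asking the guard

# Joint probability of (pair released, prisoner the guard names).
joint = {}
for pair in combinations(PRISONERS, 2):
    p_pair = Fraction(1, 3)                    # each pair equally likely
    can_name = [x for x in pair if x != "A"]   # guard never names A
    for named in can_name:
        # Assumption: the guard picks uniformly when he has a choice.
        joint[(pair, named)] = p_pair * Fraction(1, len(can_name))

def prob_a_released_given(named):
    """P(A is released | guard says `named` will be released)."""
    total = sum(p for (pair, n), p in joint.items() if n == named)
    favourable = sum(p for (pair, n), p in joint.items()
                     if n == named and "A" in pair)
    return favourable / total

if __name__ == "__main__":
    for named in ("B", "C"):
        print(named, prob_a_released_given(named))
```

Under these assumptions the chance stays at two in three whichever name the guard gives, so asking costs the prisoner nothing. His one-in-two reasoning overlooks that the guard is twice as likely to say a given name when the asker is among those released.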

Categories
Articles Experiences Technical

Joka Library to Paris – A Data Science Crusade

A peek into Team Tabs, one of the three teams representing India at DSG.

It was a crisp Friday morning and I was seated comfortably in the plush IIMC library. The PGDBA semester was well underway, assignments were raining thick and fast…life was busy…life was good and I was brimming with excitement.

I had only just begun working on a competition, which had started 4 days ago on June 14th 2016. It was an inter-university data-science competition called the Data Science Game. With constraints such as a limited number of submissions per day and the final selection of only one team from each college, it was, by all means, the “big deal”, and a glance at the list of competing universities showed us some tough nuts. There were the usual suspects, i.e. Stanford, Cambridge, Oxford et al., supplemented by a host of premier universities from across the world.

Certainly our team of four from the first-ever batch of PGDBA, though no novices, were far from being among the best in the world… or were they?

And so we – Team Tabs – prepared a starting output, clicked on the ‘Make Submission’ button and waited with a muted yet expectation-laced anxiety that any Kaggler worth his salt would be familiar with and then this image popped up on our screen:

[Image: tabs]

Most authors describe moments like these with the cliche ‘There was a moment of silence followed by….’ I discovered that they were quite wrong…as my uncontrollable Hagrid-like laughter filled the breadth of the IIM Calcutta library, defiant of the several bemused yet stern glares that were pointed in my direction! Second in the whole world! Irrespective of its ephemerality, irrespective of the pains required to maintain it or the challenges that we were about to face in the coming three weeks – it was a moment of reckoning for us, a moment to cherish, a moment to savour. Yet, when I look back I can say with certainty that it was at this point I started believing that international glory wasn’t beyond our reach.

What followed were some of the most gruelling days of my life. Over the next three weeks, we went on to learn and implement Deep Learning (Convolutional Neural Networks) algorithms for image classification. We travelled to multiple universities in a quest for servers to run these algorithms. We learnt, we toiled, we toiled hard and we thrived. When the competition ended, we were the top team from India – Yes! Our hard work and perseverance led to us being the Rank 1 Indian team. We were among the 20 teams from around the world selected to travel to Paris for the final phase of the competition and folks, as you read this article, we’re on the flight journey towards the finals of the competition in Paris, to be held on 9th September.

What fills me with even more delight is that not just us, but three teams from India have made it to the final 20: Team Tabs from IIM Calcutta, a team from IIT Kharagpur, and The Frequentists from ISI Kolkata (in their ranking order). India has made its presence felt in this 2nd edition of the Data Science Game and, interestingly enough, these three institutes are the ones that together constitute PGDBA! It is encouraging to see that three Indian teams have proved themselves worthy of being in the global top 20 when 146 teams from 28 countries participated and showed their mettle in this grueling competition.

The competition posed an image classification problem. A set of images was given, which had to be classified into four categories. The problem could have been approached in various ways. We decided to use deep learning, as a lot of interesting work is being done around it and it’s one of the most advanced techniques currently available. We had a basic knowledge of it and developed more understanding as we moved along. The process of writing and running code went on and we worked hard every single day. Machine learning algorithms take time to execute, and with limited computing power at our disposal and the time constraint of the competition, we ensured that every iota of it was used. As we were fighting neck and neck with top-notch universities across the world, the task was not at all easy and there were a lot of hurdles on the way. The limited computing power slowed us down. Every iteration of the code required a whole day, which constrained our capacity to experiment with the algorithm. Soon other teams caught up with us on the leaderboard. To iron out the problems, we went to IIT-KGP and ISI to gain server access. However, the terminals at both places were occupied. As a last resort, we decided to use Amazon Web Services (AWS). AWS was difficult to set up because of the complex technicalities, and as none of us was acquainted with the process, it made our job all the more difficult. We quickly took charge and read about it from scratch, spending 3 precious days to figure everything out and get it running. In hindsight, it was worth the effort. Our first run on AWS increased the accuracy by 5% and it all paid off with a jump on the leaderboard.

Now that we look back at it, our PGDBA curriculum gave us a real edge. The basics of machine learning and computing were well laid out throughout the course. It enabled us to dive deep into deep learning and comprehend the technical aspects around it. We also consulted professors for guidance. With Team Tabs standing at 12 in the global rankings, we realize that we have learnt a lot on the way, while actually working on the problem statement.

We will now be competing with some of the top Kagglers in the finals. The finale will certainly provide us with global exposure, as we will get a macroscopic view of what’s happening around the world in the field of data science by interacting with top-notch data analysts from across the world. Since it’s a 2-day competition, the dynamics of the game are bound to change. We haven’t been able to put continuous, concentrated effort into the final round owing to the rigorous academic curriculum this semester and our catching up on classes. We do have a lot to cover, but we will keep learning new stuff as we have been doing over the past year. A great opportunity for knowledge transfer and networking lies ahead. With everyone’s hopes on us, we make our journey to Paris, where the final leg of the competition awaits… along with our fateful turnout in the 2nd Data Science Game competition.

About the team – “Team Tabs” from IIM Calcutta

Pranita Khandelwal – She completed her graduation (B.Tech.) in Electrical & Electronics Engineering and Masters in Economics from BITS Pilani. Initial interest in statistics and then further exploration of online courses made her pursue a career in the data science field.

Ritwik Moghe – He is a Mechanical Engineer from IIT Madras. With no coding background in the beginning, he learnt everything after joining the PGDBA program.

Avinash Kumar – He is a Mechanical Engineer from NIT Jamshedpur and worked in the manufacturing industry prior to joining the PGDBA program. While in college, he participated in some analytics competitions, and he enhanced his data science skills while studying at the three institutes of PGDBA.

Rachit Tripathi – He is a Mechanical Engineer from IIT Kanpur. He worked on multiple projects in robotics, programming and data handling while he was in college. His keen interest in mathematics and computing drove him to join PGDBA.

Team Tabs

 

Do check out the team from ISI at the link: The Frequentists

The Frequentists
Categories
Articles ISI Chapter Technical

The Best Laid Schemes, of Spiders and Men!

What do entropy, linear programming and Riemann surfaces have in common? Puzzled? Now imagine this connection explained by an eccentric speaker in the attire of a French stage magician, with the charm and virtuosity of a storyteller. Cedric Villani, French mathematician, Fields Medal awardee in the year 2010 and famously called the ‘Lady Gaga of Mathematics’ by The New Yorker magazine, delivered a public lecture titled ‘Of Triangles, Gases, Prices and Men’ at ISI on 26th August 2016. The second PGDBA batch, currently in its first semester here at ISI, had the opportunity to be present at this intriguing and informative session.

Cedric Villani

The abstraction and intrigue of Cedric could be assumed from the fact that an introduction to him included a reference to the number of his pets. This abstraction could also be inferred from the title of his talk, which was a play on the title of John Steinbeck’s famous classic ‘Of Mice and Men.’ The first slide of his presentation was taken from Tennyson’s Lady of Shalott. In Cedric’s own interpretation, the Lady of Shalott, accursed to see the world only through a mirror, was actually an allegory to the mathematician forever accursed to look at reality through his equations! Cedric then said that there are many more unsolved mysteries in Mathematics today than there were a hundred years ago. There are ever so many new problems that keep arising. Then there are those age-old ones that lie in famous mathematicians’ lists of unsolved problems. One such famous unproved hypothesis is the Riemann hypothesis. This led Cedric down the path to explaining Riemann’s works and then to the first part of the evening’s presentation – ‘triangles’.

He introduced to the audience Riemann surfaces and how Escher employed curved surfaces in his art. As examples of negative curvatures, he showed images of art installations in museums and models of coral reef. Einstein, with the help of his mathematician colleagues, used Riemann’s ideas to develop his General Theory of Relativity. Cedric went on to add that, the GPS technology so ubiquitous in the world today has its roots in Riemann’s works in topology. In a humorous turn of speech, Cedric noted that Riemann was as oblivious to his work being of practical use in 21st century devices, as modern day GPS users were to Riemann’s surfaces. An ironic symmetry indeed!

At this turn of his presentation, Cedric spoke about how it is equally important for scientists to pursue inspiration and not just utility. He marked out Riemann as someone who was particularly interested in approaching problems in his own unique way. Cedric quoted Poincare who had once said, “Mathematics is the art of giving the same name to different things.” The second part of his talk on ‘gases’ started with a description of his visit to Vienna and a search for Boltzmann’s grave. He said that he stopped to ask a family for a map not expecting them to know Boltzmann, let alone his grave. To his surprise, he was not only directed to the location of the grave but the person also exclaimed Boltzmann’s equation of entropy, “S=klogW”!

In connection with entropy, he then talked about the Gaussian curve, its ubiquitous nature and its uncanny appearance in many natural systems. He called the study of Probability and Statistics as ‘the extraordinary adventure of mastering of chance.’ As a matter of coincidence, he discussed a famous problem in his presentation called ‘Buffon’s needle’, which was also discussed in class earlier on the same day with the PGDBA students by their lecturer. Experiments such as coin tosses, Cedric said, are best done in the most careless ways! Then he explained how gases are modeled as billiard balls in collision and when there are many such sufficiently small billiard balls, their velocities are accurately modeled as the Gaussian distribution. As a note, Cedric remarked on the power of this distribution by quoting Sir Francis Galton who once called it the ‘supreme law of unreason.’

The next part of his presentation was ‘prices’. Cedric introduced Leonid Kantorovich, the father of linear programming, and explained how math is used to model the optimal allocation of resources. He then connected the optimal distribution of resources to the distribution of gas molecules with the least loss of energy: the analogue of prices in linear programming is energy in the distribution of gas molecules. This is where Cedric began piecing everything together, in the last part of his talk, called ‘men’. He described how he happened to meet his collaborators John Lott and Felix Otto. These men put together the ‘triangles’, ‘gases’ and ‘prices’ and helped Cedric complete his research on how fast gases reach the equilibrium state described by Boltzmann’s equation. Cedric was awarded the Fields Medal in connection with this research.

What would have been an intimidating subject coming from volumes upon volumes of text was aptly introduced in a two-hour lecture by Cedric Villani, a master of his trade, a storyteller par excellence, a dinosaur catcher in his childhood dreams and a true ambassador of modern mathematics. Ironically, for all the mysteries hidden in the details of his work, the most apparent mystery is the spider brooches that he wears on the lapels of his coat!

Categories
Articles Experiences General

The Journey of Identities

To all those who just joined in. And to all those who couldn’t.

I’ve jumped ships. I’ve made the leap.
I am now a part of what they call Tata Hall, the luxurious guesthouse at IIM C, having once been a part of what they called Himadri (my IIT Delhi hostel). I am convinced that recording my first impressions of this maddening, surreal odyssey that I have set out on will not only prove fruitful in retrospect, but will in fact help me retain my sanity in the present. For those of you who have been wondering what I’m going to blabber about, I would like to give you some context before you abandon me.

I am now a student in the PGDBA program. I am living comfortably, with my amazing roommate, in my posh suite in the guesthouse of IIM C in Joka, which is nearly thirty kilometers away from the city of Kolkata. There are one hundred and three other students in this program who live with me, so I am sometimes tempted to forget that we are literally in the middle of nowhere. We are being fed an illusion, but an irresistible one that I don’t have the heart to snap out of. In some ways, the fact that the city is beyond my reach is liberating. Through the five years I spent living in the heart of Delhi, and then another two years in the legendary city of Mumbai, I craved an escape, a way out. I think this program has finally answered my call for help.

Here, I am constantly told that I am special, that I am the chosen one. The twenty-four years of my life that have led to this moment have made me so cynical that I doubt their belief in me. But at the same time, I greatly value it. After a very long time, I am around people who are willing to invest in my future simply because the three accomplished panel members saw a spark in me during the twenty-five minutes I spent interviewing for the program. I feel humbled and terrified. The two years ahead are going to demand every inch of me; they are going to consume me, and they are going to change me. But I have felt nothing for so long that this rush, this constant buzz in my head, this restlessness feels good. I am sleep-deprived, surviving on n number of cups of terribly made tea and coffee and hopping from one assignment to the other with, thankfully, some extended breaks, and yet I am alive. How often is it that you are so committed to the moment that every other structure inside and outside your mind breaks form, and this particular moment is all that you can see, all that you can register? This program is one such moment in my life.

The program, which aims at producing a team of unparalleled business analysts and data scientists (supposedly the sexiest job in the market these days), has been conceived by three premier institutes of India: Indian Institute of Management Calcutta, Indian Statistical Institute Kolkata and Indian Institute of Technology Kharagpur. This very credibility generates the “Pygmalion effect” and drives me to excel in the area of analytics.

The coming two years make me realize that I’m far away from the comfort zone and stability that life had offered just before I decided to enroll in PGDBA. But I am also glad that it happened, for I can now appreciate the morning for what it truly represents: a fresh start. I like the concept of this new beginning, for it brings with it the exciting journey of exploring new dimensions and reconstructing your identity. I am tempted to fantasize about the future, but I am trying to contain my excitement to the moment. So many paths seem to call out to me that my brain will explode if I start to think too much. What do I want, you ask? Organized chaos. If you think that is romantically abstract, then you have understood me just a little more than you did before you began to read this note.

I will end with a few lines that so perfectly describe my state of mind right now, that I might just marry Robert Frost.

The woods are lovely, dark and deep,
But I have promises to keep,
And miles to go before I sleep,
And miles to go before I sleep.

Categories
Articles Technical

Analytics in Healthcare – Xerox Challenge

A brief overview of how I approached a real-world healthcare problem via analytics.

Robin Singh

Disclaimer: I have tried to keep technicalities to a minimum in this blog so as to cater to a wider segment of readers. However, a little awareness of machine learning will make the rest of the post more comprehensible and, hopefully, more exciting.

The healthcare industry is not only huge but also has tremendous potential for the use of technology and data science. In this post, I will share one of the numerous instances of unconventional analytics being used to engineer solutions to challenges in the field of healthcare. This seemingly complex idea can be structured and implemented with a combination of machine learning tools and data crunching techniques.

The solution proposed here is designed to work with critical patient data in hospitals and raise an alarm when the state of a patient degrades in a way that could eventually lead to a fatal outcome. Now the obvious questions: how will such a system help? Can doctors not monitor patient state physically? Well, a doctor can physically monitor only a small number of patients. What if the number of patients is large? Also, the decision to provide intensive care to a patient after an alarm has been raised has both monetary and human-life impacts. If the alarm can be raised in time, intensive care, although expensive, can be provided to the patient.

At the backend, the model can be seen as a typical machine learning classification problem. Classification, as the name suggests, is a method of categorizing data points into predetermined target groups. Numerous algorithms can perform classification, such as Bayesian models, decision trees, random forests and regression-based models. We used a Random Forest model for the current data set because of its simplicity and ease of implementation.

However, the classification method happened to be only the tip of the iceberg. There were many unforeseen challenges, primarily because the data came from the healthcare domain and because computational resources were constrained. First, healthcare data is highly erratic, and the severity of a measurement varies from person to person. For example, a certain value of a respiratory measurement can be dangerous and life-threatening for a typical person but normal for a smoker. This poses a fundamental challenge to the accuracy of models built on healthcare data. Second, the state to be predicted is different from the state for which training data is available. This is slightly difficult to grasp, but let’s try. We want to raise an alarm when the patient’s condition is worsening from normal and approaching mortality, but while the patient still has time. However, the training data set only records the actual outcome, mortality or no-mortality. Using the training data to learn therefore implies making an approximation. The third challenge comes from implementation aspects. The prediction of no-mortality must be more reliable than the prediction of mortality: the system should be able to predict no-mortality situations with an accuracy of 99% or above. Accuracy on no-mortality and accuracy on mortality trade off against each other, so if we tune the model for high accuracy on no-mortality, the accuracy on mortality is low.
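The trade-off in the third challenge can be made concrete with a small sketch. It assumes that the classifier outputs a mortality-risk score between 0 and 1 for each patient and that the alarm is raised when the score reaches a threshold. It reads “accuracy on no-mortality” as the share of actual no-mortality cases that are predicted correctly, which is only one possible reading. The scores, labels and the helper name `per_class_accuracy` are invented for illustration and do not come from the challenge data set.

```python
def per_class_accuracy(scores, labels, threshold):
    """Return (accuracy on no-mortality cases, accuracy on mortality cases).

    A patient is flagged as 'mortality' when the risk score is >= threshold.
    labels: 1 = mortality, 0 = no-mortality.
    """
    correct = {0: 0, 1: 0}
    total = {0: 0, 1: 0}
    for score, label in zip(scores, labels):
        predicted = 1 if score >= threshold else 0
        total[label] += 1
        correct[label] += int(predicted == label)
    return correct[0] / total[0], correct[1] / total[1]


# Hypothetical risk scores from some classifier (invented numbers).
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.55, 0.60, 0.80, 0.90]
labels = [0, 0, 0, 0, 1, 0, 0, 1, 1, 1]

for threshold in (0.3, 0.5, 0.7):
    no_mort, mort = per_class_accuracy(scores, labels, threshold)
    print(f"threshold={threshold:.1f}  "
          f"no-mortality acc={no_mort:.2f}  mortality acc={mort:.2f}")
```

On this toy data, raising the threshold pushes the no-mortality accuracy up while the mortality accuracy falls, which is the tension described above.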

Let’s take a moment to think about the methodology again. What can we observe? The predictive model seems to replicate logic similar to a real doctor’s. In fact, the very idea of machine learning is to train machines to apply logic as human beings do: for example, using past data to learn and make decisions in future cases, considering the trade-offs that arise in the decision-making process, and using the concept of information value to reach a decision.

Discussed above is one example of the use of analytics and artificial intelligence in healthcare. There are many unexplored applications in the domain, huge scope for improvement in the existing models and an unquantifiable amount of data to process. In the coming years, devices based on such models will be a reality, and the industry will require many more analysts to cater to the demand.

Categories
Articles Technical

A Record, a Code and Twitter

How my first machine learning model was validated by a goal scored a continent away.

Ritwik Moghe

It was the twenty-eighth day of November 2015. As the strangely balmy day yawned, stretched and gratefully gave way to dusk, several eyes were glued to the actions of one man. The man was slight, had strange spiky hair and a face that might remind many of all those ‘dawgs’ or ‘dealers’ from Breaking Bad. Only a year back, hardly anyone knew of his existence. And today, he was about to etch his name in the annals of footballing history.

As he latched on to a pitch-perfect through-ball that split the Manchester United defense in half and slotted it past the oncoming goalkeeper, several things exploded. One of those was the voice of the legendary Martin Tyler as he shouted, “Vardy! It’s eleven, it’s heaven for Jamie Vardy!” (The goal as it unfolded). Jamie Vardy, a name most of you might still be unfamiliar with, had broken the English Premier League record for scoring in the most consecutive matches. He had scored in each of the past eleven games. In the grand scheme of things, the record in itself might not be of much significance. What mattered more was Vardy’s story. From an amateur player with no ‘proper’ training or facilities and very humble beginnings, he had risen to be the most prolific striker in one of the most competitive leagues in the world. It was a classic fairy tale. For several amateurs trudging every evening into those muddy football fields and trying to curl it like Carlos, Vardy was hope.

So as he was being engulfed by his teammates after he had scored that crucial record-breaking goal, Vardy was causing another explosion around the world. It was an explosion of hope, of greetings and of admiration. And sitting in our dorm rooms overlooking the ponderous Barrackpore Trunk Road in the quaint campus of ISI Calcutta, a bunch of us fledgling data scientists of PGDBA captured this joyous explosion. We captured it using Twitter.

Messi_Vardy: A graph encapsulating the positive Twitter sentiment about Vardy right after THE GOAL!

The problem that we were working on was opinion mining through tweets. Hundreds of millions of tweets are posted every day. These tweets reflect the opinions or sentiments of the users about various topics. For instance, a tweet like “I love Apple #Iphone6” might reflect the user’s positive sentiment about the company Apple. A study of several such tweets about a particular subject or company can give that company valuable insights into the general public opinion about it.

We were analyzing the Twitter sentiment about various current and upcoming football stars. Our aim was to identify the next big star, the one who would eclipse Messi and attain the ultimate pinnacle of fame by someday being the brand ambassador of Tata Motors! Our observation about Vardy and his big day was a mere microcosm of a bigger project where we analyzed over one million tweets about 10 players obtained over a period of one month.

The analysis began with mining tweets about the particular players. The tweets were obtained from an API using Python. Relevant metadata, such as the location of the user and the timestamp of the tweet, was extracted along with the text of the tweet.

Data_Extraction
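As a rough sketch of this extraction step, the snippet below pulls the three fields out of a single tweet record. It assumes the tweets arrive as JSON objects shaped like the classic Twitter API payload, with `text`, `created_at` and `user.location` fields. The sample record and the helper name `extract_fields` are invented for illustration, and the post does not say which endpoint or client library was actually used.

```python
import json
from datetime import datetime

# One invented record, shaped like a classic Twitter API tweet object.
raw = '''{
  "text": "What a goal by Vardy! #LEIMUN",
  "created_at": "Sat Nov 28 17:55:10 +0000 2015",
  "user": {"location": "Kolkata, India"}
}'''


def extract_fields(tweet):
    """Keep only the tweet text, the user's location and a parsed timestamp."""
    posted = datetime.strptime(tweet["created_at"], "%a %b %d %H:%M:%S %z %Y")
    return {
        "text": tweet["text"],
        # Users may leave the location blank, so fall back to None.
        "location": tweet.get("user", {}).get("location") or None,
        "timestamp": posted,
    }


record = extract_fields(json.loads(raw))
print(record["timestamp"].isoformat(), "|", record["location"], "|", record["text"])
```

Parsing the timestamp into a timezone-aware `datetime` up front makes it straightforward to bucket tweets by minute or hour later.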

The text of the tweet was then converted into a Term-Document Frequency Matrix (TDF). Now only a year ago, all that I could have thought of on hearing ‘Term-Document Frequency Matrix’ would have been Neo in his slick glasses staring into some green numbers floating Chinese-style on an antique nineties monitor! But TDF is way simpler than that. All it does is create a table. Each row is a tweet. All the words observed in all the tweets that we are studying make up the columns, and each cell counts how often that word appears in that tweet.

Consider this example for clarity:

TDF
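The same table can be sketched in a few lines of Python. This is a minimal illustration, not the project’s code: the three tweets are invented, the tokenizer is a deliberately crude regular expression, and `build_term_matrix` is a hypothetical helper name.

```python
import re


def build_term_matrix(tweets):
    """Return (vocabulary, rows): one row of word counts per tweet."""
    tokenized = [re.findall(r"[a-z0-9#@']+", t.lower()) for t in tweets]
    # Every distinct word seen in any tweet becomes a column.
    vocabulary = sorted({word for words in tokenized for word in words})
    column = {word: i for i, word in enumerate(vocabulary)}
    rows = []
    for words in tokenized:
        row = [0] * len(vocabulary)
        for word in words:
            row[column[word]] += 1
        rows.append(row)
    return vocabulary, rows


tweets = [
    "Vardy is on fire",
    "Kane is not on form",
    "Vardy Vardy Vardy",
]
vocabulary, rows = build_term_matrix(tweets)
print(vocabulary)
for row in rows:
    print(row)
```

Each printed row lines up with the vocabulary list, so the third tweet becomes a row that is zero everywhere except for a 3 in the `vardy` column.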

Thus each word is now a feature and each tweet a data point. We then used a machine learning technique called the Maximum Entropy (MaxEnt) classifier in R to classify each tweet, or data point, into one of three categories: Positive, Neutral or Negative. I could get into the details of the work, about why we went for a supervised classification approach, why MaxEnt works well for text classification and so on. But since I’m trying to keep this article tractable for someone with no prior analytics experience, I will stop here and point to another blog about our detailed work (A detailed report of our project).
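For readers who want a feel for what such a classifier does, here is a from-scratch toy in Python. It is not the R package used in the project. A maximum entropy classifier is equivalent to multinomial logistic regression, and the sketch trains one with plain stochastic gradient steps on five invented word-count rows. The column words, labels, learning rate and function names are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_scores(weights, row):
    """One linear score per class: the dot product of weights and counts."""
    return [sum(w * x for w, x in zip(class_w, row)) for class_w in weights]


def train_maxent(rows, labels, n_classes, epochs=200, rate=0.5):
    """Fit one weight per (class, word) by stochastic gradient ascent."""
    weights = [[0.0] * len(rows[0]) for _ in range(n_classes)]
    for _ in range(epochs):
        for row, label in zip(rows, labels):
            probs = softmax(class_scores(weights, row))
            for c in range(n_classes):
                # Gradient of the log-likelihood: observed minus predicted.
                error = (1.0 if c == label else 0.0) - probs[c]
                for j, x in enumerate(row):
                    weights[c][j] += rate * error * x
    return weights


def predict(weights, row):
    scores = class_scores(weights, row)
    return scores.index(max(scores))


CLASSES = ["Positive", "Neutral", "Negative"]
# Invented training rows; columns are: love, great, hate, awful, match
rows = [
    [1, 0, 0, 0, 1],
    [0, 1, 0, 0, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
]
labels = [0, 0, 2, 2, 1]  # indices into CLASSES

weights = train_maxent(rows, labels, n_classes=3)
print(CLASSES[predict(weights, [1, 0, 0, 0, 0])])  # a tweet containing only "love"
```

A real run would use thousands of hand-labelled tweets and a regularized solver, but the mechanics are the same: every word gets a weight per class, and the class with the highest total score wins.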

Now this process was carried out for all the tweets about all the different players. The prevalent sentiment about a particular player was given by the normalized difference between the number of positive and negative tweets. Doing this helped us observe several interesting trends in the data. Consider the comparative study of sentiments about Neymar, Ronaldo and Harry Kane over November 2015. Also, have a look at how the sentiment about Harry Kane varied across countries.
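The score itself is simple arithmetic, as the sketch below shows. The post does not say how the difference was normalized, so the snippet assumes division by the player’s total number of tweets, which keeps the score between -1 and 1. The labels and player names are invented.

```python
from collections import Counter


def net_sentiment(labels):
    """(positive - negative) / total tweets, a value between -1 and 1."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return (counts["positive"] - counts["negative"]) / total


# Invented classifier outputs for two hypothetical players.
classified = {
    "player_a": ["positive", "positive", "neutral", "negative"],
    "player_b": ["negative", "negative", "neutral", "positive", "neutral"],
}
for player, labels in classified.items():
    print(player, round(net_sentiment(labels), 2))
```

Dividing by the total makes players with very different tweet volumes comparable, which matters when a global star is set against an upcoming one.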

Ronaldo-Kane

Kane-Sentiment

Such analysis has huge potential applications. Imagine how Tottenham Hotspur, the club Harry Kane plays for, could maximize its profits by opening more ‘Spurs Stores’ in South Africa, where Kane is far more popular (green), than in, say, Australia, where he is clearly unpopular (red). Are you an executive at EA Sports trying to decide whom to put on the cover of FIFA 16? Just mine sentiment on Twitter and voilà, you’ll see that Neymar would be a far better choice than Kane.

So this was all about our project on Twitter sentiment about football superstars. This project was a part of our course called Computing for Data Sciences at ISI Kolkata. Our fellow PGDBA students have also worked on several such (hopefully :P) interesting projects. Some of them will share their stories with you on this blog as well.

Vardy and his record hold a special place in our hearts. He was the perfect muse for demonstrating the effectiveness of our model. When you’ve come up with your first ‘real’ model, its true test comes when you see it work in real life on a completely unexpected scale. That meteoric rise in Vardy’s sentiment at 5.55 pm GMT, right after he had scored the crucial goal, proved to us beyond doubt that our model worked! So I sign off with a link to that moment when Vardy smashed a record, the moment when people around the world celebrated the dawn of a new star, and the moment when our model was validated! Cheers!