Dominic's Homepage!

My name is Dominic Teo and I'm currently an analytics manager with the Ministry of National Development (MND). I'm interested in the intersection of public policy and technology as well as the application of Big Data and Civic Tech in the public sphere.

I graduated Cum Laude from Sciences Po Paris with a BA in Social Sciences & Economics, followed by an MS in Computational Analysis and Public Policy (MSCAPP) at the University of Chicago. The MSCAPP is a two-year dual degree offered by the Schools of Computer Science and Public Policy.

Predicting the success of Mobile Applications in the Google Play Store

This project was developed by Dominic Teo, Grace Prakaisriroj and Alessandro Luciano as our capstone group project for the undergraduate final year Statistics course ST309.

A more detailed report on the project is available in the PDF document. Unfortunately, the R Markdown file has been lost.

Introduction

The global app economy is estimated to be worth 6.3 trillion USD by 2021, up from 1.3 trillion USD in 2018 (AppAnnie, 2019). Over this period, the user base is expected to almost double, from 3.4 billion to around 6.3 billion people (AppAnnie, 2019). However, the majority of developers are still struggling to break even (AppSurvey, 2013). For these unsuccessful developers, a clear analysis of the characteristics of existing successful apps would provide useful insight into creating apps that users want (Tian, 2015).

Therefore, the main goal of this project is to identify the characteristics that successful apps share and investigate which of these factors is important for success.

Dataset Variables

Our final cleaned dataset used for modelling has 13 predictor variables (Android.Ver was removed during EDA) in 8 dimensions and 6738 observations in total.

The 8 dimensions we used for our analysis are:

success

categories

rating

paid

Methodology

Our project has two research questions:

  1. What characteristics do successful apps share?
  2. What factors are important for app success on the Google Play Store?

We applied 3 classification modelling approaches to the data to identify which variables are significant in determining the success of mobile apps:

  1. Logistic Regression

  2. Decision Trees and Pruned Decision Trees

  3. Random Forest
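One way to compare classifiers such as the three approaches above is to score their predictions against known outcomes. The sketch below is a minimal illustration in Python (the original analysis was written in R). It assumes a binary success label (1 = successful), and the labels, predictions and model names are made up for illustration; they are not results from this project.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = successful)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == 1:
            counts["tp" if truth == 1 else "fp"] += 1
        else:
            counts["tn" if truth == 0 else "fn"] += 1
    return counts


def accuracy(counts):
    """Share of all apps classified correctly."""
    return (counts["tp"] + counts["tn"]) / sum(counts.values())


def recall(counts):
    """Share of truly successful apps that the model identified."""
    positives = counts["tp"] + counts["fn"]
    return counts["tp"] / positives if positives else 0.0


# Made-up labels for 8 apps (illustration only, not project results).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
# Made-up predictions from two hypothetical classifiers.
predictions = {
    "model_a": [1, 0, 0, 1, 0, 1, 1, 0],
    "model_b": [1, 0, 1, 1, 0, 0, 0, 0],
}

scores = {}
for name, y_pred in predictions.items():
    counts = confusion_counts(y_true, y_pred)
    scores[name] = {"accuracy": accuracy(counts), "recall": recall(counts)}
    print(name, counts, scores[name])
```

Reporting recall alongside accuracy is a design choice: when successful apps are a minority, accuracy alone can look high for a model that rarely predicts success.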

Results and Conclusion

From our study of the 4 models (logistic regression, decision tree, pruned decision tree and random forest), we conclude that the random forest model is the most effective at predicting the success of mobile applications. Our random forest model shows that Price, Rating, DaysSinceLastUpdated and Size are the most important variables in the construction of the model.
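This page does not say which importance measure the random forest used. One common, model-agnostic choice is permutation importance: shuffle one variable at a time and measure how much accuracy drops. The sketch below is a Python illustration under that assumption. The rows, the labels and the hand-written `toy_model` rule are hypothetical stand-ins, not the project's data or its random forest.

```python
import random


def accuracy(model, rows, labels):
    """Share of rows for which the model's prediction matches the label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(model, rows, labels, n_features, repeats=30, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    result = []
    for j in range(n_features):
        drops = []
        for _ in range(repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Rebuild each row with only feature j replaced by its shuffled value.
            permuted = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(model, permuted, labels))
        result.append(sum(drops) / repeats)
    return result


def toy_model(row):
    """Hypothetical rule: predict success for free apps rated 4.0 or higher."""
    is_free, rating, _size_mb = row
    return 1 if is_free == 1 and rating >= 4.0 else 0


# Hypothetical rows: (is_free, rating, size_mb).
rows = [
    (1, 4.5, 20), (1, 4.2, 35), (1, 3.1, 12), (0, 4.6, 50), (0, 3.9, 8),
    (1, 4.7, 64), (0, 4.4, 15), (1, 3.5, 22), (1, 4.1, 40), (0, 2.8, 5),
]
labels = [1, 1, 0, 0, 0, 1, 0, 0, 1, 0]

importances = permutation_importance(toy_model, rows, labels, n_features=3)
for name, value in zip(["is_free", "rating", "size_mb"], importances):
    print(f"{name}: {value:.3f}")
```

Because `toy_model` never reads `size_mb`, shuffling that column cannot change any prediction and its importance is exactly zero. Variables the model relies on show a positive drop in accuracy.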

This matters for developers who want to build successful mobile applications: they should develop free applications and aim for high ratings in the Google Play Store. Developers trying to make money from their apps could make them free and concentrate on in-app monetization instead, since charging for the app itself seems to be a major deterrent to downloading it.

More interestingly, developers should also update their apps continuously and consistently, as we found that more recently updated apps tend to be more successful. We also found that, overall, larger apps were more successful (although Size was less significant than the other three variables). We think that size could be a good proxy for complexity and sophistication, so apps that are more complex and more fully developed tend to be more popular.
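The variable name DaysSinceLastUpdated suggests a count of days between an app's last update and some reference date. The sketch below shows one way such a feature could be derived in Python. The date format ("August 1, 2018") and the reference date are assumptions made for illustration; the page does not state how the variable was actually computed.

```python
from datetime import date, datetime


def days_since_last_updated(last_updated, reference_date, fmt="%B %d, %Y"):
    """Days between an app's last update and a reference date.

    `fmt` is an assumed date format; adjust it to match the raw data.
    """
    updated = datetime.strptime(last_updated, fmt).date()
    return (reference_date - updated).days


# Hypothetical reference date, e.g. the day the data was collected.
reference = date(2018, 8, 8)

print(days_since_last_updated("August 1, 2018", reference))   # 7
print(days_since_last_updated("January 7, 2018", reference))  # 213
```

A smaller value means a more recently updated app, which is the direction associated with success in the findings above.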