K for Kaggle
What is Machine Learning?
Machine learning is a field focused on building computer programs that can access data and learn from it automatically, without a human writing explicit rules for every case. The whole concept rests on the assumption that we should give machines access to information and let them learn from it.
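To make "learning from data" concrete, here is a minimal sketch, assuming a made-up toy dataset of hours studied versus exam score. The function name `fit_line` and the numbers are purely illustrative. The program is never told the rule behind the data; it estimates a slope and an intercept from the examples using ordinary least squares and then predicts an unseen case.

```python
# Toy data, invented for illustration: hours studied vs. exam score.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 54.0, 56.0, 58.0, 60.0]


def fit_line(xs, ys):
    """Learn a slope and an intercept from examples by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both left unnormalized,
    # since the 1/n factors cancel in the ratio).
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(hours, scores)

# Predict the score for 6 hours of study, a value not present in the data.
predicted = slope * 6.0 + intercept
print(slope, intercept, predicted)
```

Real problems have noisier data and many more input columns, but the idea is the same: the parameters come from the data, not from the programmer.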
Intro to Kaggle:
Kaggle is a crowd-sourced platform that attracts, nurtures, trains, and challenges data scientists from all around the world to solve data science, machine learning, and predictive analytics problems. Kaggle is like an Airbnb for data scientists: this is where they spend their nights and weekends. It lets data scientists, and even students, take part in machine learning contests, write and share code, and host datasets.
The data science problems posted on Kaggle range from predicting cancer occurrence by examining patient records to analyzing the sentiment evoked by movie reviews and how it affects audience reaction.
While some competitions on Kaggle are just for education and fun brain exercise, others are real problems that companies are trying to solve. Kaggle maintains a leaderboard for every competition it holds, and the winner is announced when the competition ends.
Prerequisites:
Well, enthusiasm alone is not sufficient here.
Python:
A good resource for starting from scratch is the Python 3 Programming specialization by the University of Michigan, although you can skip the fifth course in it. This specialization does not dive into machine learning implementations, so it will be useful for Python in general.
Note: Python has a lot of packages and, unlike basic C, it is difficult to remember everything. So, don’t try to memorize the syntax completely. With time, you will remember the important parts. And don’t forget Google!
Machine Learning:
Mainly, you need a basic idea of the various algorithms used for different machine learning problems. Although you need only one line of code to use an algorithm by calling it from a package, your implementation has little value if you don’t know what is going on underneath. For this, Machine Learning by Stanford University on Coursera should be sufficient.
People often complain that the above course uses Matlab for its programming assignments, while Python has become the handier choice for machine learning. If that bothers you, there is a GitHub repository with the assignments in Python that you can look at instead.
Do note: this course does not explain every algorithm you will need for machine learning, but it will prepare you well enough to understand new ones. So, whenever you come across a new algorithm, remember there is something called Google.
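To see why it pays to know what is going on behind a one-line package call, here is a minimal sketch of one classic algorithm, k-nearest neighbours, written with only the Python standard library. The function name `knn_predict` and the toy points are assumptions made for illustration; this is not the code of any particular package.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the labels of the k closest points.
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # The most common label among the neighbours is the prediction.
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy 2-D data, invented for illustration: two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 6.5), (6.5, 7.0)]
labels = ["A", "A", "A", "B", "B", "B"]

near_a = knn_predict(points, labels, (1.2, 1.4))
near_b = knn_predict(points, labels, (6.4, 6.2))
print(near_a, near_b)
```

Once you have written something like this yourself, choices such as the value of `k` or the distance measure stop being magic parameters in a library call.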
Micro-Courses:
Kaggle has a set of micro-courses that help you get started faster. Note: they are a kind of crash course meant to get you implementing quickly, since too many courses will make this journey boring. These micro-courses give you the satisfaction of gaining skills you can apply in competitions. They also give you certificates [for people who think they hold value]. Here is a list of all the necessary and sufficient micro-courses, in order:
Python [If you skipped the Prerequisites section :)]
Pandas
Data Visualization
Intro to Machine Learning
Intermediate Machine Learning
Feature Engineering
Machine Learning Explainability [You can skip this]
If you would rather take a complete course, you can go through the first 3 courses of the Applied Data Science with Python specialization.
Competitions:
The ‘Getting Started’ category of Kaggle competitions is really helpful for beginners and a great way to begin your competition journey on Kaggle. Here are the two most important ones:
Titanic: Machine Learning from Disaster
The competition is simple: use machine learning to create a classification model that predicts which passengers survived the Titanic shipwreck.
Since this will be one of your first competitions, studying a Kaggle kernel that ranks among the top solutions for it will help you create a plan and develop your ability to think through a problem.
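Before reaching for a sophisticated model, it helps to have a simple baseline to beat. The sketch below is one possible baseline, not a top solution: it assumes a handful of made-up passenger rows with a `sex` field and a `survived` label, learns the majority outcome for each group, and checks accuracy on a few held-out rows. The helper names `fit_group_majority` and `accuracy` are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy rows; the real competition data has far more rows and columns.
train = [
    {"sex": "female", "survived": 1},
    {"sex": "female", "survived": 1},
    {"sex": "female", "survived": 0},
    {"sex": "male", "survived": 0},
    {"sex": "male", "survived": 0},
    {"sex": "male", "survived": 1},
]
holdout = [
    {"sex": "female", "survived": 1},
    {"sex": "male", "survived": 0},
    {"sex": "male", "survived": 1},
    {"sex": "female", "survived": 1},
]


def fit_group_majority(rows, feature, target):
    """Learn the majority class (0 or 1) for each value of `feature`."""
    counts = defaultdict(lambda: [0, 0])  # feature value -> [count of 0s, count of 1s]
    for row in rows:
        counts[row[feature]][row[target]] += 1
    # Predict 1 only when 1s strictly outnumber 0s in the group.
    return {value: int(ones > zeros) for value, (zeros, ones) in counts.items()}


def accuracy(model, rows, feature, target):
    """Fraction of rows whose predicted class matches the true label."""
    correct = sum(model[row[feature]] == row[target] for row in rows)
    return correct / len(rows)


model = fit_group_majority(train, "sex", "survived")
score = accuracy(model, holdout, "sex", "survived")
print(model, score)
```

Any model you build afterwards should score better than a baseline like this on data it has not seen; if it does not, the extra complexity is not paying off.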
Housing Prices Competition
This is a perfect competition for data science students who have completed an online course in machine learning and are looking to expand their skill set.
Practice Skills
Creative feature engineering
Advanced regression techniques like random forest and gradient boosting
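As a small illustration of what feature engineering means in practice, here is a minimal sketch that derives two new columns from raw ones. The field names (`year_built`, `year_sold`, `first_floor_sqft`, `second_floor_sqft`) and the values are hypothetical and chosen only for readability; check the competition's data description for the real column names. It does not cover the regression models themselves.

```python
def add_features(house):
    """Return a copy of `house` with two derived features added."""
    enriched = dict(house)  # copy, so the raw record stays untouched
    # Age of the house at the time of sale.
    enriched["house_age"] = house["year_sold"] - house["year_built"]
    # Combined living area across both floors.
    enriched["total_sqft"] = house["first_floor_sqft"] + house["second_floor_sqft"]
    return enriched


# Hypothetical records, invented for illustration.
houses = [
    {"year_built": 1995, "year_sold": 2008, "first_floor_sqft": 900, "second_floor_sqft": 700},
    {"year_built": 2005, "year_sold": 2009, "first_floor_sqft": 1200, "second_floor_sqft": 0},
]

engineered = [add_features(house) for house in houses]
for row in engineered:
    print(row["house_age"], row["total_sqft"])
```

Derived columns like these often carry more signal for a regression model than the raw fields they come from, which is why this competition is a good place to practice the skill.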
Helpful Tips:
Looking at a top-rated notebook for Kaggle competitions is a great way for beginners to get started in the data science field.
Blog posts and articles are a great source for developing and improving your skills.
There is a good chance that you can find inspiration here: http://ndres.me/kaggle-past-solutions/
It is a great compilation of Kaggle competition winners’ solutions.
What Next:
Explore more advanced competitions on Kaggle. Also check out well-known data science blogs such as Analytics Vidhya and Towards Data Science.
Do check out the Google Tech Dev Guide; it is a great compilation of essential articles and a step-by-step guide for your journey.
Once you are confident in dealing with Basic Regression and Classification type problems, you can explore different avenues such as Time Series, Natural Language Processing, Computer Vision, and Reinforcement Learning.
DISCLAIMER
Shaastra TechShots’ publications contain information, opinions and data that Shaastra TechShots considers to be accurate based on the date of their creation and verified sources available at that time. It does not constitute either a personalized opinion or a general opinion of Shaastra or IIT Madras. The information provided comes from the best sources, however, Shaastra TechShots cannot be held responsible for any errors or omissions that may emerge.