My Journey Through Machine Learning With Python
I got interested in Machine Learning in late 2018 through a blog I accidentally stumbled upon. It was an eye-opener to the world of Artificial Intelligence & Machine Learning. For a Computer Science graduate and Full Stack Programmer of over 20 years, looking the way of Machine Learning was a no-brainer. I was really fascinated and excited by the capabilities and process of Machine Learning.
2019
In 2019 I decided to take some hands-on online Machine Learning courses after reading numerous tutorials, posts and blogs on Machine Learning with Python. The courses ranged from beginner to intermediate level. It's a pity I did not document all my activities during this time. But I did document the Pre-Processing Data section of the Pima Indians onset of diabetes dataset.
The same year I was lucky to be invited to attend a one-week Data Science, AI and Machine Learning with Python event, tagged AI+ Invasion Port Harcourt, organised by Data Science Nigeria. It was an insightful and fun event. I got to meet other people who were interested in Machine Learning.
15th February, 2019:
Prepare For Modeling by Pre-Processing Data
Today, I continued working on the Pima Indians onset of diabetes dataset and read more about "Prepare For Modeling by Pre-Processing Data". Data cleaning is a critically important step in any machine learning project. Data scientists often claim that as much as 80% of their time is consumed by the tedious process of data cleaning.
Sometimes you need to prepare your data in order to best present the inherent structure of the problem to the modeling algorithms. There are many techniques that one can use to prepare data for modeling. I am using the pre-processing capabilities provided by scikit-learn.
For example, I tried the following techniques: standardization (using the scale and center options), normalization and binarization.
Let's see some code:
- Standardization:
Output:
[[ 0.64 0.848 0.15 0.907 -0.693 0.204 0.468 1.426]
[-0.845 -1.123 -0.161 0.531 -0.693 -0.684 -0.365 -0.191]
[ 1.234 1.944 -0.264 -1.288 -0.693 -1.103 0.604 -0.106]
[-0.845 -0.998 -0.161 0.155 0.123 -0.494 -0.921 -1.042]
[-1.142 0.504 -1.505 0.907 0.766 1.41 5.485 -0.02 ]]
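The code that produced this output is not preserved in the post. A minimal sketch of how standardization might be done with scikit-learn's `StandardScaler` follows. The five rows are assumed to be the first five records of the dataset as it is commonly distributed, and the variable names are hypothetical. Because the scaler here is fitted on five rows only, the printed values will differ from the output above, which was presumably computed over the full dataset.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Assumed sample: first five rows of the Pima Indians diabetes data
# (eight input features only, class label dropped).
X = np.array([
    [6, 148, 72, 35, 0, 33.6, 0.627, 50],
    [1, 85, 66, 29, 0, 26.6, 0.351, 31],
    [8, 183, 64, 0, 0, 23.3, 0.672, 32],
    [1, 89, 66, 23, 94, 28.1, 0.167, 21],
    [0, 137, 40, 35, 168, 43.1, 2.288, 33],
])

# with_mean centers each column on 0; with_std scales it to unit variance.
scaler = StandardScaler(with_mean=True, with_std=True)
rescaled_X = scaler.fit_transform(X)

np.set_printoptions(precision=3)
print(rescaled_X)
```

After the transform, every column has a mean of 0 and a standard deviation of 1, which suits algorithms that assume roughly Gaussian, comparably scaled inputs.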
- Normalization:
Output:
[[0.034 0.828 0.403 ... 0.188 0.004 0.28 ]
[0.008 0.716 0.556 ... 0.224 0.003 0.261]
[0.04 0.924 0.323 ... 0.118 0.003 0.162]
...
[0.027 0.651 0.388 ... 0.141 0.001 0.161]
[0.007 0.838 0.399 ... 0.2 0.002 0.313]
[0.008 0.736 0.554 ... 0.241 0.002 0.182]]
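The original code for this step is also missing. As a sketch, row-wise normalization could be done with scikit-learn's `Normalizer`, which rescales each row to unit length. The L2 norm is an assumption, although it is consistent with the first rows of the output above. The sample rows are the same assumed first five records, and the variable names are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import Normalizer

# Assumed sample: first five rows of the Pima Indians diabetes data
# (eight input features only, class label dropped).
X = np.array([
    [6, 148, 72, 35, 0, 33.6, 0.627, 50],
    [1, 85, 66, 29, 0, 26.6, 0.351, 31],
    [8, 183, 64, 0, 0, 23.3, 0.672, 32],
    [1, 89, 66, 23, 94, 28.1, 0.167, 21],
    [0, 137, 40, 35, 168, 43.1, 2.288, 33],
])

# Each row is divided by its own Euclidean (L2) length.
normalizer = Normalizer(norm="l2")
normalized_X = normalizer.fit_transform(X)

np.set_printoptions(precision=3)
print(normalized_X)
```

Unlike standardization, this works per row and not per column, so each row's result does not depend on how many other rows are in the array.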
- Binarization:
Output:
[[1. 1. 1. 1. 0. 1. 1. 1.]
[1. 1. 1. 1. 0. 1. 1. 1.]
[1. 1. 1. 0. 0. 1. 1. 1.]
[1. 1. 1. 1. 1. 1. 1. 1.]
[0. 1. 1. 1. 1. 1. 1. 1.]]
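The code for this step is not shown either. A minimal sketch with scikit-learn's `Binarizer` follows, assuming a threshold of 0.0, which is consistent with the output above: every value greater than the threshold becomes 1 and everything else becomes 0. The sample rows are assumed to be the first five records of the dataset, and the variable names are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import Binarizer

# Assumed sample: first five rows of the Pima Indians diabetes data
# (eight input features only, class label dropped).
X = np.array([
    [6, 148, 72, 35, 0, 33.6, 0.627, 50],
    [1, 85, 66, 29, 0, 26.6, 0.351, 31],
    [8, 183, 64, 0, 0, 23.3, 0.672, 32],
    [1, 89, 66, 23, 94, 28.1, 0.167, 21],
    [0, 137, 40, 35, 168, 43.1, 2.288, 33],
])

# Values strictly above the threshold map to 1, all others to 0.
binarizer = Binarizer(threshold=0.0)
binary_X = binarizer.fit_transform(X)

print(binary_X)
```

Binarizing like this can be handy for turning a count or measurement into a simple present/absent flag.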
18 May, 2021
On this day I officially became a Kaggler. :)
My first competition was to build a predictive model that answers the question: “what sorts of people were more likely to survive?” from the Titanic mishap. I guess all first-timers go through the same process. The project was given to us in order to help us get familiar with the Kaggle environment and competition processes. But we are still required to do the job.
My first task was to download the Titanic datasets and have a peek at them. Then I set out to examine the dataset properly using the pandas Python library in order to get a better sense of the structure, count, missing records, nature of the content of the features and the overall dataset.
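A first look of this kind might be sketched as follows. The rows are hypothetical stand-ins in the style of the Titanic training file, not real records, and the variable names are made up for illustration. In practice the DataFrame would come from something like `pd.read_csv("train.csv")`, with the file name being an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical rows for illustration only; the real data would be loaded
# from the downloaded competition file.
train = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4],
    "Survived": [0, 1, 1, 0],
    "Pclass": [3, 1, 3, 2],
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, 38.0, np.nan, 35.0],
    "Fare": [7.25, 71.28, 7.92, 13.0],
})

print(train.shape)        # number of rows and columns
print(train.head())       # peek at the first few records
train.info()              # column types and non-null counts

missing = train.isnull().sum()
print(missing)            # missing records per feature

print(train.describe())   # summary statistics for numeric features
print(train["Survived"].value_counts())  # class balance of the target
```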
...
19 May, 2021
— Fredrick Ughimi (Twitter: @fredrickughimi)