{ "cells": [ { "cell_type": "markdown", "id": "6055ab2e-fd92-4ea3-a700-e58bdbd05405", "metadata": {}, "source": [ "# Data Science Process using CRISP-DM\n", "\n", "CRISP-DM is one of the most widely used processes in data science. CRISP-DM, short for *Cross Industry Standard Process for Data Mining*, serves as a framework for taking a data science project from its start to the desired solution. The end result of a data science project serves one of two audiences:\n", "* for humans → reports, presentations, *insights*, and the like\n", "* for computers → *deployment*, software, and the like.\n", "\n", "In general, CRISP-DM has 6 phases:\n", "1. *Business understanding*\n", "2. *Data understanding*\n", "3. *Data preparation*\n", "4. *Modeling*\n", "5. *Evaluation*\n", "6. *Deployment*\n", "\n", "We will use the *[Customer Personality Analysis](https://www.kaggle.com/imakash3011/customer-personality-analysis)* data to practice CRISP-DM. Please download the data from that link and save it as `data/marketing_campaign.csv`.\n", "\n", "## A Look At The Data\n", "\n", "Sometimes we already have the data and use it to define the problem we want to solve. This is why many companies prefer to store all company-related data \"first\" and analyze it later. That is what we will do here. To begin, we import the libraries we will need." ] }, { "cell_type": "code", "execution_count": 1, "id": "564b6d6b-aac3-41cd-8a20-cc0b1ef861cd", "metadata": { "tags": [ "remove-output" ] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/Users/bitlabsinstructor/.pyenv/versions/3.8.11/envs/bitlabs-webinar/lib/python3.8/site-packages/pandas/compat/__init__.py:124: UserWarning: Could not import the lzma module. Your installed Python is incomplete. 
Attempting to use lzma compression will result in a RuntimeError.\n", " warnings.warn(msg)\n" ] } ], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import pandas as pd\n", "import seaborn as sns\n", "\n", "plt.style.use(\"fivethirtyeight\")" ] }, { "cell_type": "markdown", "id": "428e683d-c43f-4d66-91f6-44a859682731", "metadata": {}, "source": [ "Next, we load the data from `data/marketing_campaign.csv` using the \"tab\" separator (`\\t`)." ] }, { "cell_type": "code", "execution_count": 2, "id": "b79201a5-415f-4696-a26c-0f587f79c2dc", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
|   | ID | Year_Birth | Education | Marital_Status | Income | Kidhome | Teenhome | Dt_Customer | Recency | MntWines | ... | NumWebVisitsMonth | AcceptedCmp3 | AcceptedCmp4 | AcceptedCmp5 | AcceptedCmp1 | AcceptedCmp2 | Complain | Z_CostContact | Z_Revenue | Response |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5524 | 1957 | Graduation | Single | 58138.0 | 0 | 0 | 04-09-2012 | 58 | 635 | ... | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 1 |
| 1 | 2174 | 1954 | Graduation | Single | 46344.0 | 1 | 1 | 08-03-2014 | 38 | 11 | ... | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 0 |
| 2 | 4141 | 1965 | Graduation | Together | 71613.0 | 0 | 0 | 21-08-2013 | 26 | 426 | ... | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 0 |
| 3 | 6182 | 1984 | Graduation | Together | 26646.0 | 1 | 0 | 10-02-2014 | 26 | 11 | ... | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 0 |
| 4 | 5324 | 1981 | PhD | Married | 58293.0 | 1 | 0 | 19-01-2014 | 94 | 173 | ... | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 0 |

5 rows × 29 columns
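
The preview above is what a tab-separated load followed by `head()` typically looks like. The following is a minimal sketch of that step, not the notebook's original cell: it reads a small in-memory sample (three rows and four columns copied from the tables in this section) so that it runs without the downloaded file. The name `sample_tsv` is hypothetical.

```python
import io

import pandas as pd

# Hypothetical in-memory sample using the same tab-separated layout as the real file.
# The values are copied from rows shown in this section; the last row has no Income.
sample_tsv = (
    "ID\tYear_Birth\tEducation\tIncome\n"
    "5524\t1957\tGraduation\t58138\n"
    "2174\t1954\tGraduation\t46344\n"
    "1994\t1983\tGraduation\t\n"
)

# sep="\t" tells pandas that the fields are separated by tabs, not commas.
# For the real data, pass the path instead of the StringIO object:
#   pd.read_csv("data/marketing_campaign.csv", sep="\t")
df = pd.read_csv(io.StringIO(sample_tsv), sep="\t")

# head() returns the first rows (five by default) for a quick look at the data.
print(df.head())
```

Without `sep="\t"`, pandas would assume commas and read each line as a single wide column, so the separator is worth checking first whenever a freshly loaded table looks wrong.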
|   | ID | Year_Birth | Education | Marital_Status | Income | Kidhome | Teenhome | Dt_Customer | Recency | MntWines | ... | NumWebVisitsMonth | AcceptedCmp3 | AcceptedCmp4 | AcceptedCmp5 | AcceptedCmp1 | AcceptedCmp2 | Complain | Z_CostContact | Z_Revenue | Response |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10 | 1994 | 1983 | Graduation | Married | NaN | 1 | 0 | 15-11-2013 | 11 | 5 | ... | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 0 |
| 27 | 5255 | 1986 | Graduation | Single | NaN | 1 | 0 | 20-02-2013 | 19 | 5 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 0 |
| 43 | 7281 | 1959 | PhD | Single | NaN | 0 | 0 | 05-11-2013 | 80 | 81 | ... | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 0 |
| 48 | 7244 | 1951 | Graduation | Single | NaN | 2 | 1 | 01-01-2014 | 96 | 48 | ... | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 0 |
| 58 | 8557 | 1982 | Graduation | Single | NaN | 1 | 0 | 17-06-2013 | 57 | 11 | ... | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 0 |
| 71 | 10629 | 1973 | 2n Cycle | Married | NaN | 1 | 0 | 14-09-2012 | 25 | 25 | ... | 8 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 0 |
| 90 | 8996 | 1957 | PhD | Married | NaN | 2 | 1 | 19-11-2012 | 4 | 230 | ... | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 0 |
| 91 | 9235 | 1957 | Graduation | Single | NaN | 1 | 1 | 27-05-2014 | 45 | 7 | ... | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 0 |
| 92 | 5798 | 1973 | Master | Together | NaN | 0 | 0 | 23-11-2013 | 87 | 445 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 0 |
| 128 | 8268 | 1961 | PhD | Married | NaN | 0 | 1 | 11-07-2013 | 23 | 352 | ... | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 0 |
| 133 | 1295 | 1963 | Graduation | Married | NaN | 0 | 1 | 11-08-2013 | 96 | 231 | ... | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 0 |
| 312 | 2437 | 1989 | Graduation | Married | NaN | 0 | 0 | 03-06-2013 | 69 | 861 | ... | 3 | 0 | 1 | 0 | 1 | 0 | 0 | 3 | 11 | 0 |
| 319 | 2863 | 1970 | Graduation | Single | NaN | 1 | 2 | 23-08-2013 | 67 | 738 | ... | 7 | 0 | 1 | 0 | 1 | 0 | 0 | 3 | 11 | 0 |
| 1379 | 10475 | 1970 | Master | Together | NaN | 0 | 1 | 01-04-2013 | 39 | 187 | ... | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 0 |
| 1382 | 2902 | 1958 | Graduation | Together | NaN | 1 | 1 | 03-09-2012 | 87 | 19 | ... | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 0 |
| 1383 | 4345 | 1964 | 2n Cycle | Single | NaN | 1 | 1 | 12-01-2014 | 49 | 5 | ... | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 0 |
| 1386 | 3769 | 1972 | PhD | Together | NaN | 1 | 0 | 02-03-2014 | 17 | 25 | ... | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 0 |
| 2059 | 7187 | 1969 | Master | Together | NaN | 1 | 1 | 18-05-2013 | 52 | 375 | ... | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 0 |
| 2061 | 1612 | 1981 | PhD | Single | NaN | 1 | 0 | 31-05-2013 | 82 | 23 | ... | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 0 |
| 2078 | 5079 | 1971 | Graduation | Married | NaN | 1 | 1 | 03-03-2013 | 82 | 71 | ... | 8 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 0 |
| 2079 | 10339 | 1954 | Master | Together | NaN | 0 | 1 | 23-06-2013 | 83 | 161 | ... | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 0 |
| 2081 | 3117 | 1955 | Graduation | Single | NaN | 0 | 1 | 18-10-2013 | 95 | 264 | ... | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 0 |
| 2084 | 5250 | 1943 | Master | Widow | NaN | 0 | 0 | 30-10-2013 | 75 | 532 | ... | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 3 | 11 | 1 |
| 2228 | 8720 | 1978 | 2n Cycle | Together | NaN | 0 | 0 | 12-08-2012 | 53 | 32 | ... | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 3 | 11 | 0 |

24 rows × 29 columns
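
Every row in the table above has `NaN` in `Income`, which suggests the table was produced by filtering on missing income. The following is a minimal sketch of such a filter, assuming a boolean mask built with `isna()`; the original cell is not shown in this section, so the exact expression may have differed. The small frame reuses four rows (index labels 0, 1, 10, 27) from the tables above.

```python
import numpy as np
import pandas as pd

# Small stand-in frame; values and index labels are copied from the tables in this section.
df = pd.DataFrame(
    {
        "ID": [5524, 2174, 1994, 5255],
        "Education": ["Graduation", "Graduation", "Graduation", "Graduation"],
        "Income": [58138.0, 46344.0, np.nan, np.nan],
    },
    index=[0, 1, 10, 27],
)

# isna() gives a True/False mask; indexing with it keeps only the rows where Income is missing.
missing_income = df[df["Income"].isna()]

print(missing_income)
print(len(missing_income), "rows with missing Income")
```

Note that the filtered frame keeps the original index labels (10 and 27 here), which is why the table above shows non-consecutive row numbers.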
|   | NumWebPurchases | NumStorePurchases | NumCatalogPurchases |
|---|---|---|---|
| count | 2233.000000 | 2233.000000 | 2233.000000 |
| mean | 4.097627 | 5.806986 | 2.670399 |
| std | 2.773621 | 3.242016 | 2.923871 |
| min | 0.000000 | 0.000000 | 0.000000 |
| 25% | 2.000000 | 3.000000 | 0.000000 |
| 50% | 4.000000 | 5.000000 | 2.000000 |
| 75% | 6.000000 | 8.000000 | 4.000000 |
| max | 27.000000 | 13.000000 | 28.000000 |
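
The summary above has the shape of a `describe()` result restricted to the three purchase-count columns. The following is a minimal sketch of that call, under the assumption that the columns were selected by name. The numbers in the frame are made up for illustration and are not taken from the dataset, and `purchase_cols` is a hypothetical name.

```python
import pandas as pd

# Hypothetical purchase counts, invented for this sketch (not real dataset values).
df = pd.DataFrame(
    {
        "NumWebPurchases": [0, 2, 4, 6],
        "NumStorePurchases": [1, 3, 5, 7],
        "NumCatalogPurchases": [0, 0, 2, 2],
    }
)

# Select the columns of interest, then compute count, mean, std, min, quartiles, and max.
purchase_cols = ["NumWebPurchases", "NumStorePurchases", "NumCatalogPurchases"]
summary = df[purchase_cols].describe()

print(summary)
```

Comparing `max` against the 75% quartile, as in the table above (27 web purchases against a 75% value of 6), is a quick way to spot values that may deserve a closer look.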