Bird’s Eye View:
The Blueprint:
1 - Exploratory Analysis: First, “get to know” the data. This step should be quick, efficient, and decisive. 2 - Data Cleaning: Then, clean your data to avoid many common pitfalls. Better data beats fancier 3 - Feature Engineering: Next, help your algorithms “focus” on what’s important by creating new features. 4 - Algorithm Selection: Choose the best, most appropriate algorithms without wasting your time. 5 - Model Training: Finally, train your models. This step is pretty formulaic once you’ve done the first 4.
there are other situational steps as well:
1 - Project Scoping: Sometimes, you’ll need to roadmap the project and anticipate data needs. 2 - Data Wrangling: You may also need to restructure your dataset into a format that algorithms can handle. 3 - Preprocessing: Often, transforming your features first can further improve performance. 4 - Ensembling: You can squeeze out even more performance by combining multiple models.
Exploratory Analysis:
Plot Categorical Distributions:
- Categorical features cannot be visualized through histograms. Instead, you can use bar plots.
- In particular, you’ll want to look out for sparse classes, which are classes that have a very small number of observations.
- By the way, a “class” is simply a unique value for a categorical feature.
Plot Segmentations:
- The median transaction price (middle vertical bar in the box) for Single-Family homes was much higher than that for Apartments / Condos / Townhomes.
- The min and max transaction prices are comparable between the two classes.
- In fact, the round-number min ($200k) and max ($800k) suggest possible data truncation… …which is very important to remember when assessing the generalizability of your models later!
Feature Engineering:
- You can isolate and highlight key information, which helps your - algorithms “focus” on what’s important.
- You can bring in your own domain expertise.
- Most importantly, once you understand the “vocabulary” of feature engineering, you can bring in other people’s domain expertise!
Data Wrangling:
Data Wrangling steps:
- Set up your environment.
- Import libraries and dataset.
- Understand the data.
- Filter unwanted observations.
- Pivot the dataset.
- Shift the pivoted dataset.
- Melt the shifted dataset.
- Reduce-merge the melted data.
Resources:
Done by Omar-zoubi