Simple Predictive Modeling with Watson Analytics

This article focuses on two predictive models built from two Kaggle datasets: Laptop Price and Mobile Price Classification. The Laptop Price dataset contains records of various laptop models; it was last updated six months before this writing, when additional laptop characteristics and prices were added. The Mobile Price Classification dataset provides data on mobile phones, including a price range classification for each phone. The Laptop Prices data consists of the following variables:

Column Name | Data Type | Description
Company | String | Producer of laptop
Product | String | Make and model
TypeName | String | Type (Notebook, Ultrabook, Gaming, etc.)
Inches | Numeric | Screen size
ScreenResolution | String | Screen resolution
Cpu | String | Laptop CPU
Ram | String | Laptop RAM
Memory | String | Hard disk / SSD memory
GPU | String | Graphics processing unit
OpSys | String | Operating system
Weight | String | Laptop weight
Price_euros | Numeric | Price (in euros)
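Because Ram and Weight are stored as strings, they would need converting to numbers before being used as numeric inputs outside Watson Analytics. The sketch below assumes values formatted like "8GB" and "1.37kg"; the helpers parse_ram_gb and parse_weight_kg are hypothetical names, not part of the dataset or of Watson Analytics.

```python
import re


def parse_ram_gb(value: str) -> int:
    """Convert a Ram string such as '8GB' (assumed format) to an integer number of GB."""
    match = re.fullmatch(r"\s*(\d+)\s*GB\s*", value, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognized Ram value: {value!r}")
    return int(match.group(1))


def parse_weight_kg(value: str) -> float:
    """Convert a Weight string such as '1.37kg' (assumed format) to kilograms."""
    # Also tolerates a trailing 's' ("4kgs") in case some rows are written that way.
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*kgs?\s*", value, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognized Weight value: {value!r}")
    return float(match.group(1))


print(parse_ram_gb("8GB"), parse_weight_kg("1.37kg"))
```

Raising ValueError on anything unexpected, rather than silently returning a default, makes malformed rows visible before they reach a model.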

The Mobile Price Classification data consists of the following variables:

Column Name | Data Type | Description
blue | Boolean (0 or 1) | Bluetooth feature
dual_sim | Boolean (0 or 1) | Dual SIM feature
four_g | Boolean (0 or 1) | 4G connectivity
three_g | Boolean (0 or 1) | 3G connectivity
price_range | Text (classification) | Low, medium, high, or very high cost
touch_screen | Boolean (0 or 1) | Touch screen feature
wifi | Boolean (0 or 1) | WiFi
battery_power | Number | Battery power
front_camera | Number | Front camera
internal_memory | Number | Internal memory
mobile_depth | Number | Depth
n_cores | Number | Number of processor cores
pixel_resolution | Number | Resolution
primary_camera | Number | Primary camera
ram | Number | RAM
screen_height | Number | Screen height
screen_width | Number | Screen width
talk_time | Number | Talk time
clock_speed | Number | Clock speed
mobile_weight | Number | Mobile weight
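If you load the Kaggle CSV yourself, the six Boolean columns arrive as 0/1 values. The sketch below converts them to Python booleans and, assuming the raw price_range column encodes the four classes as the integers 0 to 3 in increasing order of price, maps them to the labels used later in this article. PRICE_RANGE_LABELS, BOOLEAN_COLUMNS, and prepare_record are hypothetical names.

```python
# Assumption: raw price_range codes 0-3 run from cheapest to most expensive.
PRICE_RANGE_LABELS = {0: "low_cost", 1: "medium_cost", 2: "high_cost", 3: "very_high_cost"}
BOOLEAN_COLUMNS = ("blue", "dual_sim", "four_g", "three_g", "touch_screen", "wifi")


def prepare_record(raw: dict) -> dict:
    """Return a copy of one CSV row with 0/1 flags as bools and a text price_range."""
    record = dict(raw)  # leave the caller's row untouched
    for column in BOOLEAN_COLUMNS:
        record[column] = bool(int(record[column]))
    record["price_range"] = PRICE_RANGE_LABELS[int(record["price_range"])]
    return record  # other columns (ram, battery_power, ...) are passed through as-is


# A made-up row, not taken from the dataset.
row = {"blue": "0", "dual_sim": "1", "four_g": "1", "three_g": "1",
       "touch_screen": "0", "wifi": "1", "ram": "2549", "price_range": "1"}
print(prepare_record(row))
```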

Data Quality

The columns for Laptop Prices are displayed with their respective quality scores:

Column Name | Quality Score
Company | Medium quality (67)
Product | Unique values
TypeName | Medium quality (60)
Inches | High quality (74)
ScreenResolution | Medium quality (61)
Cpu | Unique values
Ram | Medium quality (60)
Memory | Medium quality (63)
GPU | Unique values
OpSys | Medium quality (54)
Weight | Unique values
Price_euros | Medium quality (63)

The columns for Mobile Price Classification are displayed with their respective quality scores:

Column Name | Quality Score
blue | High quality (100)
dual_sim | High quality (100)
four_g | High quality (100)
price_range | High quality (100)
three_g | High quality (70)
touch_screen | High quality (100)
wifi | High quality (100)
battery_power | High quality (98)
front_camera | High quality (73)
internal_memory | High quality (96)
mobile_depth | High quality (93)
n_cores | High quality (100)
pixel_resolution | High quality (75)
primary_camera | High quality (99)
ram | High quality (100)
screen_height | High quality (93)
screen_width | High quality (76)
talk_time | High quality (99)
clock_speed | High quality (86)
mobile_weight | High quality (100)

Predictive Model Development

The classification predictive model was built from the Mobile Price Classification dataset. The target variable was price_range, which Watson Analytics reported as having a quality score of 100. Its classes were low cost, medium cost, high cost, and very high cost, making it a categorical target variable. Using Watson Analytics, a predictive model was built to determine what drives price range. A spiral diagram showing the drivers of price range can be seen below.

Spiral Diagram

From the spiral diagram, it is apparent that RAM is the biggest driver of price range. To dig into this further, price range was analyzed against RAM alone, which showed a strong relationship: phones with higher amounts of RAM tend to fall in the very high cost price range.

Price range comparison
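A RAM-only comparison like the one above boils down to grouping records by price range and summarizing RAM within each group. The sketch below computes the mean RAM per class; the function name and the four toy records are made up for illustration and are not drawn from the dataset.

```python
from collections import defaultdict
from statistics import mean


def mean_ram_by_price_range(records):
    """Group records by their price_range label and average the ram values in each group."""
    groups = defaultdict(list)
    for record in records:
        groups[record["price_range"]].append(record["ram"])
    return {label: mean(values) for label, values in groups.items()}


# Made-up toy records, not the Kaggle data.
toy_records = [
    {"price_range": "low_cost", "ram": 600},
    {"price_range": "low_cost", "ram": 800},
    {"price_range": "very_high_cost", "ram": 3400},
    {"price_range": "very_high_cost", "ram": 3800},
]
print(mean_ram_by_price_range(toy_records))
```

A per-class mean is only a coarse summary; a box plot or per-class quantiles would show how much the RAM ranges of neighboring classes overlap.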

Next, a decision tree was generated, which yielded a classification table shown below. The decision tree had a predictor importance of RAM at 0.97 and battery power at 0.02. All records were included in the model.

Classification table
Actual \ Predicted | high_cost | low_cost | medium_cost | very_high_cost | Percent correct
high_cost | 314 | 0 | 95 | 91 | 63%
low_cost | 0 | 450 | 50 | 0 | 90%
medium_cost | 48 | 48 | 404 | 0 | 81%
very_high_cost | 50 | 0 | 1 | 449 | 90%
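The percentages in the table can be reproduced from the counts. Each actual class has 500 records, so the classes are balanced, and 1,617 of the 2,000 records sit on the diagonal, an overall accuracy of about 80.9%. The sketch below copies the counts from the table; the function names are hypothetical, and the precision helper is an extra view (how often a prediction of each class was right) that the Watson Analytics table does not itself report.

```python
# Rows are actual classes, columns are predicted classes, copied from the
# classification table above (same class order in both dimensions).
CLASSES = ["high_cost", "low_cost", "medium_cost", "very_high_cost"]
CONFUSION = [
    [314, 0, 95, 91],
    [0, 450, 50, 0],
    [48, 48, 404, 0],
    [50, 0, 1, 449],
]


def percent_correct_by_class(classes, matrix):
    """Per actual class: correct predictions divided by the row total, as a percentage."""
    return {cls: 100 * matrix[i][i] / sum(matrix[i]) for i, cls in enumerate(classes)}


def precision_by_class(classes, matrix):
    """Per predicted class: correct predictions divided by the column total."""
    result = {}
    for j, cls in enumerate(classes):
        column_total = sum(row[j] for row in matrix)
        result[cls] = matrix[j][j] / column_total
    return result


def overall_accuracy(matrix):
    """Diagonal (correct) count divided by the total number of records."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


for cls, pct in percent_correct_by_class(CLASSES, CONFUSION).items():
    print(f"{cls}: {pct:.1f}% correct")
print(f"overall accuracy: {overall_accuracy(CONFUSION):.4f}")  # 0.8085
```

Rounded to whole numbers, the per-class values match the table's 63%, 90%, 81%, and 90%.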

From the table, the percentage of records predicted correctly was highest for very high cost and low cost, both at 90%. Medium cost was predicted correctly 81% of the time, and high cost was the lowest at 63%, with most of its errors going to the neighboring medium cost (95) and very high cost (91) classes. The decision rules for the very high, high, medium, and low cost classes were also interesting, as shown in the following diagrams.

Decision Rules – Very High Cost

Decision tree

Decision Rules – High Cost

Decision tree

Decision Rules – Medium Cost

Decision tree

Decision Rules – Low Cost

Decision tree

For very high cost, 100% of records with RAM higher than 3,255.25 fell into this category. The other variables, battery power, front camera, and resolution, contributed to very high cost to a much smaller degree. RAM was also the predominant driver for the high, medium, and low price ranges, as the decision tree below illustrates.

Decision Tree
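A rule such as "RAM higher than 3,255.25" is what a decision tree produces when it splits on a numeric variable. The article does not say which tree algorithm Watson Analytics uses, so the sketch below swaps in a simple CART-style split: it tries the midpoint between each pair of adjacent sorted RAM values and keeps the one with the lowest weighted Gini impurity. The toy RAM values and labels are made up, and gini and best_threshold are hypothetical names.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best single split on `values`."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_impurity, best_split = float("inf"), None
    for i in range(1, n):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # identical values cannot be separated by a threshold
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best_impurity:
            best_impurity, best_split = impurity, (low + high) / 2
    return best_split, best_impurity


# Made-up toy data: RAM values and whether each phone is very high cost.
toy_ram = [500, 1200, 2000, 2600, 3300, 3700]
toy_labels = ["other", "other", "other", "other", "very_high_cost", "very_high_cost"]
threshold, impurity = best_threshold(toy_ram, toy_labels)
print(f"split at RAM > {threshold} (weighted Gini {impurity:.3f})")
```

A full tree simply repeats this search recursively on each side of the split, over every input variable, which is how secondary variables such as battery power can appear further down the rules.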

The continuous predictive model was built from the Laptop Price dataset. The target variable was Price_euros, which Watson Analytics reported as having a medium quality score of 63. It was a continuous target variable, with each record carrying its own price. The spiral diagram below shows that RAM is the single top driver of price.

Spiral Diagram

The bar chart below isolates this relationship. However, the model's overall predictive strength, at only 59%, is lower than that of the earlier classification model.

Bar chart
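The article does not define how Watson Analytics computes predictive strength for a continuous target. As an illustration only, R² (the share of the target's variance a model explains) is a common way to summarize a continuous model's fit; the sketch below computes it for made-up prices and predictions, and is not a reproduction of the 59% figure.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("need two or more paired values")
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("actual values have no variance")
    return 1 - ss_res / ss_tot


# Made-up prices in euros and model predictions, for illustration only.
toy_actual = [500, 1000, 1500, 2000]
toy_predicted = [600, 900, 1600, 1900]
print(round(r_squared(toy_actual, toy_predicted), 3))  # 0.968
```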

The predictor importance table below shows how strongly RAM dominates the other variables as a predictor.

Predictor importance
Input | Importance
Ram | 0.66
Memory | 0.16
TypeName | 0.10
Resolution Groups | 0.03
ScreenResolution | 0.03
OpSys | 0.02
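The importance values in the table sum to 1.00, so a running total shows how much of the total importance the top inputs carry: Ram alone accounts for 0.66, and Ram, Memory, and TypeName together account for 0.92. The sketch below copies the values from the table; cumulative_importance is a hypothetical helper name.

```python
# Importance values copied from the predictor importance table above.
IMPORTANCE = [
    ("Ram", 0.66),
    ("Memory", 0.16),
    ("TypeName", 0.10),
    ("Resolution Groups", 0.03),
    ("ScreenResolution", 0.03),
    ("OpSys", 0.02),
]


def cumulative_importance(items):
    """Rank inputs by importance and return (name, running total rounded to 2 places)."""
    ranked = sorted(items, key=lambda item: item[1], reverse=True)  # stable for ties
    running = 0.0
    result = []
    for name, value in ranked:
        running += value
        result.append((name, round(running, 2)))
    return result


for name, share in cumulative_importance(IMPORTANCE):
    print(f"{name}: {share:.2f}")
```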

The analysis used every record in the dataset; none were excluded from the model. The decision rules were also interesting, as shown below.

Decision tree

RAM was at the forefront of predicting price across the various price points shown in the decision rules above, with less significant variables also contributing. The decision tree for the Price_euros predictive model can be seen below.

Decision tree
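For a continuous target, each leaf of a regression tree predicts a single value, typically the mean price of the records that reach it. The article does not state the algorithm Watson Analytics uses, so the sketch below swaps in a simple variance-reduction split: it picks the threshold that minimizes the summed squared error of the two sides and reports each side's mean as its prediction. The toy RAM sizes and prices are made up, and sse and best_regression_split are hypothetical names.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_regression_split(xs, ys):
    """Return (cost, threshold, left_mean, right_mean) for the best single split,
    or None if every x value is identical."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # identical values cannot be separated
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cost = sse(left) + sse(right)  # lower cost = larger variance reduction
        if best is None or cost < best[0]:
            best = (cost, (low + high) / 2, sum(left) / len(left), sum(right) / len(right))
    return best


# Made-up toy data: RAM in GB and price in euros.
toy_ram_gb = [4, 4, 8, 8, 16, 16]
toy_price = [400, 500, 900, 1000, 1800, 2000]
cost, threshold, left_mean, right_mean = best_regression_split(toy_ram_gb, toy_price)
print(f"Ram <= {threshold}: predict {left_mean} euros; Ram > {threshold}: predict {right_mean} euros")
```

Minimizing the summed squared error of the two children is equivalent to maximizing the reduction in variance from the parent node, since the parent's error is fixed.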

Overall, this model gave a weaker indication of what drives price than the classification model; however, RAM was again clearly the top driving variable.

Image Credits: Photo by rawpixel on Unsplash.

About The Author

Ian Carnaghan

I am a software developer and online educator who likes to keep up with all the latest in technology. I also manage cloud infrastructure, continuous monitoring, DevOps processes, security, and continuous integration and deployment. In my spare time I teach undergraduate classes in web development.
