Simple Predictive Modeling with Watson Analytics


This article covers two predictive models built from two Kaggle datasets: Laptop Price and Mobile Price Classification. Laptop Price contains records for various laptop models; at the time of writing it had last been updated six months earlier, when additional laptop characteristics and prices were added. Mobile Price Classification provides data on mobile phones, including a price range classification for each phone. The Laptop Price data consists of the following variables:

| Column Name | Data Type | Description |
|---|---|---|
| Company | String | Producer of Laptop |
| Product | String | Make and Model |
| TypeName | String | Type (Notebook, Ultrabook, Gaming, etc.) |
| Inches | Numeric | Screen Size |
| ScreenResolution | String | Screen Resolution |
| Cpu | String | Laptop CPU |
| Ram | String | Laptop RAM |
| Memory | String | Hard Disk / SSD Memory |
| GPU | String | Graphics Processing Unit |
| OpSys | String | Operating System |
| Weight | String | Laptop Weight |
| Price_euros | Numeric | Price (In Euros) |
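Ram and Weight are typed as String even though they describe quantities. The sketch below shows one way such columns could be converted to numbers outside Watson Analytics. It assumes the values carry unit suffixes such as `8GB` or `1.37kg`; the article does not show raw values, so that format, the sample rows, and the names `leading_number`, `ram_gb`, and `weight_kg` are all illustrative assumptions, and this is not a step the article performed.

```python
import re


def leading_number(text):
    """Return the first number found in a string such as '8GB', or None."""
    match = re.search(r"\d+(?:\.\d+)?", text)
    return float(match.group()) if match else None


# Hypothetical rows, assuming unit-suffixed values (not taken from the dataset).
rows = [
    {"Ram": "8GB", "Weight": "1.37kg"},
    {"Ram": "16GB", "Weight": "2.1kg"},
]

for row in rows:
    # Derived numeric columns; the names are illustrative only.
    row["ram_gb"] = leading_number(row["Ram"])
    row["weight_kg"] = leading_number(row["Weight"])

print(rows)
```

A numeric version of these columns can be treated as a continuous input rather than as a set of distinct text labels.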

The Mobile Price Classification data consists of the following variables:

| Column Name | Data Type | Description |
|---|---|---|
| blue | Boolean (0 or 1) | Bluetooth feature |
| dual_sim | Boolean (0 or 1) | Dual SIM feature |
| four_g | Boolean (0 or 1) | 4G Connectivity |
| three_g | Boolean (0 or 1) | 3G Connectivity |
| price_range | Text (Classification) | Low, medium, high, very high |
| touch_screen | Boolean (0 or 1) | Touch Screen feature |
| wifi | Boolean (0 or 1) | WiFi |
| battery_power | Number | Battery Power |
| front_camera | Number | Front Camera |
| internal_memory | Number | Internal Memory |
| mobile_depth | Number | Depth |
| n_cores | Number | Processor |
| pixel_resolution | Number | Resolution |
| primary_camera | Number | Primary Camera |
| ram | Number | RAM |
| screen_height | Number | Screen Height |
| screen_width | Number | Screen Width |
| talk_time | Number | Talk Time |
| clock_speed | Number | Clock Speed |
| mobile_weight | Number | Mobile Weight |

Data Quality

The columns for Laptop Price are listed below with their respective quality scores:

| Column Name | Quality Score |
|---|---|
| Company | Medium Quality (67) |
| Product | Unique Values |
| TypeName | Medium Quality (60) |
| Inches | High Quality (74) |
| ScreenResolution | Medium Quality (61) |
| Cpu | Unique Values |
| Ram | Medium Quality (60) |
| Memory | Medium Quality (63) |
| GPU | Unique Values |
| OpSys | Medium Quality (54) |
| Weight | Unique Values |
| Price_euros | Medium Quality (63) |

The columns for Mobile Price Classification are displayed with their respective quality scores:

| Column Name | Quality Score |
|---|---|
| blue | High Quality (100) |
| dual_sim | High Quality (100) |
| four_g | High Quality (100) |
| price_range | High Quality (100) |
| three_g | High Quality (70) |
| touch_screen | High Quality (100) |
| wifi | High Quality (100) |
| battery_power | High Quality (98) |
| front_camera | High Quality (73) |
| internal_memory | High Quality (96) |
| mobile_depth | High Quality (93) |
| n_cores | High Quality (100) |
| pixel_resolution | High Quality (75) |
| primary_camera | High Quality (99) |
| ram | High Quality (100) |
| screen_height | High Quality (93) |
| screen_width | High Quality (76) |
| talk_time | High Quality (99) |
| clock_speed | High Quality (86) |
| mobile_weight | High Quality (100) |

Predictive Model Development

The classification model was built from the Mobile Price Classification dataset. The categorical target variable was price_range, which Watson Analytics reported as having a quality score of 100. Its classes were low cost, medium cost, high cost, and very high cost. Using Watson Analytics, a predictive model was built to determine what drives price range. The spiral diagram below shows the drivers of price range.

Spiral Diagram

From the spiral diagram, it is apparent that RAM is the biggest driver of price range. To dig into this further, price range was analyzed against RAM alone. The comparison shows a strong relationship: higher amounts of RAM correspond to the very high cost price range.

Price range comparison

Next, a decision tree was generated, which yielded the classification table shown below. In the decision tree, RAM had a predictor importance of 0.97 and battery power had 0.02. All records were included in the model.

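Classification table (rows are actual classes, columns are predicted classes)

| Actual \ Predicted | high_cost | low_cost | medium_cost | very_high_cost | Percent correct |
|---|---|---|---|---|---|
| high_cost | 314 | 0 | 95 | 91 | 63% |
| low_cost | 0 | 450 | 50 | 0 | 90% |
| medium_cost | 48 | 48 | 404 | 0 | 81% |
| very_high_cost | 50 | 0 | 1 | 449 | 90% |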
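The percent-correct column can be recomputed from the counts in the classification table. The short sketch below does that arithmetic in plain Python. The counts are transcribed from the table; the function names `percent_correct` and `overall_accuracy` are illustrative and not part of Watson Analytics.

```python
# Rows are actual classes, columns are predicted classes, in this order.
labels = ["high_cost", "low_cost", "medium_cost", "very_high_cost"]

# Counts transcribed from the classification table.
confusion = [
    [314, 0, 95, 91],   # actual high_cost
    [0, 450, 50, 0],    # actual low_cost
    [48, 48, 404, 0],   # actual medium_cost
    [50, 0, 1, 449],    # actual very_high_cost
]


def percent_correct(matrix):
    """Per-class percentage of records whose predicted class matches the actual class."""
    return [100 * row[i] / sum(row) for i, row in enumerate(matrix)]


def overall_accuracy(matrix):
    """Fraction of all records on the diagonal (correctly classified)."""
    correct = sum(row[i] for i, row in enumerate(matrix))
    total = sum(sum(row) for row in matrix)
    return correct / total


per_class = dict(zip(labels, percent_correct(confusion)))
for name, pct in per_class.items():
    print(f"{name}: {pct:.1f}%")
print(f"overall: {overall_accuracy(confusion):.2%}")
```

Each class has 500 records, so the per-class figures work out to 62.8%, 90.0%, 80.8%, and 89.8%, which round to the percentages in the table. Across all 2,000 records, 1,617 are classified correctly, or roughly 81%.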

From the table, the percentage of records correctly predicted was highest for very high cost and low cost, both at 90%. Medium cost was predicted correctly 81% of the time, and high cost was the lowest at 63%. The decision rules for the very high, high, medium, and low cost classes are shown in the following diagrams.

Decision Rules – Very High Cost

Decision tree

Decision Rules – High Cost

Decision tree

Decision Rules – Medium Cost

Decision tree

Decision Rules – Low Cost

Decision tree

For very high cost, 100% of records with RAM higher than 3,255.25 fell into this category. The other variables (battery power, front camera, and resolution) contributed to very high cost to a much smaller degree. RAM was also the predominant factor for the high, medium, and low price ranges. The decision tree below illustrates this.

Decision Tree
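The very high cost rule (RAM higher than 3,255.25) can be written as a simple predicate and checked against any set of records. The sketch below is a minimal illustration: the threshold comes from the article, while the function `rule_purity` and the sample rows are hypothetical and invented for this example.

```python
RAM_THRESHOLD = 3255.25  # cut-off reported in the very high cost decision rule


def rule_purity(records, threshold=RAM_THRESHOLD, target="very_high_cost"):
    """Share of records above the RAM threshold whose price_range equals target.

    Returns None when no record satisfies the rule.
    """
    matched = [r for r in records if r["ram"] > threshold]
    if not matched:
        return None
    hits = sum(1 for r in matched if r["price_range"] == target)
    return hits / len(matched)


# Hypothetical sample rows (not taken from the Kaggle data).
sample = [
    {"ram": 3900, "price_range": "very_high_cost"},
    {"ram": 3300, "price_range": "very_high_cost"},
    {"ram": 2800, "price_range": "high_cost"},
    {"ram": 1500, "price_range": "medium_cost"},
    {"ram": 600, "price_range": "low_cost"},
]

print(rule_purity(sample))
```

A result of 1.0 on the full dataset would correspond to the 100% figure reported for this rule.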

The continuous predictive model was built from the Laptop Price dataset. The target variable was Price_euros, which Watson Analytics reported as having a medium quality score of 63. It is a continuous target, with a price assigned to each record. The spiral diagram below shows that RAM is the top single driver of price.

Spiral Diagram

The diagram below isolates this relationship in a bar chart. The predictive strength, however, is lower overall than that of the earlier classification model, at only 59%.

Bar chart

The predictor importance table below illustrates the significant strength of RAM as a predictor compared to the other variables.

Predictor importance

| Input | Value |
|---|---|
| Ram | 0.66 |
| Memory | 0.16 |
| TypeName | 0.10 |
| Resolution Groups | 0.03 |
| ScreenResolution | 0.03 |
| OpSys | 0.02 |
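To put the predictor importance values in perspective, the sketch below ranks them and compares RAM against all other inputs combined. The values are copied from the predictor importance table; the variable names are illustrative.

```python
# Importance values as listed in the predictor importance table.
importance = {
    "Ram": 0.66,
    "Memory": 0.16,
    "TypeName": 0.10,
    "Resolution Groups": 0.03,
    "ScreenResolution": 0.03,
    "OpSys": 0.02,
}

total = sum(importance.values())
ranked = sorted(importance.items(), key=lambda kv: kv[1], reverse=True)

# Ratio of RAM's importance to the combined importance of every other input.
ram_vs_rest = importance["Ram"] / (total - importance["Ram"])

for name, value in ranked:
    print(f"{name:<18} {value:.2f} {'#' * round(value * 50)}")
print(f"Ram vs. all other inputs combined: {ram_vs_rest:.2f}x")
```

The listed values sum to 1.00, and RAM alone carries nearly twice the importance of the other five inputs together.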

The analysis drew on all records in the dataset, with none excluded from the model. The decision rules are shown below.

Decision tree

RAM was at the forefront of predicting price, along with less significant variables, across the various price points illustrated in the table. The decision tree for the Price_euros predictive model can be seen below.

Decision tree

Overall, this model did not give as strong an indication of the drivers of price as the classification model did. However, it was clear that RAM was again the top driving variable.

About the author

Ian Carnaghan

I am a software developer and online educator who likes to keep up with all the latest in technology. I also manage cloud infrastructure, continuous monitoring, DevOps processes, security, and continuous integration and deployment.
