{"id":7713,"date":"2018-10-14T08:39:38","date_gmt":"2018-10-14T13:39:38","guid":{"rendered":"https:\/\/www.carnaghan.com\/?p=7713"},"modified":"2019-02-04T21:52:29","modified_gmt":"2019-02-05T02:52:29","slug":"exploratory-data-analysis-with-watson-analytics","status":"publish","type":"post","link":"https:\/\/www.carnaghan.com\/exploratory-data-analysis-with-watson-analytics\/","title":{"rendered":"Exploratory Data Analysis with Watson Analytics"},"content":{"rendered":"

Watson Analytics provides a suite of analytics tools that are easy for non-technical people to use. The software opens the door to data preparation and exploration for managers and other personnel who would benefit from analysis but don’t necessarily have an advanced analytical background. This article details the analysis of a dataset called Laptop Prices, sourced from Kaggle<\/a>. The dataset comprises 1300 records of various laptop models and was last updated six months before the time of writing, when additional laptop characteristics and prices were added. The analysis begins with Watson Analytics starting points, which are helpful for exploration. Watson provides pre-defined suggestions that help users determine the best way to proceed with their own analysis. These starting points provide insight into the different facets of the data. The Laptop Prices dataset consists of the following columns:<\/p>\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n
Column Name<\/strong><\/td>\nData Type<\/strong><\/td>\nDescription<\/strong><\/td>\n<\/tr>\n
Company<\/td>\nString<\/td>\nProducer of Laptop<\/td>\n<\/tr>\n
Product<\/td>\nString<\/td>\nMake and Model<\/td>\n<\/tr>\n
TypeName<\/td>\nString<\/td>\nType (Notebook, Ultrabook, Gaming, etc.)<\/td>\n<\/tr>\n
Inches<\/td>\nNumeric<\/td>\nScreen Size<\/td>\n<\/tr>\n
ScreenResolution<\/td>\nString<\/td>\nScreen Resolution<\/td>\n<\/tr>\n
Cpu<\/td>\nString<\/td>\nLaptop CPU<\/td>\n<\/tr>\n
Ram<\/td>\nString<\/td>\nLaptop RAM<\/td>\n<\/tr>\n
Memory<\/td>\nString<\/td>\nHard Disk \/ SSD Memory<\/td>\n<\/tr>\n
GPU<\/td>\nString<\/td>\nGraphics Processing Unit<\/td>\n<\/tr>\n
OpSys<\/td>\nString<\/td>\nOperating System<\/td>\n<\/tr>\n
Weight<\/td>\nString<\/td>\nLaptop Weight<\/td>\n<\/tr>\n
Price_euros<\/td>\nNumeric<\/td>\nPrice (In Euros)<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n

 <\/p>\n

This article focuses on exploring this dataset with the various Watson Analytics features, including its natural language capability. Natural language querying lets users ask questions and focus on the problems they are trying to solve, instead of spending a large amount of time learning a new language or a set of sophisticated tools. Following exploration, we will look at data refinement and how to improve the analysis using grouping, filtering and other features.<\/p>\n

Data Cleansing<\/h2>\n

The laptop dataset contains the following columns with their respective quality scores reported by Watson Analytics:<\/p>\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n
Column Name<\/strong><\/td>\nQuality Score<\/strong><\/td>\n<\/tr>\n
Company<\/td>\nMedium Quality (67)<\/td>\n<\/tr>\n
Product<\/td>\nUnique Values<\/td>\n<\/tr>\n
TypeName<\/td>\nMedium Quality (60)<\/td>\n<\/tr>\n
Inches<\/td>\nHigh Quality (74)<\/td>\n<\/tr>\n
ScreenResolution<\/td>\nMedium Quality (61)<\/td>\n<\/tr>\n
Cpu<\/td>\nUnique Values<\/td>\n<\/tr>\n
Ram<\/td>\nMedium Quality (60)<\/td>\n<\/tr>\n
Memory<\/td>\nMedium Quality (63)<\/td>\n<\/tr>\n
GPU<\/td>\nUnique Values<\/td>\n<\/tr>\n
OpSys<\/td>\nMedium Quality (54)<\/td>\n<\/tr>\n
Weight<\/td>\nUnique Values<\/td>\n<\/tr>\n
Price_euros<\/td>\nMedium Quality<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n

 <\/p>\n

The medium quality columns all contained quite a diverse range of values. Performing an analysis on the raw data alone would prove challenging without further filtering or grouping (covered later in this article). The highest quality score, 74, was assigned to screen size (Inches). Some of the quality issues in this dataset arise because a single column holds more than one kind of information. For example, the screen resolution column should only provide resolution data. Instead, this column also provides the type of screen (IPS panel, Full HD, etc.).<\/p>\n

These attributes could be separated into another variable, providing a better basis for analysis. The same issue is present in the \u2018Memory\u2019 variable, which holds the hard disk size but also includes the type of hard disk (SSD, Flash, etc.). These attributes could also be separated into their own columns. The combined nature of the data yields many different values whose meaning can be challenging to discern without further separation, grouping, filtering, or other cleansing techniques.<\/p>\n
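The separation described above could be done before the data is loaded into Watson Analytics. The following is a minimal sketch in Python, not part of the original analysis. The sample strings, the function names and the assumption that each value describes a single screen or a single drive are all hypothetical, chosen only to resemble the kinds of values mentioned above.

```python
import re

# Hypothetical sample values, modelled on the kinds of entries described
# in the text (not copied from the actual dataset).
SCREEN_SAMPLES = ['IPS Panel Full HD 1920x1080', '1366x768']
MEMORY_SAMPLES = ['256GB SSD', '1TB HDD', '128GB Flash Storage']


def split_screen_resolution(value):
    """Split a combined screen value into (resolution, screen_type)."""
    match = re.search(r'[0-9]+x[0-9]+', value)
    if match is None:
        # No resolution found: treat the whole value as the screen type.
        return None, value.strip() or None
    resolution = match.group(0)
    # Whatever surrounds the resolution is taken to be the screen type.
    screen_type = (value[:match.start()] + value[match.end():]).strip()
    return resolution, screen_type or None


def split_memory(value):
    """Split a combined storage value into (size, storage_type).

    Assumes one drive per value; anything after the size is kept
    unchanged as the storage type.
    """
    match = re.match(r'([0-9.]+ ?(?:GB|TB))(.*)', value.strip())
    if match is None:
        return None, value.strip() or None
    size = match.group(1).replace(' ', '')
    storage_type = match.group(2).strip()
    return size, storage_type or None


for sample in SCREEN_SAMPLES:
    print(sample, '->', split_screen_resolution(sample))
for sample in MEMORY_SAMPLES:
    print(sample, '->', split_memory(sample))
```

Each function returns a pair that could be written to two new columns, giving fewer distinct values per column and making grouping and filtering more straightforward.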

Data Exploration<\/h2>\n

Half the battle in problem solving and decision making is framing the problem or decision in a creative way so that it can be addressed effectively, as noted by Davenport (2013)<\/a>. Thankfully, Watson Analytics puts much of this power at our fingertips by identifying starting points for problem solving. Upon loading the Laptop Prices dataset into Watson Analytics, a series of these starting points was presented. These included sample questions to begin the journey of data exploration. The initial questions presented included the following:<\/p>\n