title: Data Science
subtitle: DLMBDSA01
authors: Prof. Dr. Claudia Heß
publisher: IU International University of Applied Sciences
date: 2022

Unit 2: Use Cases and Performance Evaluation

p. 33 - 50

Learning goals for this section:

I just want to note somewhere that I had a quick search for ML datasets and this website, The 60 Best Free Datasets for Machine Learning | iMerit, lists sooo many websites, it’s unreal.

Introduction

The book discusses an example of studying TV show viewer habits and preferences to understand what variables affect a show’s popularity and viewer preferences. “Use cases and scenarios show business managers the benefits data science tools can offer for enhancing decision-making within their organizations.”

There’s a lot to think about. Once a model is created the performance must be evaluated. But users must also be aware of “cognitive and motivational biases that might impact model inputs.” That can skew results and make a model less reliable.

2.1 - Data Science Use Cases (DSUCs)

Data Science Use Case = DSUC

Identifying DSUCs and Their Value Propositions

Businesses unlock value through in-depth investigation of their own collected data. The obvious challenges then become collecting and managing this data, as well as finding the right DSUC for their business objectives. The right application of data can provide valuable insights that limit risks and improve future gains.

A DSUC can be identified through 3 main (general) aspects:

Focus should be on increasing value, and reducing effort and risk.

Business should ask themselves:

Text shows 3 images pp. 36 - 37. There are 3 types of DSUCs viewed with relevant questions. I like typing so I’ll summarize:

It is interesting to note that the value propositions mimic the aspects to identify a DSUC.

Learning the Dataset and building the Prediction Model

Suppose we have identified a Data Science Use Case. Now the data scientist must collect the relevant dataset from available data sources, or build a new data source. Types of data sources include:

Collection can be expensive if human intervention is required to manually insert tags and such. Once data is collected it is then sent for preprocessing where it is scrubbed for any noise and scanned for redundant records and missing values.

“Cleaning data typically requires significant domain knowledge to make decisions about the best method to deal with errors in the data.”

Once the data is cleaned, the dataset will contain variables (i.e. features) with numerical, categorical, and/or textual values. Features should be carefully selected to be used to determine output value propositions. Some features may not be relevant for our DSUC.

After the data is cleaned, the data scientist can build the prediction model. Its purpose is to define relationships between inputs and outputs. Typically, datasets are divided into training, test, and validation sets, or just training and test sets. These are covered in my previous section.

Many approaches have been developed, machine learning in particular.

Making Predictions and Decisions

When a prediction model is built, it may require many iterations before it produces a reasonably high level of accuracy with respect to the testing set.

The model and its predictions are given to the user for decision-making, or whatever they do with it. Sometimes it’s just the results; other times, the entire model. The book suggests supplying the model to the user with a friendly front-end interface.

Sometimes the end user makes a decision that impacts the data records. Example being given some predictions about optimal selling price, a product owner may change the sale price of their product. The model should be developed to include a feedback loop that accommodates these changes and can be retrained accordingly.

The goal is complete automation of the end users’ corresponding decisions.

Machine Learning Canvas

OwnML.co has a Machine Learning Canvas. You can search for it on the internet. It’s meant to help the developer identify a use case and achieve its value proposition.

2.2 - Performance Evaluation

We really evaluate the model at two times: during development of the prediction model, usually to improve its power, and after development, with the test set, to gather information on model accuracy and such.

Evaluation of how well the DSUC has been modelled and its predictive values applied successfully within a business can also be divided into two parts:

Per the very official seeming website KPI.org, KPIs are “critical (key) quantifiable indicators of progress toward an intended result. KPIs provide a focus for strategic and operational improvement, create an analytical basis for decision making and help focus attention on what matters most.” That’s really all you probably need to know for now because KPIs are more of a business management topic.

Model-Centric Evaluation: Performance Metrics

Just going to cover several metrics to measure the performance of prediction models, both classification and regression.

Classification Model Evaluation Metrics

The output of these prediction models is a probability that determines which class the output is assigned to. There are really four possible results of classification:

| Classification | Guess | Actual | Error Type |
| --- | --- | --- | --- |
| True Positive (TP) | Yes | Yes | None |
| False Positive (FP) | Yes | No | Type I |
| True Negative (TN) | No | No | None |
| False Negative (FN) | No | Yes | Type II |

So we have different types of right and wrong. In a statistical sense:

Talking about the null hypothesis makes it a little confusing and backwards-sounding. But for a type II error, you would fail to reject the null hypothesis that a thing does not belong to a class when, in fact, it does.

The book does a confusion matrix to list counts as well. Metrics for these models are Accuracy, Precision, Recall.

Accuracy is the ratio of the number of correct predictions to total predictions:

$$\text{Accuracy} = \frac{\text{count}(TP)+\text{count}(TN)}{\text{count}(TP)+\text{count}(TN)+\text{count}(FP)+\text{count}(FN)}$$

Precision measures how correct the model is when returning positive results:

$$\text{Precision} = \frac{\text{count}(TP)}{\text{count}(TP)+\text{count}(FP)}$$

Recall measures the proportion of actual positives that the model correctly identifies. This metric is prioritized when we can tolerate false positives more than false negatives.

$$\text{Recall} = \frac{\text{count}(TP)}{\text{count}(TP)+\text{count}(FN)}$$
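As a quick sketch, the three metrics can be computed directly from the confusion-matrix counts (the counts below are made up for illustration, not from the course book):

```python
# Hypothetical confusion-matrix counts (assumed values for illustration).
tp, tn, fp, fn = 40, 45, 5, 10

# The three metrics, exactly as in the formulas above.
accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)

print(f"accuracy:  {accuracy:.3f}")   # 0.850
print(f"precision: {precision:.3f}")  # 0.889
print(f"recall:    {recall:.3f}")     # 0.800
```

Note how the high accuracy hides the fact that one in five actual positives (10 of 50) was missed, which is exactly what recall surfaces.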

It’s a bit of a weird measure. You wouldn’t want your data to be unbalanced. What do we mean by that tolerance bit? Consider a Covid test, where a false positive is not a big issue: better safe than sorry, you stay indoors and take another test. However, a false negative is detrimental, as the person could unknowingly spread the disease and cause many people to become sick.

So, if there are many false negatives, you’ll see recall drop significantly. Between precision and recall, you can tell what the model is doing.

A classification model can apply different thresholds to distinguish between classes which can alter the results from the model.

A Receiver Operating Characteristic (ROC) curve displays the trade-off between the true positive rate and the false positive rate at every possible threshold. The best model can classify with 100% TP and 0% FP. The ROC curve helps find the best thresholds to produce the highest scores.

To generate your own ROC curve:

  1. Choose a cutoff value between 0 and 100 percent of the maximum value of the model output
  2. Assign the test set according to their classes and count TP, TN, FP, FN values
  3. Calculate FP and TP rates (below)
  4. Form a single point on ROC curve with coordinate {x: false positive rate, y: true positive rate}
  5. Choose another threshold and repeat from step 2.

Formulas:

$$\begin{gather*} \text{FP rate} = \frac{\text{count}(FP)}{\text{count}(FP)+\text{count}(TN)}\\ \text{TP rate} = \frac{\text{count}(TP)}{\text{count}(TP)+\text{count}(FN)} \end{gather*}$$
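As a sketch, the five steps above might look like this in Python (the scores and labels are made-up assumptions, not data from the book):

```python
# Hypothetical model outputs (probabilities) and true classes, for illustration.
scores = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0,   0,   1,    0,   1,   1,   0,   1  ]  # 1 = positive class

def roc_point(threshold):
    """Classify the test set at the given cutoff, count TP/TN/FP/FN,
    and return one ROC point as (FP rate, TP rate)."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    tn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
    return fp / (fp + tn), tp / (tp + fn)

# Step 5: repeat for several thresholds to trace out the curve.
for t in [0.0, 0.25, 0.5, 0.75, 1.0]:
    fpr, tpr = roc_point(t)
    print(f"threshold={t:.2f} -> ({fpr:.2f}, {tpr:.2f})")
```

At threshold 0 everything is labeled positive, giving the point (1, 1); at threshold 1 nothing is, giving (0, 0); the interesting trade-offs live in between.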

Why the TN and FN counts appear in the denominators is not well explained. Looking it up, it is because, for example, $N = FP + TN$, where $N$ is the total number of negatives. That makes sense because a false positive is actually a negative case.

I guess if this were calculus, you’d solve when the slope is one. You want to minimize the FPs, get the numerator close to zero, and maximize the TPs.

Regression Model Evaluation Metrics

The output of a regression prediction model is a probability density distribution, which is translated into a number (optimal point estimator) to be useful. Evaluation of said model is a comparison of predicted values to actual. We usually use:

Absolute error

$$\epsilon = |d-y|$$

So, the book uses $d$ as the desired output. But statistics uses $\hat{y}$ as the predicted value, so the equation will look like

$$\epsilon = |y-\hat{y}|$$

Relative error is a normalization of absolute error

$$\epsilon^* = \left|\frac{y-\hat{y}}{y}\right| \cdot 100\%$$

Mean absolute percentage error is an average of the relative errors. We are building up our formulas.

$$\text{MAPE} = \frac{1}{n} \sum_{i=1}^n \left| \frac{y_i-\hat{y}_i}{y_i} \right| \cdot 100\%$$

This is helpful if the underlying probability density distribution of the values is sufficiently far from zero. As you can see, parts of the summation become undefined when $y_i = 0$.

Square error ensures a positive value is obtained in a different way, but adds significant weight to larger errors.

$$\epsilon^2 = (y-\hat{y})^2$$

Mean square error (MSE) is the average of square errors

$$\text{MSE} = \frac{1}{n} \sum_{i=1}^n (y_i-\hat{y}_i)^2$$

Mean absolute error is more robust than the mean squared error with respect to datasets that contain outliers

$$\text{MAE} = \frac{1}{n} \sum_{i=1}^n \left| y_i-\hat{y}_i \right|$$

Root mean square error is the square root of the MSE, making the magnitude easier to interpret.

$$\text{RMSE} = \sqrt{\text{MSE}} = \sqrt{\frac{1}{n} \sum_{i=1}^n (y_i-\hat{y}_i)^2}$$
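A small sketch computing these regression metrics by hand in Python (the actual and predicted values are invented for illustration):

```python
import math

# Hypothetical actual (y) vs. predicted (y_hat) values, for illustration only.
y     = [100.0, 150.0, 200.0, 250.0]
y_hat = [110.0, 140.0, 190.0, 270.0]

n = len(y)
errors = [yi - yh for yi, yh in zip(y, y_hat)]  # per-record absolute errors (signed)

mae  = sum(abs(e) for e in errors) / n                       # mean absolute error
mse  = sum(e * e for e in errors) / n                        # mean square error
rmse = math.sqrt(mse)                                        # root mean square error
mape = sum(abs(e / yi) for e, yi in zip(errors, y)) / n * 100  # mean absolute % error

print(f"MAE:  {mae}")        # 12.5
print(f"MSE:  {mse}")        # 175.0
print(f"RMSE: {rmse:.2f}")
print(f"MAPE: {mape:.2f}%")
```

The single 20-unit miss dominates the MSE (400 of the 700 total squared error), which is the “significant weight to larger errors” point from above, while the MAE treats it only twice as heavily as the 10-unit misses.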

Business-Centric Evaluation: The Role of KPIs

At this point, you have evaluated your model with the previously described metrics, or more. It is ready to be implemented to produce the DSUC value for the associated business problem. Back to KPIs, since the model is meant to help the business achieve them. They are often related to improving revenue, reducing costs, increasing efficiency, and/or enhancing customer satisfaction.

Characteristics of Effective KPIs

Most helpful KPIs:

Examples of KPIs

Some KPIs routinely implemented in organizations to measure DSUC performance include but are not limited to:

When a KPI is defined, we determine the best method to assess performance against it.

Cognitive Biases and Decision-Making Fallacies

Cognitive and motivational biases may disrupt the collected dataset and basically all other stages of the data process. They influence the quality of the model and can cause significant inaccuracies. A data scientist must be aware of these biases.

The course book has an image of Cognitive Bias Codex.

Relative Cognitive biases

Common cognitive and motivational biases

De-biasing Techniques

These techniques are meant to reduce or eliminate bias. The most common one is to consult other experts; consider that an answer to each bias.

Common cognitive and motivational biases and debasing techniques


Video Lectures

There are endless use cases for Data Science, but what is important is:

The exam will usually ask for… “Value Propositions”, which are examples!

Performance evaluation in data science. Key Performance Indicators (KPI) are indicators that have a direct tie to the performance of the business. We also need model evaluation techniques like cross-validation, confusion matrix, and root mean square error, to determine how good our model is at predictions.

Then, we must determine the impact on the business. Business impact metrics include return on investment, customer churn rate, customer lifetime value, and revenue and sale increases.

We discuss a confusion matrix, showing positive and negative results, True and False.

Accuracy assesses the overall correctness of the model’s predictions. It is the sum of the true positives and true negatives over all instances. However, if the dataset is imbalanced, it can appear to be deceptively correct. Accuracy means how close we are to the true state.

Precision assesses the accuracy of positive predictions made by the model. It is the number of true positives over the sum of true and false positives. Precision is more about the spread of predictions. Numerically, you can be precisely wrong: if the answer is meant to be around ten and we keep predicting eight, that’s precise, but wrong. Note that precision does not consider the negative classifications.

There’s also the True Positive Rate (TPR), AKA recall or sensitivity, which measures the proportion of actual positive instances that are correctly predicted as positive. It is true positives over the sum of true positives and false negatives -> which is all positives.

Then there’s the False Positive Rate (FPR), which is the proportion of actual negative instances that are incorrectly predicted as positive. It is false positives over the sum of false positives and true negatives -> all negatives.

There are reasons you might need to have higher rates in one side than the other.

The Receiver Operating Characteristic (ROC) curve plots the True Positive Rate (y-axis) against the False Positive Rate (x-axis).
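To sketch how that curve turns into a single score, here is a rough computation of the area under the curve (AUC) via the trapezoidal rule; the ROC points below are invented for illustration:

```python
# Hypothetical (FPR, TPR) points along a ROC curve, sorted by FPR.
points = [(0.0, 0.0), (0.1, 0.6), (0.3, 0.8), (0.6, 0.9), (1.0, 1.0)]

# Sum trapezoid areas between consecutive points to approximate the AUC.
auc = 0.0
for (x0, y0), (x1, y1) in zip(points, points[1:]):
    auc += (x1 - x0) * (y0 + y1) / 2

print(f"AUC = {auc:.3f}")  # 0.805
```

An AUC of 0.5 corresponds to the diagonal (random guessing), and 1.0 to a perfect classifier, so this made-up curve sits comfortably above chance.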


title: Key Performance Indicators
subtitle: Developing, Implementing, and Using Winning KPIs
edition: 4
authors: David Parmenter
publisher: John Wiley & Sons, Inc.
date: 2019-10-29
publication date: 2020
accession number: ihb.49461
ISBN:
	- 9781119620822
	- 9781119620792

Key Performance Indicators: Developing, Implementing, and Using Winning KPIs

Key Performance Indicators: Developing, Implementing, and Using Winning KPIs, 4th Edition | Wiley. Currently the e-book is $31.

Ch. 1 - The Great KPI Misunderstanding

pp. 3 - 24

The Four Types of Performance Measures

2 groups of measures:

2 measures for each group

  1. Key Result Indicators (KRI) - give board an overall summary of how organization is performing.
  2. Result Indicators (RI) - tell management how teams are combining to produce results.
  3. Performance Indicators (PI) - tell management what teams are delivering
  4. Key Performance Indicators (KPI) - tell management how the organization is performing 24/7, daily, or weekly in their critical success factors, and by taking action management is able to increase performance dramatically.

Management taking action? 😂

KRIs are often mistaken for KPIs. However, they measure the combined results of many activities. They are good to review quarterly with the board to understand progression with strategy. KRIs are always a past measure; as such, management cannot use them to change direction. They also do not tell you how to improve results.

Result indicators are a good but less important summary of activities. They look at activity over a wider time horizon: not just quarterly, but possibly weekly or daily. So, yesterday’s sales would be an RI.

KPIs are indicators that focus on critical aspects of organizational performance. There are also Critical Success Factors (CSFs), like the timely arrival of aeroplanes. Ascertaining the 5-8 CSFs is a vital step in any KPI exercise.

The author gives an example with an airport and Ryanair, with a focus on the timeliness of planes. They aren’t a luxury airline, so they make money on timely flights. Basically, late arrivals cause late departures. But the issue was viewed as one “not invented by us.” However, the negative effects carried over.

Taking ownership of the KPI and CSF, regardless of the cause is the real issue. And so other success factors like cleaning, refueling, and catering had to be reprioritized.

The airport identified many costs associated with late departures and tried to handle as best they could with what they could do.

The 7 Characteristics of KPIs

The author continues with many examples.

Performance Indicators

Performance indicators (PIs) are non-financial indicators that can be traced back to a team but are not fundamental to the organization’s well-being.

Number of Measures Required: The 10/80/10 Rule

We want to answer questions like:

For a medium sized company, maybe 500 FTEs:

- Key Result Indicators (board KPIs): 10, reported every board meeting
- Result Indicators and Performance Indicators: 80, reported daily, weekly, or monthly
- Key Performance Indicators (operational KPIs): 10, reported 24/7, daily, or weekly

Author suggests an organization create a Board Dashboard to show KRIs along with a summary financials all on one fan-fold page. For smaller organizations, they may reduce number of RIs and PIs.

Depending on the text, around 10 KPIs are recommended, and seldom more than 20 at most. More KPIs might be applicable for a conglomerate of many businesses in many different sectors.

Difference between KRIs and KPIs and RIs and PIs

Might just list differences, or characteristics here:

The Lead and Lag Confusion

Lead indicators are performance drivers and lag indicators are outcomes. However, their exact determination can be fuzzy. “Lead-and-lag labeling of measures is misleading.”

Mix 60% Past, 20% Present, 20% Future-Oriented Measures

KPIs are characterized as past, current, or future-focused measures. Most are current measures because they are measured so frequently, right? We let current measures mean those monitored 24/7 or daily. Future measures are the record of agreed future commitments, i.e., when an action is meant to take place.

It is the author’s recommendation to strive for a mix of 60% past, 20% present, and 20% future-oriented measures.

The author provides a good list of examples of future measures:

Again, the author reiterates that a KPI provided to management that is more than a few days old is useless. So these other, slower measures can fall into the realm of KRIs, PIs, and RIs.

Importance of Timely Measurement

The sooner management has their information, the sooner they can delegate changes. There is a recommended reporting framework for KPIs.

Where are You in Your Journey with Performance Measures?

The author provides a checklist to assess your progress with performance measures.

title: Top Five High-Impact Use Cases for Big Data Analytics
publisher: Datameer, Inc
date: 2014
url: http://orcp.hustoj.com/wp-content/uploads/2016/01/eBook-Top-Five-High-Impact-UseCases-for-Big-Data-Analytics.pdf

Top Five High-Impact Use Cases for Big Data Analytics

An internet search led me to this link, a 16-page book that promotes Datameer. These are just some notes for personal reference.

Companies able to gain insights from their data gain a competitive edge over their peers. However, getting insights quickly may be beyond the limitations of traditional enterprise data warehouses and business intelligence software.

  1. It can take time to collect, prepare, and analyze all fragmented and (often) unstructured data.
  2. Most business professionals rely on IT to gather and organize data into a data warehouse. This can render data obsolete by the time it is ready. Business needs could change or something.
  3. The amount of data, especially unstructured data, can be hard to analyze.

You know it is big data when it is so large and complex that it becomes difficult to manage with existing data management tools.

Big Data Analytics

Big data analytics is meant to enable users to analyze all structured, semi-structured, and unstructured customer data together, all at once. Datameer’s services do what a data scientist does:

… your data.

Datameer puts it all into Hadoop and provides the analyzing and visualizing tools you need.

Customer Analytics

You need insights into the customer buying journey because that is how you improve customer conversion rates, personalize campaigns to increase revenue, predict and avoid customer churn, and lower customer acquisition costs.

The issue is that customers interact with companies through many points of contact: social media, stores, websites, etc. This makes things more complex and creates many types of data. Analyzing all this data can give insights into:

Big Data Analytics at Work

Big data analytics is the key to unlocking insights from customer behaviour data. It combines, integrates, and analyses all data, structured and unstructured, all at once.

Again, Datameer can get all data from web, social media, transaction, etc… into Hadoop. Then, data is enriched with 3rd party data, analysed and visualized.

You can use Datameer to correlate customer purchase histories and information on social media to target new customers.

You can predict which customers are going to move funds from your investment firm so you can launch retention campaigns… instead of continuously providing a great service to retain them the whole time…

Operational Analytics

Understanding Machines, Devices and Human Interactions

Executives of manufacturing, operations, service or product are pressured to optimize asset utilization, budgets, performance, and service quality. So, how can IT executives help? IT can provide high-impact data projects, in a timely manner, to predict product failures, optimize existing infrastructure, and reduce operational and capital expenditures. The secrets are buried in log, sensor and machine data. If only there was a platform that could ingest and combine data from multiple sources…

Big Data Analytics at Work

You can combine your structured and unstructured data and use the results to detect outliers, perform time series and root cause analyses, and parse, transform, and visualize data.

You can use historic machine data and failure patterns to predict and improve mean time-to-failure. Use Datameer to:

Datameer offers many out-of-the-box connectors and a file parser to integrate any data and join data with enrichment functions. They also have many visual widgets and free-form infographics for stunning visualizations.

Then the short book provides several customer case studies.

Fraud and Compliance

Data driven insights can help uncover hidden and suspicious actions in a timely enough manner to mitigate risks.

Big Data Analytics at Work

You can combine your data, from financial transactions and geo-location data to authorization and submission data, plus social media channels. Then, use the data to address fraud and compliance-related challenges: perform time series analysis, data profiling and accuracy calculations, root cause analysis, breach detection, etc…

Can help with:

Data-Driven Products and Services

Some companies use big data to create new products and service offerings, based on said data. You can harness your Customer Relationship Management (CRM) data, social media, transaction, geo-location, device, sensor, and product data. Turn data into insights for more impactful ad campaigns and such.

Again, combine all your data in one place to:

Also good to know how fast you can get a prototype of a data product, scale users’ data and more.

Case studies in the online book.

EDW Optimization

Enterprise Data Warehouses (EDWs) are critical business and IT resources. However, too much data takes too long to process to meet business needs, turning timely data outdated. It also becomes very expensive with some services to store all of your data.

But Datameer to the rescue. You can:


Check your Knowledge

Which of the following results in a dataset that is representative and reduced in size?

The Pre-processing step does this.

The objective of a prediction model is to produce reasonably high accuracy with respect to which set of data?

The testing set!

The whole dataset is cleaned and then split into training, validation, and testing. Training and validation sets help build and tune the model.
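A minimal sketch of such a split in plain Python, assuming a 70/15/15 ratio (the ratio and the stand-in records are my assumptions, not from the course book):

```python
import random

records = list(range(100))  # stand-in for 100 cleaned data records

random.seed(42)             # fixed seed so the split is reproducible
random.shuffle(records)     # shuffle before splitting to avoid ordering bias

n = len(records)
train = records[: int(n * 0.70)]                 # build the model
val   = records[int(n * 0.70) : int(n * 0.85)]   # tune it
test  = records[int(n * 0.85) :]                 # final accuracy check only

print(len(train), len(val), len(test))  # 70 15 15
```

The key property is that the three sets are disjoint and together cover the whole dataset, and that the test set is never touched during building or tuning.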

What does increasing area under the ROC curve do?

The Receiver Operating Characteristic (ROC) curve displays the trade-off between the true positive rate and false positive rate at all threshold values. Increasing the area under the curve (AUC) means the model achieves higher true positive rates at lower false positive rates, improving the performance of the classification model.

For data science, what is the purpose of the Key Performance Indicators (KPIs)?

They measure performance of the business’s use cases. This is all about the Data Science Use Cases topic.

Cognitive and motivational biases are very important parameters. How should they be dealt with in a project?

You want to remove as much bias as possible by de-biasing the data and avoid bias as much as you can whilst building the prediction model.