What is a Confusion Matrix?
In the field of machine learning, and specifically in the problem of statistical classification, a confusion matrix (also known as an error matrix) is a specific table layout that allows visualization of the performance of an algorithm, typically a supervised learning one.
Have you been in a situation where you expected your machine learning model to perform really well but it sputtered out a poor accuracy? You’ve done all the hard work — so where did the classification model go wrong? How can you correct this?
There are plenty of ways to gauge the performance of your classification model but none have stood the test of time like the confusion matrix. It helps us evaluate how our model performed, where it went wrong and offers us guidance to correct our path.
Let’s now define the most basic terms, which are whole numbers (not rates):
- true positives (TP): These are cases in which we predicted yes (they have the disease), and they do have the disease.
- true negatives (TN): We predicted no, and they don’t have the disease.
- false positives (FP): We predicted yes, but they don’t actually have the disease. (Also known as a “Type I error.”)
- false negatives (FN): We predicted no, but they actually do have the disease. (Also known as a “Type II error.”)
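As a quick sketch, these four counts can be tallied directly from a pair of label lists. The function and variable names below are illustrative, not from any particular library:

```python
def confusion_counts(y_true, y_pred, positive="Sick"):
    """Tally TP, TN, FP, FN for a binary classification task."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp, tn, fp, fn

y_true = ["Sick", "Sick", "Not Sick", "Not Sick", "Sick"]
y_pred = ["Sick", "Not Sick", "Not Sick", "Sick", "Sick"]
print(confusion_counts(y_true, y_pred))  # (2, 1, 1, 1)
```

Note that TP + TN + FP + FN always equals the total number of data points, since every prediction falls into exactly one of the four cells.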
Before we answer those questions, let's think through a hypothetical classification problem.
Let’s say you want to predict how many people are infected with a contagious virus in times before they show the symptoms, and isolate them from the healthy population (ringing any bells, yet?). The two values for our target variable would be: Sick and Not Sick.
Now, you must be wondering — why do we need a confusion matrix when we have our all-weather friend — Accuracy? Well, let’s see where accuracy falters.
Our dataset is an example of an imbalanced dataset: there are 960 data points for the negative class and only 40 for the positive class. This is how we'll calculate the accuracy:

Accuracy = (TP + TN) / (TP + TN + FP + FN)
Let’s see how our model performed:
The total outcome values are: TP = 30, TN = 930, FP = 30, FN = 10
So, the accuracy for our model turns out to be:

(30 + 930) / 1000 = 96%! Not bad!
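A minimal sketch of that arithmetic, using the outcome counts above:

```python
# Outcome counts from the example above
tp, tn, fp, fn = 30, 930, 30, 10

# Accuracy = (TP + TN) / (TP + TN + FP + FN)
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(f"Accuracy: {accuracy:.0%}")  # Accuracy: 96%
```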
But it is giving the wrong idea about the result. Think about it.
Our model is saying “I can predict sick people 96% of the time”. However, it is doing the opposite. It is predicting the people who will not get sick with 96% accuracy while the sick are spreading the virus!
Do you think this is a correct metric for our model given the seriousness of the issue? Shouldn’t we be measuring how many positive cases we can predict correctly to arrest the spread of the contagious virus? Or maybe, out of the correctly predicted cases, how many are positive cases to check the reliability of our model?
This is where we come across the dual concept of Precision and Recall.
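Computing both metrics on the same outcome counts shows just how misleading that 96% accuracy was (a minimal sketch, using the standard precision and recall formulas):

```python
tp, tn, fp, fn = 30, 930, 30, 10

precision = tp / (tp + fp)  # of everyone predicted Sick, how many truly are
recall = tp / (tp + fn)     # of everyone truly Sick, how many we caught

print(f"Precision: {precision:.0%}")  # Precision: 50%
print(f"Recall: {recall:.0%}")        # Recall: 75%
```

Despite 96% accuracy, the model catches only 75% of the sick people, and half of its "Sick" predictions are wrong.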
How is Machine Learning Being Used in Security?
How it’s using machine learning: Microsoft uses its own cybersecurity platform, Windows Defender Advanced Threat Protection (ATP), for preventative protection, breach detection, automated investigation and response. Windows Defender ATP is built into Windows 10 devices, updates automatically, and employs cloud AI and multiple levels of machine learning algorithms to spot threats.
How it’s using machine learning: Chronicle is a cybersecurity company that sprang from Google’s parent company Alphabet. Its first product, Backstory, has been described as “designed for a world where companies generate massive amounts of security telemetry and struggle to hire enough trained analysts to make sense of it.” Backstory analyzes large amounts of security data (such as internal network activity, known bad domains and suspected malware) and uses machine learning to condense it into more easily digestible insights.
Improved Support Vector Machine for Cyber Attack Detection
This work presents an efficient and scalable algorithm for the classification of cyber attacks. The performance of the traditional SVM is enhanced by modifying the Gaussian kernel to enlarge the spatial resolution around the margin via a conformal mapping, so that the separability between attack classes is increased. The approach is based on the Riemannian geometric structure induced by the kernel function, yielding an improved Support Vector Machine (iSVM) algorithm for classifying a cyber attack dataset. Results show that iSVM gives 100% detection accuracy for the Normal and Denial of Service (DoS) classes, with comparable false alarm rate, training time, and testing time.
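For context, a standard Gaussian (RBF) kernel SVM, which is the baseline that iSVM improves upon, can be sketched with scikit-learn. Note that the conformal-mapping kernel modification described above is not implemented here; the dataset is a synthetic stand-in, and all parameter choices are illustrative:

```python
from sklearn.svm import SVC
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Synthetic, imbalanced stand-in for a cyber-attack dataset (illustrative only)
X, y = make_classification(
    n_samples=500, n_features=20, weights=[0.8, 0.2], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Baseline SVM with a Gaussian (RBF) kernel -- the starting point that
# iSVM's conformal mapping is designed to improve
clf = SVC(kernel="rbf", gamma="scale")
clf.fit(X_train, y_train)

# Report precision and recall per class, which matter more than raw
# accuracy on imbalanced attack data
print(classification_report(y_test, clf.predict(X_test)))
```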