Discriminant Analysis in SPSS Explained

Some of the most useful questions in research are classification questions. Which customers are likely to stay loyal and which are likely to leave? Which respondents belong to a high-value segment? What separates one group from another? Discriminant analysis in SPSS is built to answer exactly these kinds of questions.

It is a powerful statistical technique that classifies data into distinct groups based on a set of predictor variables. For market research teams, it turns survey and behavioural data into clear group profiles – and helps predict which group a new respondent belongs to.

This guide explains what discriminant analysis is, when to use it, how to run it in SPSS step by step, and how to read the output. We keep it practical, so you can apply it to real research projects.

What Is Discriminant Analysis?

Discriminant analysis is a classification technique that identifies which variables best separate two or more predefined groups. It then uses those variables to classify new cases into the correct group.

In simple terms, it answers two questions:

Which variables best distinguish between groups?
Which group does a new case most likely belong to?

It is widely used across marketing, finance, psychology, and healthcare, where understanding group membership is essential for decision-making. For example, a bank might use it to separate low-risk from high-risk loan applicants based on income, debt, and employment history.

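Discriminant analysis works with a categorical dependent variable (the groups) and continuous independent variables (the predictors). This combination sets it apart from regression, which predicts a continuous outcome, and from ANOVA, where the roles are reversed: the groups are the predictor and the continuous measure is the outcome.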

When Should You Use Discriminant Analysis?

Discriminant analysis fits a specific type of research question. It is the right choice when:

Your outcome variable is categorical (two or more defined groups)
Your predictor variables are continuous or numeric
You want to know which variables drive group differences
You want to classify new observations into existing groups

Common research scenarios include:

Segmenting customers into loyalty tiers
Predicting whether a customer will churn or stay
Classifying respondents by buying behaviour
Identifying which factors separate satisfied from dissatisfied customers

If your goal is classification, your groups are already defined, and your predictors are numeric, discriminant analysis is often a strong candidate.

Discriminant Analysis vs Related Techniques

It helps to understand where discriminant analysis sits among similar methods. A quick comparison:

Discriminant analysis – predicts a categorical group from continuous predictors
Logistic regression – also predicts group membership, but does not require normally distributed predictors or equal covariance matrices
Cluster analysis – finds groups when you do not already know them
ANOVA – tests differences between groups but does not classify cases

The key distinction is this: cluster analysis discovers groups, while discriminant analysis works with groups you have already defined. Choose based on whether your groups are known in advance.

Key Assumptions to Check First

Discriminant analysis relies on several assumptions. Checking them protects the reliability of your results – and reflects good data quality practice. The main ones are:

Group membership is known – your categories are clearly defined
Predictors are continuous – and ideally normally distributed
Equality of covariance matrices – tested using Box’s M test
No strong multicollinearity – predictors should not be too highly correlated
Adequate sample size – enough cases in each group for stable results

Box’s M test is built into SPSS for this purpose. If assumptions are seriously violated, logistic regression may be a more robust alternative. Checking first saves you from drawing conclusions on a shaky foundation.
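Two of these checks can also be eyeballed outside SPSS. The sketch below is a minimal illustration in plain Python, using made-up values for two hypothetical predictors (spend and tenure). It prints each group's covariance matrix and the correlation between the predictors. It does not replace Box's M; it only shows what the assumptions are comparing.

```python
from statistics import mean

def covariance(xs, ys):
    """Sample covariance of two equal-length lists (n - 1 denominator)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def correlation(xs, ys):
    """Pearson correlation between two predictors."""
    return covariance(xs, ys) / (covariance(xs, xs) * covariance(ys, ys)) ** 0.5

def covariance_matrix(columns):
    """Covariance matrix for a list of predictor columns."""
    return [[covariance(a, b) for b in columns] for a in columns]

# Hypothetical data: two predictors measured in two groups.
group_1 = {"spend": [20.0, 25.0, 30.0, 22.0, 28.0],
           "tenure": [3.0, 5.0, 4.0, 6.0, 2.0]}
group_2 = {"spend": [40.0, 45.0, 50.0, 42.0, 48.0],
           "tenure": [12.0, 15.0, 11.0, 14.0, 13.0]}

for name, group in (("group 1", group_1), ("group 2", group_2)):
    columns = [group["spend"], group["tenure"]]
    # Similar matrices across groups are consistent with the equal-covariance assumption.
    print(name, "covariance matrix:", covariance_matrix(columns))
    # A correlation near +1 or -1 would point to multicollinearity.
    print(name, "spend-tenure correlation:", round(correlation(*columns), 3))
```

If the group matrices look very different, or two predictors correlate almost perfectly, that is a prompt to run the formal tests in SPSS before trusting the model.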

How to Run Discriminant Analysis in SPSS: Step by Step

Here is the practical workflow inside SPSS. Follow these steps in order.

Step 1: Open the Analysis Menu

Go to Analyze → Classify → Discriminant. This opens the main discriminant analysis dialogue box where you set up the model.

Step 2: Define Your Grouping Variable

Move your categorical variable (the groups) into the Grouping Variable box. Then click Define Range and enter the lowest and highest codes for your groups. For example, if you have two groups coded 1 and 2, enter 1 and 2.

Step 3: Add Your Predictors

Move your continuous predictor variables into the Independents box. At this point you choose how the model uses them:

Enter independents together – uses all predictors at once
Use stepwise method – adds predictors one at a time based on their contribution

Use the stepwise method when you want SPSS to identify the most useful predictors automatically.

Step 4: Select Your Statistics

Click the Statistics button and select the outputs you need. Useful selections include:

Means and Univariate ANOVAs – to compare groups on each variable
Box’s M – to test the equality of covariance assumption
Unstandardised coefficients – for the discriminant function equation

Step 5: Set Up Classification

Click the Classify button and choose your classification options. Common selections:

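Summary table – produces the classification matrix of actual versus predicted groups
Leave-one-out classification – cross-validates by classifying each case with a model built from all the other cases
Within-groups covariance matrix – classifies using the pooled covariance matrix across groups
Plots – to visualise how groups separate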
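To make the leave-one-out idea concrete, here is a small sketch in plain Python. It is an illustration under stated assumptions, not SPSS's implementation: the data are made up, and the classifier is a nearest-group-mean rule, which is what a linear discriminant reduces to with a single predictor and equal prior probabilities.

```python
from statistics import mean

def fit_group_means(values, labels):
    """Return the mean of the single predictor for each group."""
    return {g: mean(v for v, l in zip(values, labels) if l == g)
            for g in set(labels)}

def predict(group_means, value):
    """Assign the case to the group whose mean is closest."""
    return min(group_means, key=lambda g: abs(value - group_means[g]))

def leave_one_out_accuracy(values, labels):
    """Classify each case with a model fitted on all the other cases."""
    hits = 0
    for i in range(len(values)):
        train_values = values[:i] + values[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        model = fit_group_means(train_values, train_labels)
        hits += predict(model, values[i]) == labels[i]
    return hits / len(values)

# Hypothetical tenure in months for churners (1) and stayers (2).
tenure = [3.0, 5.0, 4.0, 9.0, 2.0, 12.0, 15.0, 7.0, 14.0, 13.0]
group = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2]
print("Leave-one-out accuracy:", leave_one_out_accuracy(tenure, group))
```

Because each case is scored by a model that never saw it, this figure is usually a more honest estimate of how the model will handle new respondents than the accuracy on the original sample.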

Step 6: Save and Run

Click the Save button to store predicted group membership and discriminant scores for each case. Then click OK to run the analysis. SPSS generates a full set of output tables.

How to Read the SPSS Output

The output can look dense at first, but a few key tables tell you most of what you need. Here is how to interpret them.

Tests of Equality of Group Means

This table shows whether each predictor differs significantly across groups. A significant result (p < 0.05) means that variable helps separate the groups. It is an early signal of which predictors matter.
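For one predictor at a time, the figures in this table come from a one-way ANOVA. The following sketch, with hypothetical input values, shows how the univariate Wilks' Lambda and F statistic could be computed by hand. The p-value is left out because it needs an F distribution, which Python's standard library does not provide.

```python
from statistics import mean

def equality_of_means(groups):
    """Return (Wilks' lambda, F, df1, df2) for a single predictor.

    `groups` is a list of lists: one list of predictor values per group.
    """
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Total variation around the grand mean.
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    # Variation left over inside the groups.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df1 = len(groups) - 1
    df2 = len(all_values) - len(groups)
    wilks_lambda = ss_within / ss_total
    f_stat = ((ss_total - ss_within) / df1) / (ss_within / df2)
    return wilks_lambda, f_stat, df1, df2

# Hypothetical predictor values for two groups.
lam, f_stat, df1, df2 = equality_of_means([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
print("Wilks' lambda:", round(lam, 3), "F:", round(f_stat, 2), "df:", (df1, df2))
```

A small lambda and a large F go together: both say that most of the variation in the predictor lies between the groups rather than inside them.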

Box’s M Test

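This tests the assumption that covariance matrices are equal across groups. A significant result suggests the assumption is violated, in which case you can classify using separate-groups covariance matrices (an option in the Classify dialogue). Box’s M is sensitive to large samples and to non-normal data, so many analysts apply a stricter threshold, such as p < 0.001, before treating a significant result as a real problem.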

Eigenvalues and Canonical Correlation

The eigenvalue shows the discriminating power of each function – larger is better. The canonical correlation measures the strength of association between the discriminant scores and group membership. Values closer to 1 indicate a stronger relationship.
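The two figures are directly linked: the canonical correlation is the square root of eigenvalue / (1 + eigenvalue). A short sketch with hypothetical eigenvalues shows the relationship, along with each function's share of the total discriminating power.

```python
import math

def canonical_correlation(eigenvalue):
    """Canonical correlation implied by a discriminant function's eigenvalue."""
    return math.sqrt(eigenvalue / (1.0 + eigenvalue))

def percent_of_variance(eigenvalues):
    """Share of total discriminating power carried by each function."""
    total = sum(eigenvalues)
    return [100.0 * e / total for e in eigenvalues]

# Hypothetical eigenvalues for a three-group model (two functions).
eigenvalues = [3.0, 1.0]
for e, share in zip(eigenvalues, percent_of_variance(eigenvalues)):
    print("eigenvalue", e,
          "canonical correlation", round(canonical_correlation(e), 3),
          "percent of variance", share)
```

Squaring the canonical correlation gives the proportion of variance in the discriminant scores that is explained by group membership.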

Wilks’ Lambda

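This is one of the most important values. Wilks’ Lambda ranges from 0 to 1 and represents the share of variation in the discriminant scores that is not explained by group differences. It is used to test whether the discriminant function is statistically significant. The key rule is counterintuitive: a lower Wilks’ Lambda means better discrimination. If the associated p-value is below 0.05, the groups differ significantly on the function, so the model separates them better than chance.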
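Wilks' Lambda can be derived from the eigenvalues, which is why a high eigenvalue and a low lambda tell the same story. The sketch below uses hypothetical inputs and Bartlett's chi-square approximation, the standard textbook transformation for testing lambda; treat it as an illustration rather than a reproduction of SPSS's exact output. The p-value is omitted because the standard library has no chi-square distribution.

```python
import math

def wilks_lambda(eigenvalues):
    """Wilks' lambda for a set of discriminant functions: product of 1 / (1 + eigenvalue)."""
    result = 1.0
    for e in eigenvalues:
        result /= (1.0 + e)
    return result

def bartlett_chi_square(lam, n_cases, n_predictors, n_groups):
    """Bartlett's chi-square approximation and its degrees of freedom."""
    chi2 = -(n_cases - 1 - (n_predictors + n_groups) / 2.0) * math.log(lam)
    df = n_predictors * (n_groups - 1)
    return chi2, df

# Hypothetical two-group model: one function, 100 cases, 3 predictors.
lam = wilks_lambda([1.0])
chi2, df = bartlett_chi_square(lam, n_cases=100, n_predictors=3, n_groups=2)
print("Wilks' lambda:", lam, "chi-square:", round(chi2, 2), "df:", df)
```

The smaller lambda becomes, the larger the chi-square and the stronger the evidence that the groups differ on the function.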

Standardised Canonical Discriminant Function Coefficients

These show how much each predictor contributes to the function. Larger absolute values mean a stronger contribution. This table tells you which variables are doing the heavy lifting in separating the groups.
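Standardised coefficients put predictors measured on different scales on an equal footing. One common way to obtain them is to multiply each unstandardised coefficient by that predictor's pooled within-groups standard deviation. The sketch below applies that rule to made-up numbers; the predictor names and values are purely illustrative.

```python
def standardise_coefficients(raw_coefficients, pooled_within_sd):
    """Multiply each raw coefficient by its predictor's pooled within-groups SD."""
    return {name: raw_coefficients[name] * pooled_within_sd[name]
            for name in raw_coefficients}

def rank_by_contribution(standardised):
    """Order predictors by the absolute size of their standardised coefficient."""
    return sorted(standardised, key=lambda name: abs(standardised[name]), reverse=True)

# Hypothetical unstandardised coefficients and pooled within-groups SDs.
raw = {"tenure": 0.20, "spend": 0.03, "tickets": -0.25}
sd = {"tenure": 4.0, "spend": 20.0, "tickets": 1.2}

standardised = standardise_coefficients(raw, sd)
print("Standardised coefficients:", standardised)
print("Ranked by contribution:", rank_by_contribution(standardised))
```

Note that the raw coefficient for tickets is the largest in absolute terms, yet it ranks last once scale is taken into account. That is why the standardised table, not the unstandardised one, is the right place to compare predictors.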

Classification Matrix

This cross-tabulates actual versus predicted group membership. For a good model, the values along the diagonal should be high – these are correctly classified cases. The overall percentage correctly classified is a clear measure of how well the model performs.
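If you have saved the predicted group membership, the same table is easy to rebuild yourself. Here is a minimal sketch in plain Python with hypothetical labels; the function names are illustrative.

```python
def classification_matrix(actual, predicted):
    """Cross-tabulate actual (rows) against predicted (columns) group labels."""
    labels = sorted(set(actual) | set(predicted))
    matrix = {a: {p: 0 for p in labels} for a in labels}
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix

def percent_correct(matrix):
    """Overall percentage of cases on the diagonal of the matrix."""
    total = sum(sum(row.values()) for row in matrix.values())
    correct = sum(matrix[label][label] for label in matrix)
    return 100.0 * correct / total

# Hypothetical actual and predicted groups for ten customers.
actual = ["churn"] * 4 + ["stay"] * 6
predicted = ["churn", "churn", "churn", "stay",
             "stay", "stay", "stay", "stay", "stay", "churn"]

matrix = classification_matrix(actual, predicted)
for label, row in matrix.items():
    print(label, row)
print("Percent correctly classified:", percent_correct(matrix))
```

Reading the rows separately matters as much as the overall figure: a model can score well overall while still misclassifying most of the smaller group.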

A Simple Worked Example

Imagine a market research project classifying customers into two groups: likely to churn and likely to stay. The predictors are monthly spend, tenure, and support tickets raised.

Running discriminant analysis in SPSS might show:

Tests of equality reveal all three predictors differ significantly between groups
Wilks’ Lambda is low with p < 0.05, so the function is significant
Standardised coefficients show tenure contributes most, then spend
The classification matrix shows 85% of customers correctly classified

The business takeaway is clear: tenure and spend are the strongest signals of churn risk, and if the 85% hit rate holds up under cross-validation, the model can be used to flag at-risk customers. That is how a statistical output becomes an actionable insight.
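For readers who want to see the mechanics behind an example like this, the sketch below fits a two-group linear discriminant in plain Python. It is a simplified illustration, not SPSS's procedure: it uses only two predictors (tenure and spend), assumes equal prior probabilities, and runs on made-up customers, so its results will not match the 85% figure above.

```python
from statistics import mean

def group_means(rows):
    """Mean of each predictor across a group's cases."""
    return [mean(col) for col in zip(*rows)]

def pooled_covariance(group_a, group_b):
    """Pooled within-groups covariance matrix for two predictors."""
    pooled = [[0.0, 0.0], [0.0, 0.0]]
    for rows in (group_a, group_b):
        m = group_means(rows)
        for row in rows:
            d = [row[0] - m[0], row[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    pooled[i][j] += d[i] * d[j]
    df = len(group_a) + len(group_b) - 2
    return [[v / df for v in r] for r in pooled]

def fit_discriminant(group_a, group_b):
    """Return (weights, cutoff) for a linear discriminant with equal priors."""
    ma, mb = group_means(group_a), group_means(group_b)
    s = pooled_covariance(group_a, group_b)
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    diff = [ma[0] - mb[0], ma[1] - mb[1]]
    # Weights = inverse pooled covariance times the difference in group means.
    weights = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
               inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    # The cutoff is the discriminant score at the midpoint of the two group means.
    midpoint = [(ma[0] + mb[0]) / 2, (ma[1] + mb[1]) / 2]
    cutoff = weights[0] * midpoint[0] + weights[1] * midpoint[1]
    return weights, cutoff

def classify(weights, cutoff, case):
    """Scores above the cutoff fall on the 'stay' side (the first group fitted)."""
    score = weights[0] * case[0] + weights[1] * case[1]
    return "stay" if score > cutoff else "churn"

# Hypothetical customers: (tenure in months, monthly spend).
stay = [(24.0, 60.0), (30.0, 55.0), (18.0, 70.0), (36.0, 65.0), (28.0, 50.0)]
churn = [(4.0, 30.0), (8.0, 45.0), (6.0, 25.0), (12.0, 40.0), (10.0, 35.0)]

weights, cutoff = fit_discriminant(stay, churn)
predicted = [classify(weights, cutoff, c) for c in stay + churn]
actual = ["stay"] * len(stay) + ["churn"] * len(churn)
hit_rate = sum(p == a for p, a in zip(predicted, actual)) / len(actual)
print("Weights:", weights, "Cutoff:", cutoff, "Hit rate:", hit_rate)
```

The toy groups here are cleanly separated, so the hit rate on the fitted sample is flattering. Real survey data overlap far more, which is exactly why the cross-validated classification results deserve more weight than the original ones.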

Common Mistakes to Avoid

Even with SPSS handling the maths, a few errors trip up many analysts:

Skipping assumption checks – running the analysis without testing covariance equality or multicollinearity
Misreading Wilks’ Lambda – treating a higher value as better, when lower values indicate better discrimination
Ignoring cross-validation – a model can fit existing data well but classify new cases poorly
Using too few cases per group – small samples produce unstable results
Treating ordinal data as continuous – predictors should genuinely be continuous

Avoiding these keeps your analysis sound and your conclusions trustworthy.

Industry Applications

Discriminant analysis adds value across many sectors that rely on classification:

Marketing: segmenting customers and predicting brand switching
Finance: separating low-risk from high-risk applicants
Healthcare: classifying patients by risk profile
Consumer research: grouping respondents by attitudes or behaviour
Retail: identifying which factors separate loyal from occasional shoppers

In each case, the goal is the same: use measurable variables to classify groups and support confident, evidence-based decisions.

How Linkinfotech Supports SPSS-Based Research

Discriminant analysis is only as reliable as the data behind it. Clean, well-structured input is what separates a trustworthy model from a misleading one. Linkinfotech operates as a global research operations and technology partner, supporting market research firms and enterprise teams across the full analysis pipeline.

Our role spans the stages that make robust analysis possible:

Clean data collection – structured, multi-mode collection across web, phone, and field
Data processing and validation – analysis-ready datasets you can trust
Statistical analysis support – including techniques like discriminant analysis and segmentation
Real-time dashboards – clear visibility into results and classifications
Secure, scalable operations – ISO-certified processes and compliant data handling

Because we manage the foundation – clean, secure, well-structured data – the statistical analysis built on top of it becomes far more dependable. That is what turns SPSS output into insights you can confidently act on.

Final Thoughts

Discriminant analysis in SPSS is a practical, powerful way to answer classification questions. It tells you which variables separate your groups and predicts which group a new case belongs to. By following the steps – defining groups, adding predictors, checking assumptions, and reading the output carefully – you can turn raw data into clear, classified insights.

As with any statistical method, the quality of the result depends on the quality of the data. Clean, structured, well-collected data is what makes discriminant analysis dependable.

If you want reliable statistical analysis backed by clean data and secure operations, Linkinfotech can help you build research processes that are accurate, scalable, and ready for confident decision-making.

Frequently Asked Questions

What is discriminant analysis used for in SPSS?

It is used to classify cases into predefined groups based on continuous predictor variables, and to identify which variables best separate those groups. It is common in customer segmentation, churn prediction, and risk classification.

What is Wilks’ Lambda in discriminant analysis?

Wilks’ Lambda tests whether the discriminant function is statistically significant. A lower value indicates better discrimination between groups. If its p-value is below 0.05, the model separates groups better than chance.

What is the difference between discriminant analysis and cluster analysis?

Cluster analysis discovers groups that are not known in advance. Discriminant analysis works with groups that are already defined and focuses on classifying cases into them. Choose based on whether your groups are known.

What are the main assumptions of discriminant analysis?

The key assumptions are known group membership, continuous and ideally normally distributed predictors, equality of covariance matrices (tested with Box’s M), no strong multicollinearity, and an adequate sample size per group.

How do I know if my discriminant model is good?

Check Wilks’ Lambda for significance, review the canonical correlation, and examine the classification matrix. A high percentage of correctly classified cases – especially under cross-validation – indicates a strong model.

When should I use logistic regression instead?

If the assumptions of discriminant analysis are seriously violated – particularly the covariance and normality assumptions – logistic regression is often a more robust alternative for predicting group membership.