We’ve all been there: you get a new dataset, and your first instinct is to understand what’s happening beneath the surface. Which features matter most? What relationships exist in the data? How can we visualise complex high-dimensional information? And where does dimensionality reduction fit into a machine learning workflow?

This was precisely the situation described in a fascinating Reddit thread where a data scientist proposed using dimensionality reduction techniques (t-SNE, PCA, UMAP) for exploratory analysis, only to be immediately shut down with the argument that “reducing dimensions means losing information.”
While technically true, this dismissal misses the nuanced reality of modern data science. Let’s dive deeper into the community’s collective wisdom on when and why dimensionality reduction makes sense.
The Information Paradox
As one insightful commenter put it: “You do not need all the information, and it is quite possible some ‘information’ is just noise, which can be reduced via dimensionality reduction.”
This captures a fundamental truth in machine learning: more input features do not always mean better results. In fact, reducing dimensions often means:
- Removing redundant or highly correlated features
- Eliminating noise that could lead to overfitting
- Making patterns more discoverable
- Creating more interpretable models
- Avoiding the infamous “curse of dimensionality”
As another commenter wisely noted, “Extra input dimensions means extra model complexity, which means more prone to overfitting/more data needed. You only use the information you need.”
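To make this concrete, here is a minimal sketch (the synthetic dataset and the 99% threshold are illustrative assumptions, not anything from the thread) showing how PCA can recover a handful of underlying signals from dozens of noisy, redundant columns:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# 1,000 samples whose 50 features are noisy mixtures of just 5 underlying signals
signal = rng.normal(size=(1000, 5))
mixing = rng.normal(size=(5, 50))
X = signal @ mixing + 0.1 * rng.normal(size=(1000, 50))

pca = PCA().fit(X)
cumulative = np.cumsum(pca.explained_variance_ratio_)
print("Components needed for 99% of variance:", int(np.argmax(cumulative >= 0.99)) + 1)
```

On data generated this way, a handful of components capture nearly all of the variance; the remaining dimensions are mostly noise.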
Feature Selection vs. Dimensionality Reduction
Several commenters made an important distinction between feature selection (keeping some original features and discarding others) and dimensionality reduction (projecting high-dimensional data onto a lower-dimensional space).
One user explained: “Most people use feature selection to mean keeping some features and throwing away others, while dimension reduction means projecting high-dimensional data onto low-dimensional space.”
Another added: “If the data is traditional tabular type data where features have clear, intuitive meaning, then dimensionality reduction destroys some of that whereas dropping useless features does not.”
This distinction matters particularly for explainability. If your stakeholders need to understand exactly which real-world factors influence predictions, maintaining the original feature space may be preferable.
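The difference is easy to see in code. The sketch below, using scikit-learn and a standard example dataset purely for illustration, contrasts the two: feature selection keeps a subset of the original, named columns, while PCA produces new axes that mix every original column.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

data = load_breast_cancer(as_frame=True)
X, y = data.data, data.target

# Feature selection: keep 5 of the original, named columns and drop the rest
selector = SelectKBest(f_classif, k=5).fit(X, y)
print("Kept original features:", list(X.columns[selector.get_support()]))

# Dimensionality reduction: project onto 5 new axes that mix all 30 columns
components = PCA(n_components=5).fit_transform(X)
print("PCA output shape:", components.shape)  # the new columns have no direct real-world meaning
```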
When To Use Dimensionality Reduction
The thread revealed several compelling use cases:
- Exploratory Data Analysis: “With new datasets I will often do a quick UMAP with some colouring (labels) to get a sense of how hard/easy the problem is. If you can visually see patterns based on your labels, you are likely to be able to get great performance on basic ML metrics.” (A minimal sketch of this workflow appears after the list.)
- Highly Correlated Features: “You usually use dimensionality reduction techniques (mainly PCA) when you have a large number of features (>100) and your features are highly correlated, which can create collinearity problems.”
- High-Dimensional Industrial Settings: One practitioner shared, “I work in industry looking at data from factories with about 2000 different pumps, pipes, valves, and motors, all of which are somewhat collinear and interconnected in some way. I have to do some kind of dimensionality reduction to model one or more target features to find some kind of optimization.”
- Performance Optimization: “Reducing dimensions also leads to faster predictions and a reduction in computational resources.”
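Here is a rough sketch of that “quick UMAP coloured by labels” habit, assuming the umap-learn package is installed and using a stand-in dataset rather than anything from the thread:

```python
import matplotlib.pyplot as plt
import umap
from sklearn.datasets import load_digits

X, y = load_digits(return_X_y=True)  # 1,797 samples with 64 pixel features each

# Project to 2D and colour by the known labels to eyeball class separability
embedding = umap.UMAP(n_components=2, random_state=42).fit_transform(X)

plt.scatter(embedding[:, 0], embedding[:, 1], c=y, cmap="tab10", s=5)
plt.title("UMAP projection of the digits dataset, coloured by label")
plt.show()
```

If the classes already separate into visible clusters in a plot like this, basic classifiers will usually do well, which is exactly the intuition the commenter describes.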
Modern ML Already Uses Dimensionality Reduction
One particularly enlightening comment pointed out that many state-of-the-art machine learning methods already incorporate dimensionality reduction:
“Modern methods pretty much all apply dimensionality reductions… Autoencoders, VAE, UNet, CNNs, transformers (LLMs). Here are some examples:
- ResNet-50 takes 224×224 input and its penultimate layer node is 2048. It is dimensionality reduction from 50,176 to 2,048.
- LLaMA 3’s vocab size is 128,256. Its embedding dimension is 4,096. You are essentially reprojecting each input token, one-hot encoded 128,256-dimensional vector, onto a 4,096-dimensional vector space.”
This insight reveals how dimensionality reduction is actually built into the architecture of many advanced models.
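The ResNet-50 part of that claim is easy to verify. The sketch below (assuming PyTorch and torchvision are installed) swaps the classifier head for an identity layer and confirms that a 224×224 image comes out as a 2,048-dimensional feature vector:

```python
import torch
from torchvision.models import resnet50

model = resnet50(weights=None)    # untrained weights are fine for checking shapes
model.fc = torch.nn.Identity()    # drop the final classification head
model.eval()

image = torch.randn(1, 3, 224, 224)   # one 224x224 RGB image
with torch.no_grad():
    features = model(image)
print(features.shape)  # torch.Size([1, 2048])
```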
A Toolbox, Not a Doctrine
The thread ultimately highlighted that dimensionality reduction is simply one tool among many. As one commenter succinctly advised: “If in doubt, experiment with it, compare your results, and write them down.”
Several experts recommended practical approaches (the first two are sketched in code after this list):
- “Run a random forest classifier, then ask it what important features influenced its splits.”
- “Compute the correlation matrix and check if your features are closely correlated.”
- “For feature selection, I would prefer BORUTA and look at Shapley values.”
- “Look at causal modeling to determine true relationships versus mere correlations.”
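Here is a minimal sketch of the first two suggestions, again on an illustrative scikit-learn dataset rather than anything from the thread:

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer(as_frame=True)
X, y = data.data, data.target

# "Run a random forest classifier, then ask it what important features influenced its splits."
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
importances = pd.Series(forest.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False).head())

# "Compute the correlation matrix and check if your features are closely correlated."
corr = X.corr().abs()
strongly_correlated = corr[(corr > 0.9) & (corr < 1.0)].stack()
print(strongly_correlated.head())  # feature pairs with |correlation| above 0.9
```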
The Visualisation Caveat
Several commenters cautioned about overreliance on certain techniques:
“t-SNE is only nice visually, and it never took off as a relevant scientific method. It’s a toy tool. It’s quite arbitrary what latent space it builds every time you re-run it.”
Another added: “t-SNE and UMAP won’t tell you about ‘relationships’ since they create completely new nonlinearly related features, tangling them incomprehensibly together.”
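The instability is easy to demonstrate for yourself. The sketch below (on an illustrative dataset) runs t-SNE twice with different random seeds and compares the resulting embeddings, which typically differ substantially even though the input is identical:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, _ = load_digits(return_X_y=True)

# Same data, two random seeds: the resulting "maps" are not the same latent space
emb_a = TSNE(n_components=2, random_state=0).fit_transform(X)
emb_b = TSNE(n_components=2, random_state=1).fit_transform(X)

print("Mean absolute difference between runs:", np.abs(emb_a - emb_b).mean())
```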
These perspectives remind us that visualization tools should be used thoughtfully, especially when communicating results to stakeholders.
Conclusion: Context Matters
The debate about dimensionality reduction ultimately comes down to understanding your specific context:
- What is your goal? Predictive accuracy, speed, interpretability, or visualization?
- What kind of data are you working with? Tabular, image, text, or time series?
- How many dimensions do you have relative to your sample size?
- What is the signal-to-noise ratio in your features?
- Who needs to understand your results, and at what level of detail?
The thread’s collective wisdom suggests that a blanket rejection of dimensionality reduction techniques is misguided. Instead, the right approach depends on your specific needs, constraints, and objectives.
As one commenter wisely suggested, the initial response to the dimensionality reduction proposal should have been: “Depending on your case, it could take literally 10 minutes to write up a script. Just try it. This is like one of the funnest and easiest things to do on a new dataset.”
In data science, as in many disciplines, learning often comes through experimentation rather than rigid adherence to rules. Sometimes, the best response to theoretical objections is simply: “Let’s see what the data tells us.”