What happens when you ask an AI model to depict different generations? [NOT SPONSORED]
The team at AIport (a newsletter I often contribute to) studied this across 4 models—Stable Diffusion, Midjourney, YandexART, and ERNIE-ViLG.
They analyzed 1200 AI-generated images and provided a comprehensive look at:
How each generation is portrayed in visual terms.
How AI mirrors our societal stereotypes of these individuals.
An (un)expected commonality across all generations is beer🍺! AI consistently showed beer in 34% of the images across all generations.
Read here to find out why along with several key insights: How AI “sees” us?
Let’s get to today’s post now!
Cyclical feature encoding
In typical machine learning datasets, we mostly find features that progress from one value to another.
For instance:
Numerical features like age, income, transaction amount, etc.
Categorical features like t-shirt size, income groups, age groups, etc.
However, there is one more type of feature, which, in most cases, deserves special feature engineering effort but is often overlooked.
These are cyclical features, i.e., features with a recurring pattern (or cycle).
Unlike other features that progress continuously (or have no inherent order), cyclical features exhibit periodic behavior and repeat after a specific interval.
For instance, the hour-of-the-day, the day-of-the-week, and the month-of-the-year are all common examples of cyclical features.
Take the hour-of-the-day, for example. Its value ranges from 0 to 23.
If we DON’T consider this as a cyclical feature and don’t utilize appropriate feature engineering techniques, we will lose some really critical information.
To understand better, consider this:
Realistically speaking, the values “23” and “0” must be close to each other in our “ideal” feature representation of the hour-of-the-day.
Moreover, the distance between “0” and “1” must be the same as the distance between “23” and “0”.
However, the standard linear representation does not fulfill these properties.
In it, the value “23” is 23 units away from “0”, while “0” and “1” are only 1 unit apart, so the equal-distance property isn’t satisfied either.
Now, think about it for a second.
Intuitively speaking, don’t you think this feature deserves special feature engineering, i.e., one that preserves the inherent natural property?
I am sure you do!
Let’s understand how we typically do it.
Cyclical feature encoding
One of the most common techniques to encode such a feature is using trigonometric functions, specifically, sine and cosine.
These are helpful because sine and cosine are periodic, bounded, and defined for all real values.
Of course, other trigonometric functions are periodic too, but they are undefined for some values, such as tan(π/2).
For instance, consider representing the linear hour-of-the-day feature as a cyclical feature:
The central angle (2π) represents 24 hours.
Thus, each linear hour value h can be easily converted into a pair of cyclical features: sin(2πh/24) and cos(2πh/24).
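The conversion can be sketched in a few lines of Python. This is a minimal illustration assuming NumPy is available. The names `hours`, `hour_sin`, and `hour_cos` are made up for this example and are not from the original post.

```python
import numpy as np

# Hour-of-the-day values 0..23 (the linear representation).
hours = np.arange(24)

# The full circle (2*pi radians) represents 24 hours,
# so each hour maps to the angle 2*pi*hour/24.
angle = 2 * np.pi * hours / 24

# The cyclical representation is the (sin, cos) pair of that angle.
hour_sin = np.sin(angle)
hour_cos = np.cos(angle)

# Print a few encoded hours to inspect the result.
for h in (0, 6, 12, 18, 23):
    print(f"hour={h:2d}  sin={hour_sin[h]: .3f}  cos={hour_cos[h]: .3f}")
```

Every encoded hour lies on the unit circle, which is why both new features stay within [-1, 1].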
The benefit of doing this is how neatly the engineered feature satisfies the properties we discussed earlier:
In this representation, the distance between the cyclical encodings of “23” and “0” is the same as the distance between those of “0” and “1”.
The standard linear representation of the hour-of-the-day feature, however, violates this property, which results in loss of information…
…or rather, I should say that the standard linear representation of the hour-of-the-day feature results in an underutilization of information, which the model can benefit from.
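The distance property is easy to check numerically. The sketch below assumes Euclidean distance between the encoded points, which is one reasonable choice and not necessarily what a given model uses. `encode_hour` and `cyclical_distance` are hypothetical helper names.

```python
import numpy as np

def encode_hour(hour):
    """Map an hour (0-23) to a point on the unit circle."""
    angle = 2 * np.pi * hour / 24
    return np.array([np.sin(angle), np.cos(angle)])

def cyclical_distance(a, b):
    """Euclidean distance between two hours after sin/cos encoding."""
    return float(np.linalg.norm(encode_hour(a) - encode_hour(b)))

d_0_1 = cyclical_distance(0, 1)
d_23_0 = cyclical_distance(23, 0)

# Linear representation: the two gaps differ by a factor of 23.
print("linear   |0 - 1|  =", abs(0 - 1))
print("linear   |23 - 0| =", abs(23 - 0))

# Cyclical representation: the two gaps are identical.
print("cyclical d(0, 1)  =", round(d_0_1, 4))
print("cyclical d(23, 0) =", round(d_23_0, 4))
```

Both cyclical distances equal the chord length 2·sin(π/24), since adjacent hours are separated by the same angle on the circle.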
Had it been the day-of-the-week instead, the central angle (2π) would have represented 7 days.
This makes intuitive sense as well.
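Since only the period changes from one feature to another, the encoding can be wrapped in a small helper. This is a sketch only. `encode_cyclical` is a hypothetical function name, and the 0 = Monday convention is an assumption made for the example.

```python
import numpy as np

def encode_cyclical(values, period):
    """Return (sin, cos) arrays for values that repeat every `period` units."""
    angle = 2 * np.pi * np.asarray(values, dtype=float) / period
    return np.sin(angle), np.cos(angle)

# Day-of-the-week as 0..6 (assumed convention: 0 = Monday, 6 = Sunday).
days = np.arange(7)
day_sin, day_cos = encode_cyclical(days, period=7)

# Month-of-the-year as 1..12. Starting at 1 instead of 0 only rotates
# the points on the circle; the spacing between months is unchanged.
months = np.arange(1, 13)
month_sin, month_cos = encode_cyclical(months, period=12)

print(np.column_stack([days, day_sin.round(3), day_cos.round(3)]))
```

With `period=7`, Sunday and Monday end up as close to each other as Monday and Tuesday.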
The same idea can be extended to all sorts of cyclical features you may find in your dataset:
Wind direction, if represented categorically, will go in this order: N, NE, E, SE, S, SW, W, NW, and then back to N.
Phases of the moon, like new moon, first quarter, full moon, and last quarter, can be represented as categories with a cyclical order.
Seasons, such as spring, summer, fall, and winter, are categorical features with a cyclical pattern as they repeat annually.
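For categorical features like these, one possible approach is to first map each category to its position in the cycle and then apply the same sine and cosine encoding. The sketch below uses wind direction. `DIRECTIONS`, `POSITION`, and `encode_direction` are illustrative names, and the sample observations are made up.

```python
import numpy as np

# Compass points listed in their cyclical order.
DIRECTIONS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]
POSITION = {name: i for i, name in enumerate(DIRECTIONS)}

def encode_direction(name):
    """Return the (sin, cos) pair for a compass direction."""
    angle = 2 * np.pi * POSITION[name] / len(DIRECTIONS)
    return np.sin(angle), np.cos(angle)

# Hypothetical sample values, for illustration only.
observations = ["N", "NW", "S", "E"]
encoded = np.array([encode_direction(d) for d in observations])
print(encoded.round(3))
```

The same pattern would apply to moon phases or seasons by swapping in a different ordered list of categories.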
The point is that as you inspect the dataset features, you will usually be able to tell intuitively which features are cyclical and which are not.
Typically, the model will find it easier to interpret the engineered features and utilize them in modeling the dataset accurately.
👉 Over to you: What are some other ways to handle such features?
Fourier analysis and the fast Fourier transform have been used in engineering for many years to decode cyclic data such as tides and waves.
You need both the sin and cos variables to replace a cyclical feature because a given value of sin(x) generally corresponds to 2 values of x. The same is true for cos(x). If you use both sin(x) and cos(x), only one value of x is implied (within [0, 2π)).
This way you need only 2 trigonometric signals to replace 23 dummy variables (and not 24) for the hour of the day.
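The point about needing both signals can be illustrated with a short check. This is a sketch for the hour-of-the-day case. `angle_of` and `decode_hour` are hypothetical helpers, and using `arctan2` to invert the encoding is one possible way to show that the pair identifies a unique hour.

```python
import numpy as np

def angle_of(hour):
    """Angle on the 24-hour circle for a given hour."""
    return 2 * np.pi * hour / 24

# Hours 2 and 10 share the same sine value ...
print(np.sin(angle_of(2)), np.sin(angle_of(10)))
# ... but their cosine values differ, so the (sin, cos) pair tells them apart.
print(np.cos(angle_of(2)), np.cos(angle_of(10)))

def decode_hour(sin_value, cos_value):
    """Recover the hour from its (sin, cos) pair using arctan2."""
    angle = np.arctan2(sin_value, cos_value) % (2 * np.pi)
    return int(round(angle * 24 / (2 * np.pi))) % 24

# Every hour can be recovered from its pair, so the encoding is unambiguous.
recovered = [
    decode_hour(np.sin(angle_of(h)), np.cos(angle_of(h))) for h in range(24)
]
print(recovered == list(range(24)))
```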