Mathematics sits at the core of statistics and data science. It offers the essential framework for understanding data.
Thus, understanding the fundamental mathematical formulations in data science is highly important.
This visual depicts some of the most important equations (in no specific order) used in Data Science and Statistics.
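Gradient Descent: An iterative optimization algorithm that minimizes a cost function by repeatedly updating the parameters in the direction of the negative gradient. It is used to find the parameters of ML models that minimize the cost.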
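As a rough illustration, here is a minimal sketch of gradient descent fitting a one-feature linear model y ≈ w*x + b by minimizing MSE. The learning rate `lr`, the step count, and the toy data are assumptions picked for this example, not tuned values.

```python
def gradient_descent(xs, ys, lr=0.01, steps=5000):
    """Fit y ≈ w*x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Residuals of the current model
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of MSE = (1/n) * sum(error^2) w.r.t. w and b
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Step against the gradient
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # toy data generated from y = 2x + 1
w, b = gradient_descent(xs, ys)
print(round(w, 3), round(b, 3))
```

A learning rate that is too large makes the updates overshoot and diverge; too small and convergence becomes very slow.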
Normal Distribution: A probability distribution that forms a bell curve and is often used to model and analyze data in statistics.
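The bell curve comes from the density f(x) = exp(-(x - μ)^2 / (2σ^2)) / (σ * sqrt(2π)). A minimal sketch in plain Python, cross-checked against the standard library's `statistics.NormalDist`; the parameter values are arbitrary examples.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# Our formula and the standard library should agree
print(normal_pdf(1.5, mu=1.0, sigma=2.0))
print(NormalDist(mu=1.0, sigma=2.0).pdf(1.5))
```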
Sigmoid: A function that maps input values to a range between 0 and 1. It is commonly used in logistic regression to turn a model's score into a probability for making predictions.
Linear Regression: A statistical model that describes a linear relationship between independent and dependent variables.
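A minimal sketch of the sigmoid, σ(z) = 1 / (1 + e^(-z)). Branching on the sign of z is a common trick to keep `math.exp` from overflowing. The weights `w` and `b` below are hypothetical values, used only to show how logistic regression passes a linear-regression-style score w·x + b through the sigmoid.

```python
import math

def sigmoid(z):
    """Logistic sigmoid 1 / (1 + e^(-z)), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the algebraically equivalent e^z / (1 + e^z)
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Hypothetical fitted weights, for illustration only
w, b = [0.8, -0.4], 0.1
x = [2.0, 1.0]
score = sum(wi * xi for wi, xi in zip(w, x)) + b  # linear score
print(sigmoid(score))  # probability of the positive class
```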
Cosine Similarity: A measure that calculates the cosine of the angle between two vectors. It is typically used to determine the similarity between data points.
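Cosine similarity is cos θ = (a · b) / (||a|| ||b||). A minimal sketch with plain lists, assuming both vectors are non-zero:

```python
import math

def cosine_similarity(a, b):
    """cos(theta) = (a · b) / (||a|| * ||b||) for two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1, 2, 3], [2, 4, 6]))  # same direction
print(cosine_similarity([1, 0], [0, 1]))        # orthogonal
```

Because only the angle matters, scaling a vector (e.g. a longer document with the same word proportions) does not change the score.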
Naive Bayes: A probabilistic classifier based on Bayes' theorem. It assumes that features are conditionally independent given the class, which keeps the model simple and fast to train.
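To make the independence assumption concrete, here is a minimal sketch of a Bernoulli (binary-feature) Naive Bayes posterior: P(c | x) ∝ P(c) · Π P(x_i | c). The spam-filter priors and feature probabilities are made-up numbers, not learned from data.

```python
import math

def naive_bayes_posterior(priors, feature_probs, x):
    """Posterior P(class | x) for binary features under the naive independence assumption.

    priors[c]           -> P(c)
    feature_probs[c][i] -> P(x_i = 1 | c)
    x                   -> list of 0/1 feature values
    """
    log_scores = {}
    for c, prior in priors.items():
        # log P(c) + sum_i log P(x_i | c): the product over features becomes a sum of logs
        s = math.log(prior)
        for p, xi in zip(feature_probs[c], x):
            s += math.log(p if xi == 1 else 1.0 - p)
        log_scores[c] = s
    # Normalize (divide by P(x)) with the log-sum-exp trick for numerical stability
    m = max(log_scores.values())
    total = sum(math.exp(s - m) for s in log_scores.values())
    return {c: math.exp(s - m) / total for c, s in log_scores.items()}

# Hypothetical example: features = [contains "free", contains "meeting"]
priors = {"spam": 0.4, "ham": 0.6}
feature_probs = {"spam": [0.7, 0.1], "ham": [0.1, 0.5]}
post = naive_bayes_posterior(priors, feature_probs, [1, 0])
print(post)
```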
KMeans: A widely used clustering algorithm that partitions data points into k distinct groups by assigning each point to its nearest cluster centroid.
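A minimal sketch of Lloyd's algorithm, the classic way to run k-means: alternate between assigning points to the nearest centroid and moving each centroid to the mean of its points. The farthest-first initialization here is an assumption chosen to keep the example deterministic; scikit-learn, for instance, defaults to k-means++.

```python
import math

def kmeans(points, k, iters=100):
    """Lloyd's algorithm for k-means on a list of coordinate tuples."""
    # Farthest-first initialization (assumed here for determinism)
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(farthest)

    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        new_centroids = []
        for j, members in enumerate(clusters):
            if members:
                new_centroids.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return centroids, clusters

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
```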
Log Loss: A loss function used to evaluate the performance of classification models using output probabilities.
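For binary labels, log loss is -(1/n) Σ [y log p + (1 - y) log(1 - p)], where p is the predicted probability of class 1. A minimal sketch; the clipping constant `eps` is an assumption that keeps log(0) from occurring.

```python
import math

def log_loss(y_true, y_prob, eps=1e-15):
    """Binary cross-entropy: -(1/n) * sum(y*log(p) + (1-y)*log(1-p))."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip so log(0) never occurs
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)

print(log_loss([1, 0, 1], [0.9, 0.2, 0.6]))
```

Confident but wrong predictions are punished heavily: predicting 0.01 for a true 1 costs far more than predicting 0.99.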
MSE (Mean Squared Error): A metric that measures the average squared difference between predicted and actual values. It is commonly used to assess regression models.
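MSE is (1/n) Σ (y_i - ŷ_i)^2. A minimal sketch with toy values:

```python
def mse(y_true, y_pred):
    """Mean squared error: (1/n) * sum((y - y_hat)^2)."""
    return sum((y - yh) ** 2 for y, yh in zip(y_true, y_pred)) / len(y_true)

print(mse([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0]))  # 0.375
```

Squaring makes large errors count disproportionately, which is why MSE is sensitive to outliers.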
MSE + L2 Regularization: An extension of MSE that adds an L2 penalty (the sum of squared model weights) to the loss. It helps prevent overfitting by discouraging large weights.
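A minimal sketch of the regularized loss MSE + λ Σ w_j^2. The exact scaling is a convention that varies between sources and libraries (some use λ/2, some divide the penalty by n, and the bias term is usually left out of the penalty), so treat this form as one assumed choice.

```python
def mse_l2(y_true, y_pred, weights, lam):
    """MSE plus an L2 penalty lam * sum(w^2) on the model weights (ridge-style loss)."""
    n = len(y_true)
    data_term = sum((y - yh) ** 2 for y, yh in zip(y_true, y_pred)) / n
    penalty = lam * sum(w * w for w in weights)
    return data_term + penalty

# Same predictions, but larger weights now cost more
print(mse_l2([1.0, 2.0], [1.1, 1.9], weights=[0.5, 0.5], lam=0.1))
print(mse_l2([1.0, 2.0], [1.1, 1.9], weights=[3.0, -2.0], lam=0.1))
```

Larger λ pushes the weights toward zero more strongly; λ = 0 recovers plain MSE.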
Entropy: A measure of the uncertainty or randomness of a random variable. It is often used in decision trees to choose the splits that reduce uncertainty the most.
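Shannon entropy is H = -Σ p_i log p_i. A minimal sketch in bits (base 2), plus a helper that computes the entropy of a node's class labels, the quantity a decision tree compares before and after a split. The helper name `label_entropy` is illustrative.

```python
import math
from collections import Counter

def entropy(probs, base=2):
    """Shannon entropy H = -sum(p * log(p)); terms with p == 0 contribute nothing."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

def label_entropy(labels):
    """Entropy of the class distribution among a node's labels."""
    n = len(labels)
    return entropy([count / n for count in Counter(labels).values()])

print(entropy([0.5, 0.5]))                         # a fair coin: 1 bit
print(label_entropy(["yes", "yes", "no", "yes"]))  # a mostly pure node
```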
Softmax: A function that normalizes a set of values into probabilities. It is commonly used in multiclass classification problems.
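Softmax computes softmax(z)_i = e^(z_i) / Σ_j e^(z_j). A minimal sketch; subtracting max(z) first does not change the result but prevents overflow for large scores.

```python
import math

def softmax(scores):
    """Normalize raw scores into probabilities that sum to 1."""
    m = max(scores)  # shift for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs, sum(probs))
```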
Ordinary Least Squares: A method for estimating the parameters in linear regression models by minimizing the sum of squared residuals.
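For a single feature, OLS has the closed form slope = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)^2 and intercept = ȳ - slope · x̄. With many features it solves the normal equations β = (XᵀX)⁻¹Xᵀy; the sketch below covers only the one-feature case.

```python
def ols_fit(xs, ys):
    """Closed-form OLS for y = b0 + b1*x, minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

print(ols_fit([1, 2, 3, 4], [2, 4, 5, 4]))
```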
Correlation: A statistical measure that quantifies the strength and direction of the linear relationship between two variables.
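The most common version is Pearson's r = cov(x, y) / (σ_x σ_y), which lies in [-1, 1]. A minimal sketch (Python 3.10+ also ships `statistics.correlation`, which computes the same quantity):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation; the 1/n factors in cov and std cancel, so they are omitted."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

print(pearson_r([1, 2, 3, 4], [2, 4, 5, 4]))
```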
Z-score: A standardized value that indicates how many standard deviations a data point is from the mean.
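The z-score is z = (x - μ) / σ. A minimal sketch using the population standard deviation (`statistics.pstdev`); using the sample standard deviation (`statistics.stdev`) instead is an equally common choice and gives slightly different values.

```python
from statistics import mean, pstdev

def z_scores(values):
    """z = (x - mean) / std, using the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(x - mu) / sigma for x in values]

data = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5, population std 2
print(z_scores(data))
```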
MLE (Maximum Likelihood Estimation): A method for estimating the parameters of a statistical model by maximizing the likelihood of the observed data.
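A small example makes the idea concrete: for coin flips, the likelihood of heads-probability p is maximized at p = k/n. The sketch below finds the maximum numerically with a simple grid search (a choice made for illustration; real models typically use calculus or a numerical optimizer) and compares it with k/n. The flip data is made up.

```python
import math

def bernoulli_log_likelihood(p, data):
    """log L(p) = sum(x*log(p) + (1-x)*log(1-p)) for 0/1 observations."""
    return sum(x * math.log(p) + (1 - x) * math.log(1 - p) for x in data)

data = [1, 1, 0, 1, 0, 1, 1, 1]  # hypothetical coin flips (1 = heads)

# Search candidate values of p in (0, 1) for the one with the highest likelihood
grid = [i / 1000 for i in range(1, 1000)]
p_hat = max(grid, key=lambda p: bernoulli_log_likelihood(p, data))

print(p_hat, sum(data) / len(data))  # numerical MLE vs. closed form k/n
```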
Eigenvectors: Non-zero vectors that a linear transformation only stretches or shrinks (by a factor called the eigenvalue) instead of turning them in a new direction. They are widely used in dimensionality reduction techniques such as PCA.
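The defining equation is A v = λ v. A minimal sketch of power iteration, one simple way to find the dominant eigenvector (the one PCA would keep first for a covariance matrix). The matrix is an arbitrary symmetric example, and the sketch assumes a single dominant eigenvalue and a starting vector not orthogonal to its eigenvector.

```python
import math

def power_iteration(A, iters=200):
    """Estimate the dominant eigenvalue/eigenvector of a square matrix A (list of rows)."""
    n = len(A)
    v = [1.0] * n  # starting guess
    for _ in range(iters):
        # Multiply by A, then renormalize to unit length
        w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v·Av gives the eigenvalue for the unit vector v
    Av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(vi * avi for vi, avi in zip(v, Av))
    return eigenvalue, v

A = [[4.0, 2.0], [2.0, 3.0]]
lam, v = power_iteration(A)
print(lam, v)
```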
R2 (R-squared): A statistical measure that represents the proportion of variance in the dependent variable explained by a regression model, indicating how well the model fits the data.
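R^2 = 1 - SS_res / SS_tot, comparing the model's squared errors with those of always predicting the mean. A minimal sketch with toy values:

```python
def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - yh) ** 2 for y, yh in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot

y = [3.0, -0.5, 2.0, 7.0]
print(r_squared(y, [2.5, 0.0, 2.0, 8.0]))
```

A perfect model scores 1, predicting the mean scores 0, and a model worse than the mean can score below 0.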
F1 Score: A metric that combines precision and recall through their harmonic mean to evaluate the performance of binary classification models.
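F1 = 2 · precision · recall / (precision + recall). A minimal sketch that counts true positives, false positives, and false negatives from 0/1 labels; returning 0 when there are no true positives is a convention assumed here.

```python
def f1_score(y_true, y_pred):
    """F1 for binary labels (1 = positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

print(f1_score([1, 1, 1, 0, 0], [1, 1, 0, 1, 0]))  # tp=2, fp=1, fn=1
```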
Expected Value: The probability-weighted average of a random variable, calculated by multiplying each possible outcome by its probability and summing the results.
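E[X] = Σ x · P(X = x). A minimal sketch for a fair six-sided die, using `fractions.Fraction` so the arithmetic is exact:

```python
from fractions import Fraction

def expected_value(outcomes, probs):
    """E[X] = sum(x * P(X = x)) over all possible outcomes."""
    return sum(x * p for x, p in zip(outcomes, probs))

faces = [1, 2, 3, 4, 5, 6]
print(expected_value(faces, [Fraction(1, 6)] * 6))  # 7/2
```

Note that the expected value (3.5) need not be a value the die can actually show.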
👉 Over to you: Of course, this is not an all-encompassing list. What other equations will you include here?
👉 If you liked this post, don’t forget to leave a like ❤️. It helps more people discover this newsletter on Substack and tells me that you appreciate reading these daily insights.
👉 Sponsor the Daily Dose of Data Science Newsletter. More info here: Sponsorship details.
Find the code for my tips here: GitHub.
I like to explore, experiment and write about data science concepts and tools. You can read my articles on Medium. Also, you can connect with me on LinkedIn and Twitter.