Guides on Machine Learning

Last updated: May 16, 2022
Tags: Machine Learning

What is this series about?

Welcome to our comprehensive machine learning series 👋! The goal of this series is to give you a deeper and intuitive understanding of important concepts in machine learning. Our guides are filled with simple but insightful examples, which is a style of teaching we strongly believe in.

We aim to publish one new guide per week, and we also routinely improve our existing guides with more examples and sections. Feel free to register an account to be notified when we do!

Comprehensive guides

Machine learning models

  • Naive Bayes

    Naive Bayes is a simple but powerful model based on Bayes' theorem. It is often used for classification tasks in natural language processing.

  • Decision Trees

    Decision trees are a suite of tree-based models used for classification and regression. They are among the most commonly used models in data science because their predictions are robust and transparent.

  • Random Forest

    Random forest is a machine learning model that builds multiple decision trees, each with some randomness, and combines their outputs to perform classification or regression.

  • Linear Regression

    The objective of linear regression is to draw a line of best fit that can then be used for predictions and inferences.
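As a minimal sketch of what "a line of best fit" can mean, the example below fits a single-feature line by ordinary least squares, which is the usual choice but an assumption here. The function names `fit_line` and `predict` and the sample data are hypothetical and chosen only for illustration.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Use the fitted line to predict y for a new x."""
    return slope * x + intercept


# Illustrative data that lies exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
print(slope, intercept)               # slope 2.0, intercept 1.0
print(predict(5.0, slope, intercept)) # 11.0
```

With noisy data the points no longer sit on the line, and the same formulas return the line that minimizes the sum of squared vertical distances.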

Feature engineering

  • Feature Scaling

    Feature scaling is an important preprocessing step in machine learning that can help increase accuracy and training speed.

  • Grid Search

    Grid search is a brute-force technique that tries every combination of hyper-parameter values in a specified grid and keeps the one that performs best.

  • Text Vectorization

    Machine learning models require numerical input, so we need to transform non-numeric data (e.g. text and categories) into vectors. This step is called text vectorization.

  • Principal Component Analysis (PCA)

    Principal component analysis, or PCA, is one of a family of dimensionality reduction techniques. It uses the dependencies between features to represent them in a lower-dimensional form while trying to minimize information loss.
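To make the brute-force nature of grid search concrete, here is a rough sketch using only the standard library. The helper `grid_search`, the scoring function `toy_score`, and the parameter names `depth` and `learning_rate` are all hypothetical. In practice the scoring function would train a model and return a validation score.

```python
from itertools import product


def grid_search(score_fn, param_grid):
    """Try every combination in param_grid and return (best_params, best_score)."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    # product(...) enumerates the full Cartesian product of candidate values.
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical stand-in for "train a model and return its validation score".
# It peaks at depth=3 and learning_rate=0.1.
def toy_score(depth, learning_rate):
    return -((depth - 3) ** 2) - (learning_rate - 0.1) ** 2


grid = {"depth": [1, 2, 3, 4], "learning_rate": [0.01, 0.1, 1.0]}
best_params, best_score = grid_search(toy_score, grid)
print(best_params)  # {'depth': 3, 'learning_rate': 0.1}
```

The grid above has 4 × 3 = 12 combinations. The cost grows multiplicatively with each added hyper-parameter, which is the main drawback of the approach.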

Evaluating machine learning models

  • Confusion Matrix

    A confusion matrix is a simple table used to summarise the performance of a classification algorithm.

  • ROC curves

    The ROC (Receiver Operating Characteristic) curve is a way to visualise the performance of a binary classifier.

  • Cross Validation

    Cross validation is a technique to measure the performance of a model through resampling, that is, by repeatedly splitting the data into training and validation sets.

  • Mean Squared Error (MSE)

    The mean squared error, or MSE, is a performance metric that measures how well your model fits the target values. The mean squared error is defined as the average of all squared differences between the true and predicted values.

  • Mean Absolute Error (MAE)

    Mean absolute error, or MAE, measures the performance of a model, and is defined as the average of all the absolute differences between true and predicted values.

  • Root Mean Squared Error (RMSE)

    The root mean squared error (RMSE) is defined as the square root of the average squared differences between the actual and predicted values.
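The three error metrics defined above can be sketched in a few lines of plain Python. The function names and the sample values below are illustrative assumptions, not taken from the guides themselves.

```python
import math


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_squared_error(y_true, y_pred):
    """Square root of the mean squared error."""
    return math.sqrt(mean_squared_error(y_true, y_pred))


# Illustrative values: the prediction errors are 1, 0, -2 and 0.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 9.0, 9.0]

print(mean_squared_error(y_true, y_pred))       # 1.25
print(mean_absolute_error(y_true, y_pred))      # 0.75
print(root_mean_squared_error(y_true, y_pred))  # about 1.118
```

Because MSE squares each error, the single error of 2 contributes four times as much as the error of 1, whereas MAE weights them in proportion. RMSE returns the result to the same units as the target values.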

PySpark

  • Getting Started with PySpark

    PySpark is an API that allows you to write Python code to interact with Apache Spark, an open-source distributed computing framework for handling big data.

  • Resilient Distributed Dataset (RDD)

    The RDD is the central data structure of Spark. Its data is partitioned across a number of worker nodes to facilitate parallel operations.

  • Getting Started with PySpark on Databricks

    Databricks offers a platform where you can gain hands-on experience with PySpark for free using the community edition.

Reach out to us

Please feel free to hop onto our Discord if you:

  • have any questions about our guides

  • have any requests on machine learning topics

  • are passionate about data science and want to chill with like-minded people

We'll get back to you as soon as possible 🙂.

Published by Isshin Inada