What does Machine Learning see?

Machine learning algorithms can handle problems with hundreds or even thousands of dimensions, finding patterns the human eye could never see. But how do these methods compare when the human eye can see the pattern?

To find out, we ran a series of experiments in two dimensions and applied several machine learning methods to compare the real data with what the computer saw.
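Here is a minimal sketch of how "what the computer saw" can be drawn. It evaluates a classifier at the centre of every cell of a regular grid over the unit square and prints one character per cell. This is not the author's code. `diagonal_rule` and `render_ascii` are hypothetical names, and a hand-written y > x rule stands in for a trained model.

```python
from typing import Callable


# Hypothetical stand-in for a trained model: class 1 above the diagonal y = x.
def diagonal_rule(x: float, y: float) -> int:
    return 1 if y > x else 0


def render_ascii(predict: Callable[[float, float], int], size: int = 10) -> str:
    """Evaluate `predict` at the centre of each cell of a size x size grid
    over the unit square. Class 1 is drawn as '#' and class 0 as '.'."""
    rows = []
    for i in range(size):
        # Row 0 is printed first, so it must hold the largest y (top of the plot).
        y = ((size - 1 - i) + 0.5) / size
        row = ""
        for j in range(size):
            x = (j + 0.5) / size
            row += "#" if predict(x, y) == 1 else "."
        rows.append(row)
    return "\n".join(rows)


print(render_ascii(diagonal_rule, size=5))
```

You can swap `diagonal_rule` for any function that maps (x, y) to 0 or 1, such as a fitted model's prediction. The sketch then draws that model's view of the plane instead.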

In short, the challenge is to take a pattern that the human eye picks up easily and generalize it, as follows:

You are given a dataset like this:

The algorithm is then expected to generalize it like this:
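One way to build this kind of input is to sample points uniformly in the unit square and label them with a rule the eye recognizes at once. The choices below are all hypothetical and only illustrate the idea: the chessboard rule, the 4 x 4 board size, the 1000-point sample and the helper names `chessboard_rule` and `sample_dataset`. They are not the generators used in the original experiments.

```python
import random


# Hypothetical chessboard rule: split the unit square into n x n cells
# and alternate the class like the squares of a chessboard.
def chessboard_rule(x: float, y: float, n: int = 4) -> int:
    col = min(int(x * n), n - 1)  # clamp so x == 1.0 stays in the last column
    row = min(int(y * n), n - 1)  # clamp so y == 1.0 stays in the last row
    return (col + row) % 2


def sample_dataset(rule, n_points: int, seed: int = 0):
    """Draw n_points uniformly from the unit square and label each with `rule`."""
    rng = random.Random(seed)  # seeded so the sample is reproducible
    data = []
    for _ in range(n_points):
        x, y = rng.random(), rng.random()
        data.append((x, y, rule(x, y)))
    return data


train = sample_dataset(chessboard_rule, 1000, seed=42)
print(train[:3])
```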

The algorithms used were:

rpart: Decision Tree using rpart, default parameters.
logit: Logistic Regression using glm.
forest_h2o: Random Forest using H2O, 5000 trees.
forest_ranger: Random Forest using ranger, 5000 trees.
knn50: K-Nearest Neighbors (KNN) with k = 50.
nnet50: Neural Network using H2O, one layer of 50 neurons with logit activation.
deep10x5: Multi-layer Neural Network using H2O, 5 layers of 10 neurons each.
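To illustrate what knn50 does, here is a minimal pure-Python k-nearest-neighbours scorer with k = 50. It assumes Euclidean distance and scores a point by the share of class-1 labels among its neighbours. It is a sketch, not the implementation used in the experiments. `knn_score` and the 40 x 40 diagonal training grid are hypothetical.

```python
import math


def knn_score(train, x, y, k=50):
    """Return the share of class-1 labels among the k training points closest
    to (x, y) by Euclidean distance. A score above 0.5 means "predict class 1"."""
    nearest = sorted(train, key=lambda p: math.hypot(p[0] - x, p[1] - y))[:k]
    return sum(label for _, _, label in nearest) / len(nearest)


# Hypothetical training set: a 40 x 40 grid over the unit square,
# labelled 1 above the diagonal y = x and 0 elsewhere.
grid = [((i + 0.5) / 40, (j + 0.5) / 40) for i in range(40) for j in range(40)]
train = [(x, y, 1 if y > x else 0) for x, y in grid]

print(knn_score(train, 0.1, 0.9))  # deep inside the class-1 region
print(knn_score(train, 0.5, 0.5))  # on the boundary: a mixed score
```

Each score averages 50 neighbours, so the boundary gets smoothed. Points near y = x receive intermediate scores rather than a hard 0 or 1.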

For each algorithm, metrics such as accuracy and AUC are then computed on both datasets.
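Both metrics are simple to state. The sketch below assumes binary labels (0/1), model scores in [0, 1] and a 0.5 cut-off for accuracy. It computes AUC as the probability that a random positive gets a higher score than a random negative, with ties counting as half. The function names are hypothetical, and the pairwise loop favours clarity over speed.

```python
def accuracy(labels, scores, threshold=0.5):
    """Share of points whose thresholded score (score > threshold) matches the label."""
    hits = sum((s > threshold) == bool(l) for l, s in zip(labels, scores))
    return hits / len(labels)


def auc(labels, scores):
    """ROC AUC as P(positive score > negative score), ties counted as 0.5.
    Needs at least one positive and one negative; O(n_pos * n_neg)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [0, 0, 1, 1]
scores = [0.1, 0.6, 0.4, 0.9]
print(accuracy(labels, scores))  # 0.5
print(auc(labels, scores))       # 0.75
```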

Now that the challenge is clear, let's go through the experiments:

Diagonal

Chessboard

Cross

Chessboard 2

Harley Queen

Circle

Circles

Conclusions

For me, the most interesting takeaway is that an AUC of 0.7 is usually considered good for a propensity model, yet once you can actually see what such a model does, it is not a good model. Just compare what the neural network saw in the "Harley Queen" case:

This makes me think that Machine Learning still has a lot to "learn".
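It helps to recall what an AUC of 0.7 means. Let $s^{+}$ and $s^{-}$ be the scores of a randomly chosen positive case and a randomly chosen negative case. The standard probabilistic reading of the ROC AUC is then

$$\mathrm{AUC} = P(s^{+} > s^{-}) + \tfrac{1}{2}\,P(s^{+} = s^{-})$$

Ignoring ties, an AUC of 0.7 means the model ranks such a pair correctly 70% of the time and the wrong way round 30% of the time. Random guessing gives 50%.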

On the other hand, our beloved neural networks were not always the best predictors.

Cheers.

As always, here is the code for the exercise: comparacion_ml