Why is the classify function not giving the desired output?

Greetings,

I have created four sample data sets and classified them into four classes as shown. However, when I test the ClassifierFunction, the output is not as expected. I have realized that, for some reason, it neglects the inner two data sets. When tested with only two data sets, it works just fine. Why, when I create a sample data set for testing that is almost exactly like the training data, does it give me a different output than expected? For example in the image, the output should be "Compound 3", but it gives me "Compound 4" instead. The image explains everything.

I appreciate your help,
Thank you

data1 = Table[Sin[x] + 10 + RandomReal[{-0.5, 0.5}], {x, 0, 2 π, 0.1}];
data2 = Table[Sin[x] + 50 + RandomReal[{-0.5, 0.5}], {x, 0, 2 π, 0.1}];
data3 = Table[Sin[x] + 100 + RandomReal[{-0.5, 0.5}], {x, 0, 2 π, 0.1}];
data4 = Table[Sin[x] + 150 + RandomReal[{-0.5, 0.5}], {x, 0, 2 π, 0.1}];

trainingset = {data1 -> "Compound 1", data2 -> "Compound 2", data3 -> "Compound 3", data4 -> "Compound 4"};
c = Classify[trainingset]

data = Table[Sin[x] + 100 + RandomReal[{-0.5, 0.5}], {x, 0, 2 π, 0.1}];
c[data]

=================


Please, post copyable code not images. And format it properly — more on that in the help centre.
– Sektor
Aug 22 at 12:35

=================

3 Answers

=================

I assume what you want to do is classify individual values (such as for instance 12.232) into one of four classes and that your data1 to data4 are training examples for each individual class. In this case the syntax for Classify is slightly different and this is how you need to specify the training data:

Classify[<|class1 -> {example11, example12, …}, class2 -> {example21, example22, …}|>]

You can convert your training data via trainingset // Map@Reverse // Association and train your classifier accordingly

c2 = Classify[trainingset // Map@Reverse // Association]

and use it to classify values

c2[12.232] (*yields Compound 1*)
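For reference, here is a sketch of what that conversion produces (it assumes the trainingset from the question is already defined): the association maps each class label to its full list of sample values, so every individual number becomes a training example for its class.

```mathematica
(* Sketch, assuming trainingset from the question is defined *)
assoc = trainingset // Map@Reverse // Association;
Keys[assoc]      (* the four class labels, "Compound 1" .. "Compound 4" *)
Length /@ assoc  (* each class contributes its full list of sample values *)
```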

Unlike @Sascha, I assume that you are trying to do exactly what your code says: namely, classify data into four classes using a single length-63 feature vector as the one example for each class.

This is challenging for some types of classifier, in particular the default (Method -> "LogisticRegression"), which (I believe) seeks to separate the classes using a linear combination of features.

Changing to a different Method e.g.

c = Classify[trainingset, Method -> "NearestNeighbors"]

will give an answer that I think you will find more reasonable, but the real solution is to provide more examples for each class in the training set.
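A minimal sketch of that last suggestion, keeping the offsets from the question (makeCurve and nPerClass are hypothetical names): generate several noisy curves per class instead of one, then train on all of them.

```mathematica
(* makeCurve is a hypothetical helper mirroring the question's data1..data4 *)
makeCurve[offset_] :=
  Table[Sin[x] + offset + RandomReal[{-0.5, 0.5}], {x, 0, 2 Pi, 0.1}];

offsets = {10, 50, 100, 150};
labels = {"Compound 1", "Compound 2", "Compound 3", "Compound 4"};
nPerClass = 20;  (* 20 example curves per class instead of one *)

biggerTrainingset =
  Flatten[MapThread[
    Table[makeCurve[#1] -> #2, {nPerClass}] &, {offsets, labels}]];

c3 = Classify[biggerTrainingset];
c3[makeCurve[100]]  (* should now come back as "Compound 3" *)
```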

This is, perhaps, an extended comment. Mathematica comes with a lot of built-in classifiers (e.g. language, face detection, etc.). I think it is unreasonable to expect to hand it data in any form and have it classify. Mathematica is certainly 'smart' in trying to match input with a classifier. However, data pre-processing happens whether we realize it or not: it extracts the features/predictors that can be used by the particular model chosen. I suggest caveat emptor. This is not a criticism of Mathematica (or MATLAB or Python or whatever tools you use) but an expression of the need to look at, understand, and perhaps pre-process data, as well as to understand (to some degree) what the algorithms are doing… I am NO EXPERT.

The following is completely artificial; in fact, I use data as a test object just because the plots in the OP are "so distinct", prompting the concern regarding Classify.

Using the information from OP:

td = {data1, data2, data3, data4};
lab = {"Compound 1", "Compound 2", "Compound 3", "Compound 4"};
mean = Thread[Mean /@ td -> lab];
var = Thread[Variance /@ td -> lab];
median = Thread[Median /@ td -> lab];
ct = {"LogisticRegression", "SupportVectorMachine", "NearestNeighbors"};
tup = Tuples[{{mean, var, median}, ct}];
cfs = Classify[#1, Method -> #2] & @@@ tup;

Three features are extracted (mean, variance, and median of the data) and three classifiers are used.

Some classifier measurements to illustrate (how, somewhat obviously, a measure of central tendency discriminates). The data variable is used in a contrived manner.

func[prop_] :=
 Grid[Partition[
   MapThread[
    Column[{Framed@Row[#2],
       ClassifierMeasurements[#1, Thread[data -> lab[[3]]], prop]},
      Alignment -> Center] &, {cfs,
     Tuples[{{"Mean: ", "Variance: ", "Median: "}, ct}]}], 3],
  Frame -> All]

e.g.

func[“Accuracy”]
func[“ConfusionMatrixPlot”]

There are a lot of new features in Mma 11. I hope our human learning goes pari passu with the machine learning tools, so we can have value as well as fun. (End of philosophical rant.)