Machine Learning KNearestNeighbors

by: RafaelPiloto10, 7 years ago

Last edited: 7 years ago

I am following the machine learning tutorial and I am having some trouble. It is mainly due to the fact that cross_validation is no longer available. Here is my code below:

import numpy as np
from sklearn import neighbors
from sklearn.model_selection import cross_validate
import pandas as pd

df = pd.read_csv('breast-cancer-wisconsin.data.txt')
df.replace('?',-99999, inplace=True)
#df.drop(['id'], 1, inplace=True)

X = np.array(df.drop(['class'], 1))
y = np.array(df['class'])
clf = neighbors.KNeighborsClassifier()

X_test, y_test, X_train, y_train = cross_validate(clf, X, y)
# Error, line 16 is here:
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print(accuracy)

# Error I am getting -
Traceback (most recent call last):
  File "Enter file path here", line 16, in <module>
    clf.fit(X_train, y_train)
  File "Enter sklearn base.py path here", line 765, in fit
    X, y = check_X_y(X, y, "csr", multi_output=True)
  File "Enter sklearn validation.py path here", line 545, in check_X_y
    dtype=None)
  File "Enter sklearn validation.py path here", line 426, in check_array
    n_samples = _num_samples(array)
  File "Enter sklearn validation.py path here", line 118, in _num_samples
    " a valid collection." % x)
TypeError: Singleton array array('train_score',
      dtype='<U11') cannot be considered a valid collection.

I am a beginner. I have looked everywhere for an answer, but I haven't been able to find one or fully understand why this happens. If someone could explain what is wrong or how to fix it, I would greatly appreciate it. Thank you.

Original tutorial link: https://pythonprogramming.net/k-nearest-neighbors-application-machine-learning-tutorial/
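
One possible fix, sketched under some assumptions: the tutorial appears to have split the data with train_test_split from the old sklearn.cross_validation module, and that function now lives in sklearn.model_selection. cross_validate is a different function that scores a model across folds rather than returning splits. The random data below is a hypothetical stand-in for the breast cancer file, and test_size=0.2 is an assumed value, not something taken from the post.

```python
import numpy as np
from sklearn import neighbors
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the tutorial's features and labels:
# 100 rows, 9 integer features in 1..10, class labels 2 and 4.
rng = np.random.default_rng(0)
X = rng.integers(1, 11, size=(100, 9)).astype(np.float64)
y = np.where(X.sum(axis=1) > 50, 4, 2)

# train_test_split returns X_train, X_test, y_train, y_test, in that order.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

clf = neighbors.KNeighborsClassifier()
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print(accuracy)
```

Note that the unpacking order differs from the one in the question (X_test, y_test, X_train, y_train), so the variable names on the left need to change along with the function.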






I'm a newbie too, but could it have something to do with the shape/length of the data that is being pulled in?

-nickduddy 7 years ago
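
The shape of the data is probably not the cause. A rough sketch of what seems to be happening: cross_validate returns a dict of per-fold timings and scores, and unpacking a dict into four names binds each name to a key (a string), which is what clf.fit then rejects. The data here is a hypothetical stand-in, and cv=4 and return_train_score=True are choices made only for this illustration.

```python
import numpy as np
from sklearn import neighbors
from sklearn.model_selection import cross_validate

# Hypothetical stand-in data: 40 samples, 3 features, two balanced classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = np.array([2, 4] * 20)

clf = neighbors.KNeighborsClassifier()
results = cross_validate(clf, X, y, cv=4, return_train_score=True)

# The result is a dict of arrays (one entry per fold), not four data splits.
print(sorted(results))

# Unpacking a dict iterates over its keys, so every name receives a string.
a, b, c, d = results
print(type(a).__name__, type(d).__name__)
```

Passing one of those strings to clf.fit is consistent with the "Singleton array array('train_score', ...)" message in the traceback.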



It could be. I noticed that in his other tutorials he wrote:

variable = np.array(array, dtype = np.float64)

and the error mentions the dtype. I am not sure whether this has anything to do with the error, but I tried explicitly setting the dtype to float64 and only ran into more errors.

-RafaelPiloto10 7 years ago
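
On the dtype idea, a small sketch that may help separate two things. The dtype='<U11' in the traceback describes an 11-character string ('train_score' is 11 characters long), so it does not point at a float64 problem in the features. Converting the features to float64 is still reasonable once the '?' placeholders are replaced. The tiny frame and its column names below are hypothetical, and the object dtype is set by hand to imitate a CSV column that was read as strings.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature frame: a '?' in a CSV column typically makes pandas
# read that whole column as strings, imitated here with an object column.
df = pd.DataFrame({
    "clump_thickness": [5, 3, 8],
    "bare_nuclei": pd.Series(["1", "?", "10"], dtype=object),
    "class": [2, 2, 4],
})
df = df.replace("?", -99999)

# Drop the label column by name and convert the features to floats.
X = np.array(df.drop(columns=["class"])).astype(np.float64)
y = np.array(df["class"])
print(X.dtype)

# A bare string wrapped in an array gets a Unicode dtype sized to its length.
print(np.array("train_score").dtype)
```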



You could try this tutorial: https://cambridgespark.com/content/tutorials/implementing-your-own-knearest-neighbour-algorithm-using-python/index.html

-nickduddy 7 years ago



Thanks for the recommendation. I am a high school sophomore, so I have yet to take the advanced classes that would help with machine learning. That blurs the machine learning process for me and leaves big gaps where I just have to assume that what I am learning is right for what I am trying to do. I am using my resources blindly and hoping they work in the end, which I guess is just a part of programming. This is mainly why I watch Sentdex's tutorials: they are explained well, and the videos let me learn visually, which makes the process much easier. I was hoping someone could explain why I am getting this error and how to fix it, so I can continue the tutorial and get a better understanding of machine learning. Of course I could continue without running the code, but I learn better when I can see it work myself, find out what works and what doesn't, and play with the numbers to see what changes.

-RafaelPiloto10 7 years ago
