Blown by Style - Fashion MNIST
What if you wanted to buy apparel similar to something you saw on a stranger but were not sure where to find it? I can help you.
This data science project takes a picture of an apparel item, classifies it, and then suggests similar-looking items from the data. Such a project has applications in online retail and e-commerce, where recommending visually similar products could help boost sales.
Tools Used: Python, Jupyter Notebook
Libraries Used: TensorFlow, scikit-learn, NumPy, Pandas, Matplotlib
Models: Logistic Regression, Multinomial Naive Bayes, K-Nearest Neighbors, Support Vector Machine, Decision Trees, Random Forest, Multi-Layer Perceptron, and K-Means.
The primary task of this project is to assign the uploaded picture to its class. The data used is Fashion MNIST, which has 60,000 training samples and 10,000 testing samples. The samples fall into 10 categories: T-shirt/Top, Trouser, Pullover, Dress, Coat, Sandal, Shirt, Sneaker, Bag, and Ankle Boot. Since the images are already grayscale, no color conversion is needed; each 28x28 image is flattened into a vector and its pixel values are scaled using NumPy for ease of training.
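The flattening and scaling step can be sketched as below. This is a minimal sketch, not the project's actual notebook code: the helper name flatten_and_scale is hypothetical, dividing by 255 is an assumed scaling choice, and random arrays stand in for the real images (which could be loaded with tensorflow.keras.datasets.fashion_mnist.load_data()).

```python
import numpy as np


def flatten_and_scale(images: np.ndarray) -> np.ndarray:
    """Flatten (n, 28, 28) uint8 images to (n, 784) floats in [0, 1].

    Hypothetical helper; scaling by 255 is an assumption, the original
    project may have scaled differently.
    """
    n_samples = images.shape[0]
    flat = images.reshape(n_samples, -1).astype(np.float32)
    return flat / 255.0


# Stand-in data: random 28x28 grayscale images, NOT real Fashion MNIST samples.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

X = flatten_and_scale(fake_images)
print(X.shape)  # (5, 784)
```

Each row of X is then one sample with 784 features, which is the layout the scikit-learn estimators expect.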
Several machine learning models are trained and compared on the training data using accuracy, precision, and recall. The best model is then selected and tested on real-life images. Finally, the desired image is fed to the model, which classifies it before similar apparel is suggested.
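A comparison of this kind might look like the sketch below. It makes several assumptions: only three of the listed models are shown, scikit-learn's small bundled digits dataset stands in for Fashion MNIST so the example runs without a download, a held-out validation split is used, precision and recall are macro-averaged, and the best model is picked by accuracy. The evaluate helper is hypothetical.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Stand-in data: 8x8 digit images (pixel values 0..16), already flattened.
X, y = load_digits(return_X_y=True)
X = X / 16.0
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# A subset of the models named in the write-up, with assumed hyperparameters.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "knn": KNeighborsClassifier(n_neighbors=5),
    "svm": SVC(kernel="rbf"),
}


def evaluate(models, X_train, y_train, X_val, y_val):
    """Fit each model and collect accuracy and macro precision/recall."""
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        pred = model.predict(X_val)
        results[name] = {
            "accuracy": accuracy_score(y_val, pred),
            "precision": precision_score(y_val, pred, average="macro", zero_division=0),
            "recall": recall_score(y_val, pred, average="macro", zero_division=0),
        }
    return results


results = evaluate(candidates, X_train, y_train, X_val, y_val)
best_name = max(results, key=lambda name: results[name]["accuracy"])
for name, scores in results.items():
    print(name, {metric: round(value, 3) for metric, value in scores.items()})
print("best by accuracy:", best_name)
```

Macro averaging weights every class equally, which seems a reasonable default here because the Fashion MNIST classes are balanced.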
Real-life fashion apparel with their actual names and predicted classes
The secondary task of this project is to find apparel in the data that is similar to the given test sample. To achieve this, a K-Means model groups the training samples of the predicted class into clusters of similar-looking items. A second model is then trained on those samples with their cluster labels and used to find the top 5 training samples nearest to the test sample.
Row-wise Clusters of Sandal test sample
The results suggest that SVM is the best of the classification models, so it was used for testing real-life images. The model achieved an accuracy of 70% on 10 different real-life test samples. For the recommendation task, a test sample with a correctly predicted class was selected, and the training data of that class was grouped into 10 clusters. The selected data was then used to train a K-Nearest Neighbors model with the cluster labels, and the top 5 samples that most closely resemble the test sample (sandals in this case) were recommended.
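The clustering and retrieval steps could be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: the function recommend_similar is hypothetical, random vectors stand in for the flattened sandal images, and Euclidean distance (the KNeighborsClassifier default) is assumed as the similarity measure.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier


def recommend_similar(class_samples, query, n_clusters=10, top_k=5):
    """Cluster one class with K-Means, fit KNN on the cluster labels,
    and return the top_k training samples nearest to the query.

    Hypothetical helper written for illustration only.
    """
    # Group the samples of the predicted class into clusters.
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    cluster_labels = kmeans.fit_predict(class_samples)

    # Train KNN on the same samples, using cluster labels as targets.
    knn = KNeighborsClassifier(n_neighbors=top_k)
    knn.fit(class_samples, cluster_labels)

    query = query.reshape(1, -1)
    predicted_cluster = int(knn.predict(query)[0])
    # kneighbors returns neighbors sorted from nearest to farthest.
    distances, indices = knn.kneighbors(query, n_neighbors=top_k)
    return predicted_cluster, indices[0], distances[0]


# Stand-in data: 200 random 784-dimensional vectors, NOT real sandal images.
rng = np.random.default_rng(0)
sandal_like = rng.random((200, 784))
# A query that is a slightly perturbed copy of training sample 3.
query = sandal_like[3] + 0.001 * rng.random(784)

cluster, neighbor_idx, neighbor_dist = recommend_similar(sandal_like, query)
print("predicted cluster:", cluster)
print("recommended sample indices:", neighbor_idx)
```

The returned indices point back into the class's training samples, so the corresponding images can be reshaped to 28x28 and displayed as recommendations.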