USK-COFFEE DATASET: A multi-class dataset of green arabica coffee bean varieties

Coffee is a plantation commodity that plays a major role in the world economy. Coffee beans are classified into types that differ in shape and texture. Traditional visual sorting of coffee beans by humans is time-consuming and labor-intensive, and may result in low-quality coffee due to worker stress and exhaustion. The purpose of this study is to offer a lightweight and understandable intelligent system that uses deep learning (DL) to sort coffee beans accurately and to assist farmers in sorting green arabica beans by variety. We successfully separated the coffee beans into 4 categories. ResNet-18 performs best for coffee bean classification, with 98.81% training accuracy, 81.31% testing accuracy, 84.14% precision, and 81.125% recall.

We also introduce a novel dataset called USK-Coffee, derived from a coffee bean collection that includes 4 classes: peaberry, longberry, defect, and premium. Our purpose is to address the lack of publicly available data on green coffee bean varieties for green coffee bean research. The image data was collected manually by photographing beans with a digital camera, giving 8,000 images with 2,000 coffee bean images per class. This dataset can be used for two tasks. The first is classification and detection with two classes: peaberry, longberry, and premium are grouped together as normal coffee beans, and all defect beans form the defect class. The second is recognizing each of the 4 types of coffee beans. Our experimental results show that our deep learning method improves classification and sorting performance on the 4 types of coffee beans. Based on this, we believe that our dataset is very challenging and opens more opportunities for future work.
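The two tasks differ only in how the labels are grouped. As a minimal sketch, assuming each image is labeled with one of the four class names as a lowercase string, the two-class grouping can be derived from the 4-class labels as follows. The names `CLASSES`, `to_binary_label`, and `binary_counts` are illustrative and are not part of the released code.

```python
from collections import Counter

# The four USK-Coffee classes, written as lowercase strings (an assumption
# about label spelling; check the README of the released data).
CLASSES = ("peaberry", "longberry", "premium", "defect")
IMAGES_PER_CLASS = 2000  # 8,000 images split evenly over 4 classes


def to_binary_label(variety: str) -> str:
    """Map a 4-class label to the two-class task: normal vs. defect."""
    if variety not in CLASSES:
        raise ValueError(f"unknown class: {variety!r}")
    return "defect" if variety == "defect" else "normal"


def binary_counts(per_class: int = IMAGES_PER_CLASS) -> Counter:
    """Count images per binary label when every class has `per_class` images."""
    counts = Counter()
    for variety in CLASSES:
        counts[to_binary_label(variety)] += per_class
    return counts


if __name__ == "__main__":
    for variety in CLASSES:
        print(variety, "->", to_binary_label(variety))
    print(dict(binary_counts()))
```

With 2,000 images per class, this grouping yields 6,000 normal and 2,000 defect images, so the two-class task is imbalanced even though the 4-class task is balanced.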

Problem & Motivation

According to a field survey conducted at KNT Coffee in Banda Aceh, Indonesia, coffee bean sorting is still done manually by workers with long experience. This visual inspection by humans is time-consuming and labor-intensive. It is also vulnerable to inconsistency and to differences in evaluation between workers, and it can lead to mis-sorting as a result of stress and worker exhaustion. As a result, labor hours increase, the quality of the coffee beans suffers, and the efficiency of the sorting process decreases. We present a method for identifying and classifying green arabica beans by variety (peaberry, longberry, premium, and defect) using deep learning. We use image processing to categorize green coffee beans with a convolutional neural network (CNN) based on the ResNet-18 and MobileNetV2 architectures, which are common deep learning models. CNNs specialize in identifying color and shape characteristics in images. This method improves the accuracy of the coffee bean sorting process and the efficiency of sorting time, and increases the production of arabica coffee beans.


In the first experiment, the model was built by modifying the ResNet-18 architecture. Compared with other architectures, ResNet has the advantage of maintaining performance as the architecture becomes more complex, while computation becomes lighter and the network becomes easier to train. The input to the model is an RGB coffee bean image with dimensions of 3 x 256 x 256 pixels. We replaced the final fully connected layer of the ResNet-18 model with 4 fully connected layers.

The MobileNetV2 architecture is based on depthwise separable convolution layers rather than traditional 2D convolution layers, which boosts efficiency. In this experiment we freeze multiple layers of the neural network to speed up training. The first 5 layers are frozen, while the rest of the network is kept open (unfrozen). As a result, the model's performance improved significantly. We also change the fully connected layer in order to divide the dataset into 4 classes.

Do you want to try our dataset?

You can use our dataset and experiment with it!

browse here


USK-Coffee is a public dataset that is open to everyone. For detailed information about the dataset, please see the technical report linked below.

  • Number of Categories: 4 categories
  • Number of Images: 8,000 images

Category     Number of images   Dimensions
Peaberry     2,000              256 x 256 pixels
Longberry    2,000              256 x 256 pixels
Premium      2,000              256 x 256 pixels
Defect       2,000              256 x 256 pixels

If you use USK-Coffee Dataset in your work, please cite the technical report:
Febriana, A., Muchtar, K., Dawood, R., Lin, C. Y. (2022, June). "USK-COFFEE Dataset: A Multi-class Green Arabica Coffee Bean Dataset for Deep Learning." In 2022 IEEE International Conference on Cybernetics and Computational Intelligence (CyberneticsCom). IEEE. (pdf)


You can download the dataset and code using the link below. Please read the README before downloading:

Data and Code

Click Here


Click Here


You can contact Alifya Febriana via LinkedIn or e-mail with questions about the dataset and paper.


The following publications use our dataset. Please contact us if you are using our dataset and we will add your paper to the list.