DCNN-Based Screw Detection for Automated Disassembly Processes

Automation of disassembly processes in electronic waste recycling is progressing but is hindered by the lack of automated procedures for screw detection and removal. Here we specifically address the detection problem and implement a universal, generalizable, and extendable screw detector which can be deployed in automated disassembly lines. We selected the best performing state-of-the-art classifiers and compared their performance to that of our architecture, which combines a Hough transform with a novel integrated model of two deep convolutional neural networks for screw detection. We show that our method outperforms currently existing methods while maintaining high computational speed. The data set and code of this study are made public.


I. INTRODUCTION
Disassembly of electronic devices is a multi-million dollar industry, because the short life cycle of current products yields massive amounts of valuable raw material for recycling. Given the (questionable) trend to increase the replacement frequency of electronic devices, such as computers, storage devices, PCBs, etc., this industry is set to grow even further. Electronic products are, thus, usually discarded before their materials degrade. These complex End-of-Life (EOL) products contain a broad spectrum of materials including valuable metals such as silver and copper as well as rare earth metals. In Japan alone, recycling plants currently hold an estimated 300,000 tons of rare earths stored in unused electronics [1]. In France, the Rhodia group is setting up two factories, in La Rochelle and Saint-Fons, that will produce 200 tons of rare earths a year from used fluorescent lamps, magnets and batteries [2]. Therefore, recovery of materials from electronic EOL products is a highly viable business.
Economic reasons are not the only ones for which companies and governments invest in more efficient recycling processes. Studies like [3] show that recycling processes are either manual or, when fully automated, mostly follow the "crush and separate" paradigm, where the to-be-recycled product is first ground down to small pieces from which the raw materials are then extracted using physico-chemical methods. However, some electronic devices that arrive at recycling plants, such as GSM amplifier boxes, contain hazardous elements (like Beryllium in the case of GSM amps.). Elements like these represent a massive danger to health and environment. Therefore, the "crush and separate" paradigm is not applicable in such cases, and most of the time human workers have to (pre-)disassemble these devices, with a remaining danger of accidents even in spite of the best protective measures. This problem is amplified in countries with weaker health and safety standards. Hence, health and safety as well as environmental protection also provide strong incentives for improved automation in recycling.
Fig. 1. We present a universal, extendable, RGB based screw detection scheme which is engineered for automated disassembly tasks. Our scheme uses the Hough Transform and an integrated model of two deep neural networks. Blue squares indicate possible candidates, while green circles represent the final prediction of screws.
Due to the many electronic device variants, there is an urgent need for a generic solution for automated disassembly, and primarily for a generic solution to detect the most essential part of any device when it comes to disassembly: screws. In this paper, a visual screw detection scheme is proposed that combines deep learning with classical computer vision methods. The proposed scheme uses the well known Hough transform to generate screw candidates, which are then filtered by our integrated model, which combines two deep neural networks. The scheme can account for any type of screw as long as the user collects enough data for training. Contrary to state-of-the-art techniques, which require users to find specific datasets online, we let users create their own dataset for the device at hand. This eliminates the trouble of finding datasets for specific screw types and ensures high accuracy for our network. Thus, one main focus of this paper is the fusion of features acquired from classical computer vision methods with a deep learning model to detect screws, using a dataset that the user collects in a semi-automated way.
We examine the performance of our proposed scheme in different situations and the extent of generalization for effective automation and robotics usage in disassembly. For our experiments, we have collected in total 10000 images of screws as positive samples as well as non-screw artifacts (i.e., holes, stickers, PCB parts, etc.) as negative samples. Data and code are currently being released to facilitate future research.

II. RELATED WORK
Automated disassembly has been researched for a while now [4]-[6] and there are some schemes [7]-[11] for automating certain processes. However, none of the proposed schemes offers a generalizable, extensible, and universal solution to the screw detection problem. Various algorithms have been proposed for screw detection as a part of automated disassembly strategies. Most of these were either too model-dependent, meaning screw-specific (e.g., only Torx8) or device-specific (e.g., only electric motor screws) [9], [12], or they were extremely brittle because they relied heavily on classical computer vision methods, which are easily affected by a slight change of illumination [13]. Also, unlike methods such as [9], we do not require a depth sensor (i.e., an RGB-D camera) to conduct the detection.
Another interesting attempt was conducted by [13]. The authors tried to perform screw detection using template matching on metal ceiling structures for dismantling and successful reuse of light steel gauges. In this scheme, a hierarchical vision system detects the light steel gauge first and then uses multiple template matching to detect screws. This method has an obvious shortcoming: it depends on a fixed template and therefore cannot generalize. Also, at that time a light steel gauge had only one type of screw, but there is no guarantee that this will remain the case in the future. Changing the template is tedious and undesirable and, thus, this method is highly specific and cannot address other metal structures or E-Waste devices.
One work that caught our attention focuses on the disassembly of electric vehicle batteries using a robot system [6]. Their main goal was to detect M5 bolts on the battery joints. They used a Haar-type cascade classifier, which is trained on cropped images of M5 bolts. Then, to improve classification performance, false positives detected by the classifier were added to the negative set. Although the approach sounds quite feasible, Haar cascades unfortunately do not perform very well when it comes to classification. They were able to achieve only 50% detection accuracy, which makes the method impractical for industrial use.
Another work we would like to mention focused on autonomous disassembly of electric vehicle motors [9]. The authors tried to detect screws found on electric vehicle motors using an RGB-D sensor (Kinect) [14]. Although the proposed algorithm is scale, rotation, and translation invariant, it heavily relies on traditional computer vision methods such as Harris corner detection and HSV image analysis, which are easily affected by the lighting conditions. Another shortcoming is the fact that they require a depth image from the RGB-D sensor to remove false positives such as holes, which adds computational load.
Thus, it seems that there is still a substantial lack of generalizable, device- and screw-independent methods that can be used in disassembly processes.

III. METHOD
In this section we explain each block found in our pipeline. Before doing that, however, we would like to inform the reader about the setup our scheme requires. We propose a setup in which the camera faces the device's surface perpendicularly. The distance between the device and the camera was 60 cm; depending on the size of the device, however, this distance may change. Since we worked with computer hard drives, 60 cm was a suitable height.
Our scheme has two modes: offline and online. In the offline mode, the aim is to collect positive and negative samples for the training of the deep neural networks that we use. Therefore, in the offline mode, we are saving possible candidates, which could be screws or artifacts cropped from the camera image. These images are then to be divided by a human into positive and negative samples (screw/non-screw) for the training session. Fig. 2 illustrates the offline mode on the right side.
Having collected the training data and trained the network, the second mode of our scheme is ready to use. When it comes to inferring screws, as shown in Fig. 2, we again perform the same initial function blocks; later on, however, we use the trained model of our network to differentiate between positive and negative candidates. Our scheme then marks and returns the locations of the screws seen in the image.
1) Preprocessing: Computer vision is first used to crop the image to only the region where the device is visible. Cropping is done in a parameter-dependent way, so that, depending on the device, users can crop the incoming image as required. Following this, we convert the RGB image into a grayscale image.
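The preprocessing step can be sketched as follows. This is a minimal illustration, not the published implementation: the function name, the `roi` parameter layout, and the use of the ITU-R BT.601 luma weights for the grayscale conversion are assumptions made for the example.

```python
import numpy as np

def preprocess(rgb, roi):
    """Crop an RGB frame to the device region and convert it to grayscale.

    rgb: H x W x 3 uint8 array in RGB channel order.
    roi: (top, left, height, width) crop parameters (hypothetical layout,
         chosen by the user depending on the device).
    """
    top, left, height, width = roi
    cropped = rgb[top:top + height, left:left + width]
    # Weighted sum of the R, G, B channels (ITU-R BT.601 luma weights).
    weights = np.array([0.299, 0.587, 0.114])
    gray = cropped.astype(np.float64) @ weights
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)

# Usage: crop a 50 x 60 region out of a synthetic pure-red frame.
frame = np.zeros((480, 640, 3), dtype=np.uint8)
frame[..., 0] = 255
gray = preprocess(frame, roi=(100, 200, 50, 60))
```

Keeping the crop rectangle as an explicit parameter mirrors the parameter-dependent cropping described in the text: only `roi` needs to change when a different device class is processed.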
2) Candidate Generation: We have analyzed the different types of screws found in the domain of E-Waste to make sure that our method covers all conventionally used screws in this domain. For this we assessed various electronic devices that are found in large numbers in E-Waste nowadays, such as computer hard drives, DVD players, gaming consoles and many more. As expected, we found that almost all screws in this domain are circular, which is the natural geometry of a screw and represents the central feature to be used to detect a screw object. Fig. 3 illustrates samples of screws found in E-Waste. Noncircular screws are also manufactured, but those are few and we found no such screws in the devices mentioned above. We therefore based our method on first finding circular structures in the images. Obviously, not every circular structure is a screw: stickers, holes, transistors, etc. are also circular, but are not screws. Still, circular structures provide us with priors for screws, and the first step of our method is to collect those screw candidates.
As mentioned above, in order to collect candidates, we run our program in offline mode and rely on the Hough Transform for candidate detection. This is a standard computer vision method for circle detection [15] and shall not be explained here. In contrast to the standard Hough Transform, we use a version that relies on the so-called Hough Gradient (of the OpenCV library [16]), which uses the gradient information of the edges that form the circle. We refer the reader to the handbook published by the creators of the aforementioned library for further implementation details on the Hough Gradient algorithm.
We also investigated state-of-the-art classifiers found in the literature and picked the six top-performing ones for comparison. In our experience, these networks perform tolerably well given a moderately sized dataset for a specific device class (hard drives of any size). In Fig. 4 one can see two rows depicting artifacts and screws, respectively, taken from the hard drives. In general, however, these types of positive and negative training samples are also observed in other device classes and, thus, the resulting training set can be transferred to other devices. In that case, however, one has to increase the number of samples. The classifiers we investigated for our task are Xception [17], InceptionV3 [18], ResneXt101 [19], InceptionResnetV2 [20], Densenet201 [21], and Resnet101v2 [22]. These networks achieve over 93% top-5 accuracy on the well-known Imagenet dataset [23]. In order to further improve learning and to reduce overfitting, we inserted a dropout layer before the last fully connected layer of each network.
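To illustrate the circle-based candidate generation described in this subsection, the sketch below implements the plain voting form of the circle Hough transform in NumPy. It is not the Hough Gradient variant used in the pipeline; in practice, OpenCV's `cv2.HoughCircles` with the `cv2.HOUGH_GRADIENT` method would take its place. The function name, the fixed list of radii, and the choice to keep only one peak per radius are simplifying assumptions.

```python
import numpy as np

def hough_circle_candidates(edges, radii, n_angles=180):
    """Return one (x, y, r, votes) candidate per tested radius.

    edges: H x W boolean edge map (True on edge pixels).
    radii: iterable of integer radii to test (illustrative values).
    Simplification: only the strongest accumulator cell per radius is kept.
    """
    h, w = edges.shape
    ys, xs = np.nonzero(edges)
    angles = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    candidates = []
    for r in radii:
        acc = np.zeros((h, w), dtype=np.int32)
        # Every edge pixel votes for all centres lying at distance r from it.
        cx = np.rint(xs[:, None] - r * np.cos(angles)[None, :]).astype(int)
        cy = np.rint(ys[:, None] - r * np.sin(angles)[None, :]).astype(int)
        inside = (cx >= 0) & (cx < w) & (cy >= 0) & (cy < h)
        np.add.at(acc, (cy[inside], cx[inside]), 1)
        y, x = np.unravel_index(np.argmax(acc), acc.shape)
        candidates.append((int(x), int(y), int(r), int(acc[y, x])))
    return candidates

# Usage: a synthetic edge map with one circle of radius 12 centred at (40, 30).
edges = np.zeros((60, 80), dtype=bool)
t = np.linspace(0.0, 2.0 * np.pi, 360, endpoint=False)
edges[np.rint(30 + 12 * np.sin(t)).astype(int),
      np.rint(40 + 12 * np.cos(t)).astype(int)] = True
found = hough_circle_candidates(edges, radii=[8, 12, 16])
best = max(found, key=lambda c: c[3])  # candidate with the most votes
```

Each returned candidate would then be cropped from the image and, in offline mode, stored for manual labelling as screw or artifact.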
To further reduce overfitting and to come up with a model that can generalize, we applied an additional data augmentation step. There are several data augmentation operations we applied to introduce more variety in the data. The most important ones are normalizing the image data into a range of [0,1] and randomly setting the brightness in the range [0.5,1.5].
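The two augmentation operations named above can be sketched as follows. The order of the operations, the multiplicative brightness model, and the final clipping to [0, 1] are assumptions of this example; the text specifies only the value ranges.

```python
import numpy as np

def augment(patch, rng, brightness_range=(0.5, 1.5)):
    """Normalize a uint8 image patch to [0, 1] and apply a random brightness factor.

    patch: H x W x 3 uint8 array.
    rng:   numpy.random.Generator, passed in so that results are reproducible.
    Returns the augmented float32 patch and the brightness factor that was drawn.
    """
    scaled = patch.astype(np.float32) / 255.0
    factor = float(rng.uniform(*brightness_range))
    # Assumption: values pushed above 1 by brightening are clipped.
    augmented = np.clip(scaled * factor, 0.0, 1.0).astype(np.float32)
    return augmented, factor

# Usage on a uniform gray patch.
rng = np.random.default_rng(0)
patch = np.full((8, 8, 3), 100, dtype=np.uint8)
augmented, factor = augment(patch, rng)
```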
The experimental evaluation of each of these networks on the test data (see next section) allowed us to select two of them for use in our processing pipeline.

IV. EXPERIMENTAL EVALUATION
We conducted several experiments on the test data we collected. Out of the top six state-of-the-art classifiers, we picked the two best performing ones and combined them as illustrated in Fig. 5. Below we provide the details of the experimental evaluation and justify our decision to combine two models.

1) Experimental Environment:
For the evaluation of the screw classifiers, we collected a dataset consisting of over 10000 samples and split it into training and test sets. The training set includes 1491 screws and 4924 artifacts. The test set includes 1000 screws and 3285 artifacts. We use a computer with an Intel Core i7-4770 CPU @ 3.40GHz, 16GB of RAM and a GeForce GTX Titan X graphics card to train the classifiers. For the evaluation of our screw detector system, we collected approximately 300 hard drive images containing over 1500 screws. We split those images into training and test sets with a ratio of 2:1. To demonstrate the efficiency of our screw detection pipeline, we choose the state-of-the-art object detector YOLOv3 [24], re-train it on our screw detection training set, and compare its results with ours.
2) Experimental Metrics: We use standard metrics for classification evaluation. We are interested in the accuracy of the networks we picked for our pipeline. Therefore we calculate the accuracy of each of them as follows:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
where TP stands for True Positives, TN for True Negatives, FP for False Positives and FN for False Negatives. To evaluate screw detection performance, we use Average Precision (AP), which is calculated based on the precision-recall curve. For details the reader may refer to [25].
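Both metrics can be sketched in a few lines. The accuracy function follows the standard definition in terms of TP, TN, FP and FN. For AP, the sketch assumes the area under the precision-recall curve with a monotonically decreasing precision envelope, which is one common definition; the exact variant used in [25] may differ, and the function and argument names are illustrative.

```python
import numpy as np

def accuracy(tp, tn, fp, fn):
    """Fraction of correctly classified samples."""
    return (tp + tn) / (tp + tn + fp + fn)

def average_precision(scores, is_true_positive, n_ground_truth):
    """Area under the precision-recall curve of ranked detections.

    scores:           confidence of each detection.
    is_true_positive: whether each detection matches a ground-truth screw.
    n_ground_truth:   total number of ground-truth screws.
    """
    order = np.argsort(-np.asarray(scores, dtype=float))
    hits = np.asarray(is_true_positive, dtype=bool)[order]
    tp = np.cumsum(hits)
    fp = np.cumsum(~hits)
    recall = np.concatenate(([0.0], tp / n_ground_truth, [1.0]))
    precision = np.concatenate(([0.0], tp / (tp + fp), [0.0]))
    # Precision envelope: make precision non-increasing as recall grows.
    precision = np.maximum.accumulate(precision[::-1])[::-1]
    # Sum rectangle areas at the points where recall changes.
    steps = np.nonzero(recall[1:] != recall[:-1])[0]
    return float(np.sum((recall[steps + 1] - recall[steps]) * precision[steps + 1]))

# Usage with made-up numbers (not results from this study).
acc = accuracy(tp=90, tn=5, fp=3, fn=2)
ap = average_precision([0.9, 0.8, 0.7, 0.6], [True, False, True, True], 4)
```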
3) Experimental Results: We summarize the performance of each classifier on the test set in Table I. From these results, one can conclude the following: All of the investigated models achieve very high accuracy, over 96% on the test data, with InceptionV3 scoring the highest accuracy of 98.8% among single models. We also notice that the InceptionResnetV2 model scores a very high accuracy of 98.6%; note, however, that this model is much heavier than the Xception model, which scores a comparable accuracy of 98.5%. We then decided to combine two of the best performing models to boost the accuracy of our classifier, and chose InceptionV3 and Xception to build an integrated model for the final prediction. Since InceptionV3 performs slightly better than Xception, we put a slightly higher weight on the results of the InceptionV3 model when computing the final confidence score of the integrated model.
Since our scheme uses an integrated model, the threshold on this confidence score has to be adjusted carefully. We determined the best threshold experimentally by sliding its value over the range from 0.5 to 1.5 in steps of 0.01. These experiments showed that 0.8 is the best threshold, which is therefore the value chosen for our integrated model.
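The combination of the two classifiers and the threshold search can be sketched as follows. The weight values are placeholders: the text states only that InceptionV3 is weighted slightly higher than Xception, not by how much. Selecting the threshold by accuracy on labelled samples is likewise an assumption, as are all function names.

```python
import numpy as np

# Placeholder weights (assumed, not taken from the paper).
W_INCEPTION = 1.0
W_XCEPTION = 0.9

def integrated_score(p_inception, p_xception):
    """Weighted sum of the screw probabilities of the two models."""
    return (W_INCEPTION * np.asarray(p_inception, dtype=float)
            + W_XCEPTION * np.asarray(p_xception, dtype=float))

def best_threshold(scores, labels):
    """Slide the threshold from 0.5 to 1.5 in steps of 0.01; keep the most accurate."""
    thresholds = np.arange(50, 151) / 100.0
    labels = np.asarray(labels, dtype=bool)
    accuracies = [float(np.mean((scores >= t) == labels)) for t in thresholds]
    best = int(np.argmax(accuracies))  # first maximum if several thresholds tie
    return float(thresholds[best]), accuracies[best]

# Usage with made-up probabilities for four candidates (screw, artifact, screw, artifact).
scores = integrated_score([0.9, 0.2, 0.8, 0.1], [0.8, 0.3, 0.9, 0.2])
threshold, acc_at_threshold = best_threshold(scores, [1, 0, 1, 0])
```

Because the weights sum to more than one in this sketch, the combined score can exceed 1, which is consistent with a search range that extends up to 1.5.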
We then employ the integrated model in our screw detection pipeline and evaluate it on our test dataset. Fig. 6 shows the precision-recall curve of our screw detector. Our pipeline achieves an AP of 80.23 which clearly outperforms the well known detector YOLOv3 with an AP of 66.47. Fig. 7 illustrates some samples of the detections by our detector.

V. DISCUSSION AND CONCLUSIONS
In this study we tackled the fundamental problem of screw detection in disassembly environments. The problem itself is a challenging one, since screws have variable shapes and appearance and not every electronic device has the same type of screws. This is the reason why previously developed methods were not useful as a general solution to this problem. We proposed a model that is based on the Hough transform and deep neural networks. Our scheme lets users easily apply the system to any device of their choice, as long as they separate the collected data into screws and artifacts themselves. After doing this and training the network, we could demonstrate that our system achieves real-time performance with high accuracy. This has been quantified with hard drives of different models and sizes, which have different sizes and types of screws, as documented by the experimental evaluation of our scheme. The data set as well as the ROS-based implementation are published to facilitate further research.