Detection of Monocrystalline Silicon Wafer Defects Using Deep Transfer Learning

Defect detection is an important step in the industrial production of monocrystalline silicon. Through the study of deep learning, this work proposes a framework for classifying monocrystalline silicon wafer defects using deep transfer learning (DTL). An existing pre-trained deep learning model was used as the starting point for building a new model. We studied the use of DTL and the potential adaptation of MobileNetV2, pre-trained on ImageNet, for extracting monocrystalline silicon wafer defect features. This sped up the training process and improved the performance of the DTL-MobileNetV2 model in detecting and classifying six types of monocrystalline silicon wafer defects (crack, double contrast, hole, microcrack, saw-mark and stain). The training of the DTL-MobileNetV2 model was optimized by relying on the dense block layer and the global average pooling (GAP) method, which accelerated the convergence rate and improved the generalization of the classification network. The monocrystalline silicon wafer defect classification technique relying on the DTL-MobileNetV2 model achieved an accuracy of 98.99% when evaluated against the testing set. This shows that DTL is an effective way of detecting different types of defects in monocrystalline silicon wafers, making it suitable for minimizing misclassification and maximizing overall production capacity.


Introduction
Detecting silicon wafer defects is one of the challenges faced by silicon wafer manufacturers. Currently, silicon wafer inspections are performed either manually, by relying on visual inspection (VI), or by using an automated optical inspection (AOI) process. VI involves an analysis of the products on the production line. Inspectors must visually identify any defects on the wafer surface, either with the naked eye or under a microscope, before the finished goods are transferred for packing. Manual inspections involve a contact-based verification of the wafer surface. They are characterized by a low degree of automation and high labor intensity, as the wafers need to be handled by humans. Such an approach is labor intensive and inefficient, and it makes the defect detection process inaccurate. It may also lead to inconsistent standards being applied due to subjective human judgments, thus failing to meet the strict requirements of modern industry. On top of that, early detection of defects is important, as production may be halted to address the root cause of the defect, and manufacturers may mitigate the potential economic losses (time and cost) incurred in connection with withdrawing defective wafers from circulation.
Monocrystalline silicon is commonly used for photovoltaic (PV) devices. To produce a high-quality solar panel, silicon wafers must be clean and free from any impurities. However, various types of defects may occur, such as scratches, chips and cracks. Other visual defects may also be present on the surface of solar cells due to uncontrollable factors encountered during the production phase. Many types of silicon wafer defects exist that may be detected on the wafer surface. For the purpose of our study, we obtained digital images of monocrystalline silicon wafer defects from LONGi's production facility based in Kuching, Sarawak, Malaysia.
AOI is a key technique used in manufacturing to ensure the quality of printed circuit board assemblies (PCBA) used in
electronics. By detecting incorrect, missing, and misplaced components, it is a swift and accurate inspection tool that ensures the PCBs leaving the production line are defect-free. As such, the technology is capable of replacing human inspectors, as it is faster and offers higher accuracy rates. The AOI-based silicon wafer defect recognition process is divided into three phases: image processing, pattern recognition and classification. Image processing is used to enhance the images and extract specific, useful features. Pattern recognition, meanwhile, uses statistical information or machine learning techniques to classify the features into distinct categories based on their shape, color and texture. Lastly, the classification stage assigns the recognized silicon wafer defect patterns to specific types. AOI is capable of efficiently handling the detection of particular defects. However, it continues to suffer from some misclassification issues because the visual appearance of different silicon wafer defects may be similar. Table 1 is based on LONGi's production line experience and presents a comparison between manual visual inspection and AOI, in terms of cost, efficiency, and stability [1].

Table 1 (Efficiency and Stability rows):
Efficiency. VI: on average, more than 5 minutes are needed to inspect 3,000 wafer elements, and the defects cannot be identified accurately. AOI: the same amount may be tested in 10 seconds, and each machine is capable of handling 1.5 production lines.
Stability. VI: humans suffer from fatigue and emotions, meaning that they can work with full focus for approximately 3 hours. AOI: uses visual simulations, offers extremely high stability, and is capable of maintaining the same standard while operating continuously for more than 24 hours.

Since deep learning (DL) requires a lot of data, transfer learning is a practical way to meet this requirement when only a limited dataset is available. DTL adopts the model parameters learned by a well-known deep learning architecture that has proven effective on ImageNet and applies them to learning new data. In this paper, we present our findings on transfer learning between the ImageNet dataset and the wafer defect dataset, using the MobileNetV2 model to classify the six defect types shown in Fig. 1.
The double contrast defect may occur in two situations: it may be caused by an abnormal stoppage of the line and the resumption of its work with a wire of a different quality level and a different process recipe. It may also be caused by slicing the wire due to a sudden change caused by an abnormal stoppage resulting from previous cuts. Saw-marks, also known as sawlines, are caused by an abnormal stoppage occurring during slicing, and by resuming with a wire of different quality or with a different process recipe.
The motivation behind this study was to address the current shortcomings of the AOI visual inspection method used to identify the aforementioned defects. We adopted the deep transfer learning (DTL) approach by using the MobileNetV2 architecture [2], [3] to detect and classify silicon wafer defects. Supervised learning was used, as we were using labeled data. This approach is simpler and more accurate compared to unsupervised learning. This paper expands the current knowledge on wafer classification, relying on a DTL approach that differs from that relied upon by Mat Jizat et al. [4] for six types of monocrystalline silicon wafer defects. The methodology was developed based on our objective to classify monocrystalline silicon wafer defects into six different categories following a single AOI pass performed during the quality control process on the production line. The monocrystalline silicon wafer defects were re-run through the AOI to check for false rejects. Healthy silicon wafers are not covered by the scope of this study, as they are not identified in the course of the AOI inspection.
The rest of the paper is outlined as follows. The related work is discussed in Section 2. Section 3 describes, in detail, the DTL approach, network architecture, and the process of building the model. Experimental results are analyzed in Section 4. Section 5 contains the conclusion and presents the future work to be performed.

Related Work
Deep learning is a branch of the machine learning domain in which DL algorithms are less dependent on human intervention to learn a hierarchy of features from input data [5].
DL has been widely used for image classification [6]. The essence of DL is to learn relevant features by building learning models with multiple hidden layers. To enhance classification accuracy, such an approach requires vast amounts of training data. Deep transfer learning (DTL) is an adaptation of the transfer learning approach, where the knowledge from existing DL models is stored and transferred to another model to solve a different, but related problem. DTL is gaining popularity as the amount of time required to develop the model and collect the data is reduced drastically. Less effort is also needed for updating DTL models: once a model has been trained with a sufficient amount of data, it may be updated with a small quantity of additional data in a very short time, without compromising its accuracy [5]. Thus, DTL is suitable for solving problems involving small amounts of data.
Mat Jizat et al. [4] evaluated four machine learning classifiers for wafer defect detection on a small dataset of fewer than 1,000 images, using InceptionV3 transfer learning. According to their report, logistic regression and stochastic gradient descent (SGD) exhibited better classification accuracy, in the range of 85-88%, than the two remaining classifiers. They suggest that more image samples and further optimization would increase the accuracy rate, but provide no details on the training parameters to support this suggestion.
Imoto et al. [7] compared DL with the existing automatic defect classification (ADC) approach used for detecting defects in semiconductor manufacturing. The comparison showed a significantly higher detection accuracy for DL than for ADC.
Kudo et al. [8] applied transfer learning of a convolutional neural network (CNN) to the problem of dislocation clusters, i.e. crystallographic defects, in photoluminescence (PL) images of multi-crystalline silicon wafers. The input image is a sub-region of a whole PL image, and the CNN evaluates whether dislocation cluster regions are detected in the upper wafer image, using the images of dislocation clusters in lower wafers as positive examples. The experiment was carried out under three conditions for the negative examples: images of wafers at some depth, randomly selected images, and images from both of these sources. Accuracy and Youden's J statistic were then used to evaluate two cases, namely predicting the occurrence of dislocation clusters at ten or twenty wafers above. The results show that accuracy and Youden's J were better than those of the "bag of features" approach in predicting the dislocation cluster regions.
Lee and Lee [9] evaluated the use of DTL for learning new defect patterns in wafer images. They found that DTL may be used as effectively as a fully trained DL model (with an accuracy difference of 2%). The advantage of DTL is that it learns the defects faster. It also enabled the researchers to obtain a reliable model by updating the model as needed.

Classification of Silicon Wafer Defects using DTL
The conceptual framework for classifying silicon wafer defects is presented in Fig. 2. Firstly, wafer defect images are collected together with ground truth data from manual inspection by industrial experts. Then, a suitable DTL architecture is identified from the available models based on their Top-1 results (on GPU architecture) and adopted to classify monocrystalline silicon wafer defects [10]. The next step consists of building the input pipeline tasked with reading all the collected sample images from the wafer defect dataset. After that, experiments are conducted to build the DTL-MobileNetV2 model by adding layers that learn the six different defect classes using the training set. The model built is saved and tested by performing predictions on the testing set. To improve the performance of the model obtained, further training sessions are performed after unfreezing the parameters in the network layers.

Data Collection
The sample images of the defective wafers were randomly selected from the existing database compiled by a production expert from LONGi. These images were automatically collected by running the defective wafers through an AOI machine. Manual classifications performed by industrial experts were collected as ground truth data. Examples of images in the dataset may be seen in Fig. 1. The entire dataset contains 6,000 images, with 1,000 images per defect class. For each defect class, the images are randomly divided into three subsets: training (70%), validation (15%), and testing (15%). The resolution of each image is 224 × 224 × 3. The training set is used to build the DTL models, while the validation set is used to provide an unbiased evaluation of the built models and to refine their parameters. The testing set is used to ensure an equitable evaluation of the best model built based on the training set.
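The per-class 70/15/15 split described above can be illustrated with a minimal sketch. This is not the pipeline used in the study: the file names, the random seed, and the helper names `split_per_class` and `split_dataset` are hypothetical, and only the class names, the class size of 1,000 and the split ratios are taken from the text.

```python
import random

# Class names come from the paper; file names below are hypothetical placeholders.
CLASSES = ["crack", "double_contrast", "hole", "microcrack", "saw_mark", "stain"]


def split_per_class(items, train=0.70, val=0.15, seed=0):
    """Shuffle the items of one class and cut them into train/val/test lists."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * val)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )


def split_dataset(images_by_class, seed=0):
    """Apply the same per-class split to every defect class."""
    splits = {"train": [], "val": [], "test": []}
    for offset, (label, items) in enumerate(sorted(images_by_class.items())):
        tr, va, te = split_per_class(items, seed=seed + offset)
        splits["train"] += [(path, label) for path in tr]
        splits["val"] += [(path, label) for path in va]
        splits["test"] += [(path, label) for path in te]
    return splits


# 1,000 placeholder file names per class, matching the dataset size in the text.
images_by_class = {c: [f"{c}_{i:04d}.png" for i in range(1000)] for c in CLASSES}
splits = split_dataset(images_by_class)
print({name: len(rows) for name, rows in splits.items()})
# → {'train': 4200, 'val': 900, 'test': 900}
```

Splitting within each class, rather than over the pooled 6,000 images, keeps the three subsets balanced at 700, 150 and 150 images per defect type.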

DTL Architecture
The DTL architecture used in this work was based on MobileNetV2 [2]. We used the version of MobileNetV2 pre-trained on the ImageNet dataset, which contains 14 million images in 22 thousand visual categories [11]. Based on the learned feature maps of the pre-trained MobileNetV2 model, significant features were extracted from silicon wafer defect images. The advantage of using the DTL architecture is that we do not need to use random initialization for building a new deep learning model. Instead, the model for classifying the wafer defects shares the same initialization parameters that were identified as effective during the learning process on the ImageNet dataset. To achieve

Experiment Design
The algorithm learned to recognize defects in silicon wafers by analyzing images of defective silicon wafers. This was done by feeding the images from the training set into the DTL architecture. The experiment was designed to assess the accuracy of using MobileNetV2 as a DL model to classify silicon wafers into the correct defect categories. In addition, we evaluated the effects of freezing and unfreezing the DL model's parameters during the training process.
The training parameters used to conduct the experiments are listed in Table 2. We refined the training process by unfreezing the trained network layers using the step learning strategy. Initially, the learning rate equals 0.001, and it decreases by a factor of 0.5 every five epochs. During the training process, the regularized DropBlock method was used to reduce parameter calculations in the fully connected layer. This helps to avoid network over-fitting and boosts the accuracy of classifying silicon wafer defects. The silicon wafer defect features obtained during the training phase with the use of MobileNetV2 were fed into a new Softmax layer to obtain the output probability for each silicon wafer image and its respective defect class. The model built was then used to classify the images from the testing set.
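The step learning strategy can be written down as a short function. The sketch below assumes that "decreases by a factor of 0.5" means the rate is multiplied by 0.5, and that epochs are counted from zero; the function name `step_decay_lr` is our own and is not taken from the study.

```python
def step_decay_lr(epoch, initial_lr=0.001, factor=0.5, step=5):
    """Learning rate for a zero-based epoch index under step decay.

    Assumption: the rate is multiplied by `factor` once every `step` epochs.
    """
    return initial_lr * factor ** (epoch // step)


# Epochs 0-4 use the initial rate, epochs 5-9 half of it, and so on.
for epoch in (0, 4, 5, 10, 15):
    print(epoch, step_decay_lr(epoch))
```

Under these assumptions the rate stays at 0.001 for the first five epochs and is halved at the start of every following block of five.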

Analysis and Discussion of Results
We evaluated the effectiveness of the DTL model using standard evaluation measures, namely accuracy and loss. Accuracy is the percentage of correct predictions made by the model. It is calculated by dividing the number of correct predictions (true positive + true negative) by the total number of predictions (true positive + true negative + false positive + false negative). The model loss reflects the error between two probability distributions, those of the true defect class and the predicted defect class. It is measured using cross-entropy, where a higher value indicates that the predicted defect class diverges further from the true defect class.
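Both measures can be computed in a few lines. The sketch below assumes integer class indices and one probability vector per image; in the multi-class setting, accuracy reduces to the number of correct predictions divided by the total number of predictions. The labels and probabilities are placeholder values for illustration, not results from this study.

```python
import math


def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true class."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def cross_entropy(true_labels, predicted_probs, eps=1e-12):
    """Mean of -log(probability assigned to the true class)."""
    total = 0.0
    for t, probs in zip(true_labels, predicted_probs):
        total += -math.log(max(probs[t], eps))  # eps guards against log(0)
    return total / len(true_labels)


# Placeholder example with three classes and four images.
y_true = [0, 1, 2, 1]
y_prob = [
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.3, 0.4, 0.3],  # true class 2 receives only 0.3, so this image is misclassified
    [0.1, 0.9, 0.0],
]
y_pred = [max(range(len(p)), key=p.__getitem__) for p in y_prob]
print(accuracy(y_true, y_pred), cross_entropy(y_true, y_prob))
```

The loss grows as the probability assigned to the true class shrinks, which is why a model can have unchanged accuracy while its loss still improves.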

DTL Training Accuracy
Due to the stochastic nature of DL, models can be built several times using the same set of training data to assess the accuracy and to compare each model against its counterparts. No standard number of models that need to be built has been defined. Most research relies on one model only, due to time- and resource-related limitations encountered while building the models [12]. In our experiment, we chose to build three models and to evaluate the mean and standard deviation of their results [13]. The accuracy of the DTL-MobileNetV2 model (trained using the training set and validated using the validation set) for three iterations is shown in Table 3.
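Summarizing repeated runs by their mean and standard deviation may be sketched as follows. The three accuracy values are placeholders and are not the figures from Table 3; the use of the sample standard deviation (dividing by n - 1) is also an assumption, as the text does not state which variant was used.

```python
import statistics


def summarize_runs(accuracies):
    """Return the mean and sample standard deviation of repeated runs."""
    return statistics.mean(accuracies), statistics.stdev(accuracies)


# Placeholder accuracies for three independently trained models.
runs = [0.90, 0.92, 0.94]
mean_acc, std_acc = summarize_runs(runs)
print(f"mean={mean_acc:.4f} std={std_acc:.4f}")
```

A small standard deviation across runs indicates that the reported accuracy does not depend strongly on the random initialization or on the order of the training batches.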

Batch Size (32 vs. 64)
The impact of different batch sizes on the accuracy of the model was investigated as well. Masters and Luschi [14] reported that the best performance in their experiments was achieved with batch sizes of 16, 32 and 64 for an ImageNet dataset. In this paper, models with batch sizes of 32 and 64 were trained, and the results obtained before and after fine tuning were combined to conduct a more comprehensive comparative analysis. The batch size of 16 was excluded from the experiment as it is computationally intensive and time consuming. We added two layers on top of the existing MobileNetV2 to classify wafer defects into the six respective classes.
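To make the role of the added layers concrete, the sketch below shows a forward pass through a classification head in plain Python. It assumes that the two layers are a global average pooling (GAP) layer followed by a fully connected Softmax layer with six outputs, which is our reading of the text rather than a confirmed description of the model. The feature map sizes and weights are toy values chosen for illustration.

```python
import math
import random


def global_average_pool(feature_maps):
    """Reduce each channel (a list of rows) to the mean of its values."""
    pooled = []
    for channel in feature_maps:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


def dense(inputs, weights, biases):
    """Fully connected layer: one weight row and one bias per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def softmax(logits):
    """Convert logits to probabilities; subtracting the max keeps exp() stable."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy input: 4 channels of 2 x 2 features standing in for the backbone output.
rng = random.Random(0)
feature_maps = [[[rng.random() for _ in range(2)] for _ in range(2)] for _ in range(4)]
weights = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(6)]  # 6 defect classes
biases = [0.0] * 6

probs = softmax(dense(global_average_pool(feature_maps), weights, biases))
print(probs)
```

GAP has no trainable parameters, so a head of this kind adds only the weights of the final fully connected layer to the pre-trained network.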
In order to assess the effectiveness of using DTL without fine tuning the model (all network layers frozen), we conducted the training phase for 100 epochs. We then ran another 100 epochs with the network layers unfrozen to optimize the model for monocrystalline silicon wafer defects. This process updates the weights of the ImageNet feature maps towards features specifically associated with the dataset. The detailed results from before and after fine tuning for the batch sizes of 32 and 64 are shown in Table 4. With a batch size of 32, the built model's training accuracy was 91.32% and its validation accuracy was 91.78% before fine tuning. After fine tuning, the training accuracy improved to 97.68% and the validation accuracy to 97.2%. Figure 3 shows the accuracy and loss versus the epoch during training for the two different batch sizes.
With a batch size of 64, the built model achieved a training accuracy of 91.52% and a validation accuracy of 91.80% before fine tuning. After fine tuning, the results improved to 98.05% in terms of training accuracy and 97.67% in terms of validation accuracy.
The results of the experiments show that the larger of the two batch sizes led to faster convergence of the deep learning algorithm and to a shorter time required to reach better training accuracy. On top of that, the figures and tables suggest that, within the range tested, increasing the batch size improves the accuracy of the model's predictions to a certain extent, and the mean accuracy and mean loss are better than the outcomes reported in the literature.
Upon performing the optimization (fine tuning), the DTL model learned more features and classified the wafer defects better. This can be clearly seen on the right side of the graph in Fig. 3. The improvement stemmed from the fact that the fine-tuned model was allowed to learn new features and update its weights and biases using data that differed from the ImageNet data. Although there are two downward spikes, the accuracy was still better than with frozen layers. The two spikes may be caused by sample data in the corresponding validation batches that is harder for the DTL model to classify. Overall, the findings indicate that unfreezing the network layers during the model training phase produces a better result than keeping the layers frozen. We speculate that the same model parameters could be adopted when using other DL algorithms, such as DenseNet201.
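The difference between training with frozen and unfrozen layers can be illustrated with a toy example. The sketch below is not the training code used in the study: the `Layer` class, the `sgd_step` helper, the weights and the gradients are all hypothetical, and a single gradient step stands in for each 100-epoch phase.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    weights: list
    trainable: bool = True


def sgd_step(layers, grads, lr):
    """Apply one gradient step, skipping layers that are frozen."""
    for layer in layers:
        if layer.trainable:
            layer.weights = [w - lr * g
                             for w, g in zip(layer.weights, grads[layer.name])]


# Hypothetical two-layer model: a pre-trained base and a new classification head.
base = Layer("base", [1.0, 2.0], trainable=False)  # frozen during phase 1
head = Layer("head", [0.0, 0.0])
model = [base, head]
grads = {"base": [0.5, 0.5], "head": [1.0, -1.0]}  # placeholder gradients

# Phase 1: the base is frozen, so only the head is updated.
sgd_step(model, grads, lr=0.1)
phase1_base, phase1_head = list(base.weights), list(head.weights)

# Phase 2 (fine tuning): unfreeze the base so that every layer is updated.
base.trainable = True
sgd_step(model, grads, lr=0.1)
print(phase1_base, phase1_head, base.weights, head.weights)
```

In the first phase the pre-trained weights act as a fixed feature extractor; only after unfreezing can they move away from their ImageNet values towards features of the wafer defect images.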

Classification Accuracy
To evaluate the DTL-MobileNetV2 model built for classifying silicon wafer defects, a classification prediction was performed with new sample images of wafer defects from the testing set. The model's accuracy on the testing set is 98.99%. As we do not compare different DTL architectures in this work, we have not performed a significance test on the results. The visualization of the [...] defect (with the accuracy rate of 95%). More training and customization of the DL model is needed to recognize the crack defect, as the classification accuracy for this class was the lowest.
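Per-class results of this kind are usually read from a confusion matrix, in counts and in percentages. A minimal sketch of both views follows; the class names are taken from the paper, while the true and predicted labels are placeholders for illustration and do not reproduce the results of this study.

```python
CLASSES = ["crack", "double contrast", "hole", "microcrack", "saw-mark", "stain"]


def confusion_matrix(true_labels, predicted_labels, classes=CLASSES):
    """Rows are true classes, columns are predicted classes (image counts)."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix


def per_class_accuracy(matrix):
    """Share of each true class that was predicted correctly (row-normalized diagonal)."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]


# Placeholder labels: one crack image is mistaken for a microcrack.
y_true = ["crack", "crack", "microcrack", "hole", "stain",
          "saw-mark", "double contrast", "crack"]
y_pred = ["crack", "microcrack", "microcrack", "hole", "stain",
          "saw-mark", "double contrast", "crack"]

cm = confusion_matrix(y_true, y_pred)
for name, rate in zip(CLASSES, per_class_accuracy(cm)):
    print(f"{name}: {rate:.2%}")
```

Reading the matrix row by row shows not only which class has the lowest accuracy but also which other class its images are confused with.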

Conclusion
In this work, we present a solution for classifying monocrystalline silicon wafer defects using deep transfer learning. The experiments were carried out using a real-world dataset from a production plant. We found that DTL-MobileNetV2 was capable of accurately identifying certain defects with a limited training sample size. It was also determined that batch size had an influence as well, both in terms of accuracy and [...]. We also showed that a batch size of 64 offered better accuracy (before and after fine tuning) than a smaller batch size of 32. Despite the success in training the model and classifying wafer defects, several limitations have been encountered. Firstly, the dataset could be improved by adding perfect wafers that had passed AOI defect inspection. Secondly, the methodology relied upon in this work was not designed for real-time classification. Prospects for future work include the collection of perfect and defective wafer samples from diversified production facilities to expand the dataset, further customization of the DTL-MobileNetV2 solution using other new deep learning models, and integration with a real-time mobile terminal.

Fig. 1. Sample images of monocrystalline silicon defect types (see the digital version for color images).

Fig. 4. Confusion matrix using (a) the exact number of predicted images and (b) accuracy percentage.

Fig. 5. Sample of defect classification prediction results of 25 images from the testing set.

Table 1
The comparison between manual visual inspection (VI) and automated optical inspection (AOI)

Table 4
Accuracy and loss of the DTL MobileNetV2 model training and validation for batch sizes of 32 and 64