
HDSRNet: A Simple Segmentation-Free Method For Unconstrained Handwritten Digit String Recognition

 2024.9.30.

Handwritten digit string recognition (HDSR) is one of the most challenging tasks in off-line handwritten optical character recognition. The main challenge comes from segmentation errors caused by factors such as touching digits, broken digits, complex backgrounds, and unknown string length. To avoid these problems, we propose a new segmentation-free HDSR system that uses a one-stage object detector based on a convolutional neural network (CNN). Experiments on benchmark datasets (CVL, ORAND-CAR) show that the proposed method achieves excellent performance compared with other state-of-the-art methods.

In many kinds of form documents, handwritten digit strings appear as amounts on bank checks, postal codes, dates, numeric values, and so on. In an HDS image, each digit can be regarded as an object and segmentation as the localization of the digit region; therefore, digit string recognition can be solved with object detection methods. From this point of view, the CNN-based object detection networks widely used in visual recognition can be applied to HDSR.

Our proposed HDSR system consists of a basic network that performs feature extraction and a detection network that performs digit recognition and localization.

Figure. Architecture of HDSRNet

The basic network for feature extraction is a fully convolutional network with several convolutional layers. The input of the first layer is a grayscale HDS image, and the output of the last layer is the final feature map, which contains the information for predicting boxes. The detection network predicts the bounding boxes of digits and yields the final recognition result. Let W and H be the width and height of the final feature map. When the input image is divided into a W×H grid, each point on the final feature map lies at the center of the corresponding grid cell. The set of predictions for the i-th grid cell is defined as follows.

$$P^{(i)} = \{x, y, w, h, c, c_1, \ldots, c_K\}^{(i)}, \quad i \in [1, W \cdot H] \qquad (1)$$

In this set, $x$ and $y$ are the offsets of the box center from the top-left corner of the grid cell, $w$ and $h$ are the box width and height, $c$ is the object confidence, and $c_1, \ldots, c_K$ are the class-specific confidences. Each grid cell therefore has 5 + K predictions, i.e. 15 when K = 10 (digit classes 0 to 9). The object confidence $c$ represents the probability that the grid cell contains a digit object, that is, that the center of some digit box falls within the grid cell. If $c$ exceeds a confidence threshold, we obtain one predicted box $B = \{b_x, b_y, b_w, b_h, k, c_k\}$, where $k$ is the predicted digit class and $c_k$ its class-specific confidence.
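As a minimal sketch of this per-cell layout, the Python code below splits one 5 + K prediction vector into box terms, object confidence and class confidences, then applies the confidence threshold. The threshold value 0.5, the names `split_cell` and `select_digit`, the assumption that the values are already probabilities, and the choice of k as the most confident class (with c_1, ..., c_K ordered as digits 0 to 9) are hypothetical and not taken from the paper.

```python
from typing import List, NamedTuple, Optional, Sequence, Tuple

K = 10                 # digit classes 0..9
NUM_PREDS = 5 + K      # x, y, w, h, c, then c_1..c_K -> 15 when K = 10


class CellPrediction(NamedTuple):
    x: float
    y: float
    w: float
    h: float
    conf: float              # object confidence c
    class_conf: List[float]  # class-specific confidences c_1..c_K


def split_cell(vector: Sequence[float]) -> CellPrediction:
    """Split the raw prediction vector P(i) of one grid cell into its parts."""
    if len(vector) != NUM_PREDS:
        raise ValueError(f"expected {NUM_PREDS} values, got {len(vector)}")
    x, y, w, h, c = vector[:5]
    return CellPrediction(x, y, w, h, c, list(vector[5:]))


def select_digit(cell: CellPrediction, threshold: float = 0.5) -> Optional[Tuple[int, float]]:
    """Return (k, c_k) if the object confidence exceeds the threshold, else None.

    Assumption (not stated in the source): k is the class with the highest
    class-specific confidence, and class index j corresponds to digit j.
    """
    if cell.conf <= threshold:
        return None
    k = max(range(K), key=lambda j: cell.class_conf[j])
    return k, cell.class_conf[k]


# Usage: a cell that is confident it contains the digit 7.
vec = [0.2, 0.4, 0.5, 0.9, 0.8] + [0.01] * K
vec[5 + 7] = 0.9
print(select_digit(split_cell(vec)))  # (7, 0.9)
```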

For the i-th grid cell, the actual coordinates of the predicted box are computed as:

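$$b_x = \frac{I + \sigma(x)}{W} \cdot I_w, \quad b_y = \frac{J + \sigma(y)}{H} \cdot I_h, \quad b_w = w \cdot w_a \cdot I_h, \quad b_h = h \cdot h_a \cdot I_h \qquad (2)$$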

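where $(I_w, I_h)$ and $(w_a, h_a)$ are the width and height of the input image and of the anchor box, respectively, $\sigma(\cdot)$ is the logistic sigmoid function, and $I$, $J$ are the horizontal and vertical grid coordinates of the i-th grid cell point, so that $(I + \sigma(x))/W$ and $(J + \sigma(y))/H$ are normalized positions in the image.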
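Equation (2) can be sketched in Python as below. This is an illustration under stated assumptions rather than the authors' code: I and J are taken as 0-based column and row indices of the grid cell, the anchor sizes w_a and h_a are in whatever unit makes Eq. (2) yield pixels, and the name `decode_box` is hypothetical. As in Eq. (2), both b_w and b_h are scaled by the image height I_h.

```python
import math


def sigmoid(v: float) -> float:
    """Logistic sigmoid: 1 / (1 + e^-v)."""
    return 1.0 / (1.0 + math.exp(-v))


def decode_box(x, y, w, h, I, J, W, H, img_w, img_h, anchor_w, anchor_h):
    """Map raw predictions of grid cell (I, J) to a pixel box per Eq. (2).

    Returns (b_x, b_y, b_w, b_h), where (b_x, b_y) is the box center
    in input-image pixels. I, J are assumed to be 0-based grid indices.
    """
    b_x = (I + sigmoid(x)) / W * img_w    # normalized column position -> pixels
    b_y = (J + sigmoid(y)) / H * img_h    # normalized row position -> pixels
    b_w = w * anchor_w * img_h            # Eq. (2) scales width by I_h as written
    b_h = h * anchor_h * img_h
    return b_x, b_y, b_w, b_h


# Usage: cell (2, 0) on an 8x1 feature map of a 256x32 input image.
print(decode_box(0.0, 0.0, 1.0, 1.0, 2, 0, 8, 1, 256, 32, 0.5, 0.75))
# (80.0, 16.0, 16.0, 24.0)
```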

After collecting the high-confidence boxes by scanning each grid cell, non-maximum suppression (NMS) is used to remove overlapping boxes, and the remaining boxes are sorted in ascending order of their x-coordinates to form the recognized digit string.
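This post-processing step can be sketched as greedy, class-agnostic NMS followed by a left-to-right sort on the box center b_x. The IoU threshold of 0.5, ranking boxes by c_k, treating NMS as class-agnostic, and the helper names `iou`, `nms` and `read_string` are assumptions for illustration; the summary above does not specify these details.

```python
from typing import List, Tuple

# A predicted box B = (b_x, b_y, b_w, b_h, k, c_k); (b_x, b_y) is the center in pixels.
Box = Tuple[float, float, float, float, int, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two center-format boxes."""
    ax1, ay1, ax2, ay2 = a[0] - a[2] / 2, a[1] - a[3] / 2, a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1, bx2, by2 = b[0] - b[2] / 2, b[1] - b[3] / 2, b[0] + b[2] / 2, b[1] + b[3] / 2
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], iou_threshold: float = 0.5) -> List[Box]:
    """Greedy NMS: keep the most confident box, drop boxes overlapping it too much."""
    kept: List[Box] = []
    for box in sorted(boxes, key=lambda b: b[5], reverse=True):
        if all(iou(box, kept_box) <= iou_threshold for kept_box in kept):
            kept.append(box)
    return kept


def read_string(boxes: List[Box], iou_threshold: float = 0.5) -> str:
    """Apply NMS, sort the remaining boxes left to right by b_x, join digit labels."""
    remaining = nms(boxes, iou_threshold)
    remaining.sort(key=lambda b: b[0])
    return "".join(str(b[4]) for b in remaining)


# Usage: the third box is a weaker duplicate of the "3" box and is suppressed.
boxes = [
    (50.0, 16.0, 20.0, 28.0, 3, 0.90),
    (20.0, 16.0, 20.0, 28.0, 1, 0.95),
    (52.0, 16.0, 20.0, 28.0, 8, 0.60),
    (80.0, 16.0, 20.0, 28.0, 7, 0.85),
]
print(read_string(boxes))  # 137
```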

To evaluate the effectiveness of the proposed method, we conducted comprehensive experiments on three public benchmarks (CVL, ORAND-CAR-A, ORAND-CAR-B) and on a synthetic dataset of long strings (LENGTH-HDS), generated by concatenating HDS images from these benchmarks.
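The exact procedure used to build LENGTH-HDS is not described here. One plausible version, sketched below with hypothetical names and an assumed white (255) background, pads each image vertically to a common height, joins the images left to right, and concatenates their labels. Images are plain nested lists of grayscale values to keep the example dependency-free.

```python
from typing import List, Sequence, Tuple

Image = List[List[int]]  # grayscale rows; 255 assumed to be the white background


def pad_to_height(img: Image, height: int, fill: int = 255) -> Image:
    """Center an image vertically in a canvas of the given height."""
    width = len(img[0])
    top = (height - len(img)) // 2
    bottom = height - len(img) - top
    return ([[fill] * width for _ in range(top)]
            + [list(row) for row in img]
            + [[fill] * width for _ in range(bottom)])


def concat_hds(samples: Sequence[Tuple[Image, str]], fill: int = 255) -> Tuple[Image, str]:
    """Join (image, label) samples left to right into one longer HDS sample."""
    height = max(len(img) for img, _ in samples)
    padded = [pad_to_height(img, height, fill) for img, _ in samples]
    rows = [[px for p in padded for px in p[r]] for r in range(height)]
    label = "".join(lbl for _, lbl in samples)
    return rows, label
```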

For accuracy evaluation, an HDS image is considered correctly recognized only if all of its digits are correctly recognized.
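Under this criterion, accuracy is measured per string rather than per digit. A minimal sketch, assuming predictions and ground truth are stored as digit strings, counts a recognition as correct only on an exact match, so a missing or extra digit is also an error; the function name is hypothetical.

```python
from typing import Sequence


def string_accuracy(predictions: Sequence[str], targets: Sequence[str]) -> float:
    """Fraction of HDS images whose predicted digit string exactly matches the label."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(targets)
```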

Compared with the ResNet-RNN-CTC method, which performs best among previous HDSR methods, the proposed method outperforms it by 7% on ORAND-CAR-A, 5% on ORAND-CAR-B, 5% on CVL, and 92.7% on LENGTH-HDS.

Our results were published in the "2023 4th International Conference on Electronic Communication and Artificial Intelligence" under the title "HDSRNet: A Simple Segmentation-free Method for Unconstrained Handwritten Digit String Recognition" (https://doi.org/10.1109/ICAI58670.2023.10176769).