A cool paper from Facebook AI (not from FAIR!) about detecting and reading text in images, at scale.
This is very useful for detecting inappropriate content on Facebook.
The system uses R-CNN/Detectron for detecting lines of text.
The OCR model is a ConvNet applied to a whole line of text at once and trained with CTC.
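To make the CTC part concrete, here is a minimal sketch of greedy (best-path) CTC decoding, the simplest way to turn per-frame ConvNet scores into a string. This is not the paper's code: the function name `ctc_greedy_decode`, the blank index 0, and the toy alphabet are assumptions made for illustration only.

```python
from itertools import groupby

BLANK = 0  # assumption: class index 0 is the CTC blank symbol


def ctc_greedy_decode(frame_scores, alphabet):
    """Best-path CTC decoding over a list of per-frame score vectors.

    frame_scores: one list of class scores per horizontal position of the line.
    alphabet: string whose character i corresponds to class index i + 1.
    """
    # 1. Pick the highest-scoring class in every frame.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]
    # 2. Merge runs of the same class, then 3. drop the blanks.
    collapsed = [label for label, _ in groupby(best_path)]
    return "".join(alphabet[label - 1] for label in collapsed if label != BLANK)


def one_hot(index, size=5):
    """Toy score vector that puts all mass on one class."""
    return [1.0 if i == index else 0.0 for i in range(size)]


# Toy line with alphabet "helo": h=1, e=2, l=3, o=4, blank=0.
# The blank between the two runs of "l" keeps the double letter.
frames = [one_hot(i) for i in [1, 1, 0, 2, 3, 3, 0, 3, 4, 4]]
print(ctc_greedy_decode(frames, "helo"))  # prints "hello"
```

Because the network only has to emit the right labels in the right order, with blanks wherever it likes, no character-level segmentation of the line is needed at training time.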
This concept of applying a ConvNet on a whole line of text, without prior segmentation, has roots in the early days of ConvNets, for example with this NIPS 1992 paper:
"Multi-Digit Recognition Using a Space Displacement Neural Network"
by Ofer Matan, Chris Burges, Yann LeCun and John Denker.
Link: https://papers.nips.cc/paper/557-multi-digit-recognition-using-a-space-displacement-neural-network
YouTube video with a short explanation: https://youtu.be/yl3P2tYewVg
#ocr #cv #dl #rnn #facebook #yannlecun #video
Scene Text Recognition via Transformer
The authors propose a simple but extremely effective scene text recognition method based on the transformer. The proposed method uses convolutional feature maps as word embedding input into the transformer. In such a way, their method is able to make full use of the powerful attention mechanism of the transformer.
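As a rough illustration of "feature maps as word embeddings", the sketch below flattens a C x H x W feature map into H*W tokens of size C and adds the standard sinusoidal positional encoding from the original Transformer. The exact layout used in the paper is not stated here, so the flattening order, the helper names, and the choice of positional encoding are all assumptions.

```python
import math


def feature_map_to_tokens(feature_map):
    """Flatten a C x H x W feature map (nested lists) into H*W tokens of size C.

    Assumption: tokens are read row by row, left to right.
    """
    channels = len(feature_map)
    height = len(feature_map[0])
    width = len(feature_map[0][0])
    return [
        [feature_map[c][y][x] for c in range(channels)]
        for y in range(height)
        for x in range(width)
    ]


def sinusoidal_position(pos, dim):
    """Sinusoidal encoding: sin on even indices, cos on odd indices."""
    return [
        math.sin(pos / 10000 ** (i / dim)) if i % 2 == 0
        else math.cos(pos / 10000 ** ((i - 1) / dim))
        for i in range(dim)
    ]


def add_positions(tokens):
    """Add a positional encoding to every token so order is visible to attention."""
    dim = len(tokens[0])
    return [
        [value + offset for value, offset in zip(token, sinusoidal_position(pos, dim))]
        for pos, token in enumerate(tokens)
    ]


# Toy feature map with 2 channels, 1 row, 3 columns.
toy_map = [[[1.0, 2.0, 3.0]], [[4.0, 5.0, 6.0]]]
tokens = feature_map_to_tokens(toy_map)
print(tokens)  # [[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]]
encoder_input = add_positions(tokens)
```

Each spatial location then plays the role a word embedding plays in NLP, and the transformer's attention can relate any two locations of the image directly.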
Extensive experimental results show that the proposed method outperforms SOTA methods on both regular and irregular text datasets. It performs best on two regular text benchmarks. On the irregular text benchmarks IC15, SVTP, and CUTE, it beats the second-best method by 14.5%, 11.8%, and 9.7%, respectively.
paper: https://arxiv.org/abs/2003.08077
github: https://github.com/fengxinjie/Transformer-OCR
#ocr #scene #text #recognition #cv #nlp #resNet #Transformer