
How to transcribe handwriting with custom dataset in Python

TechVeera

I know this isn't the correct forum, but I've looked everywhere. How can I get Python to recognize handwriting using a custom dataset? I want to use it for script Hebrew, and I can't find any datasets or instructions on how to make your own dataset and use it.

 


This is both a broad and deep concept, but I'll do my best to hit the highlights. First though, here is an example using whole Hebrew words.

 

Recognizing Handwriting
There are a lot of examples of this, but probably not in Hebrew. You'll want to look at libraries such as OpenCV and pytesseract to help you separate the letters of each word. Then you'll want to use a classification algorithm to determine which character each one is. Search for Python OCR (Optical Character Recognition) examples to see different methods.
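OpenCV can handle the letter-separation step with functions such as cv2.threshold and cv2.findContours. To keep things dependency-light, here is a minimal sketch of the same idea using a simple vertical projection profile in NumPy: binarize a grayscale word image, then cut it wherever a column contains no ink. It assumes dark ink on a light background and a threshold of 128 (both are assumptions you'd tune). Letters that touch each other, which is common in script Hebrew, won't be split, so treat this as a starting point only. A grayscale array like the one it expects is what cv2.imread(path, cv2.IMREAD_GRAYSCALE) returns.

```python
import numpy as np


def binarize(gray, threshold=128):
    """Dark ink on light paper -> 1 for ink, 0 for background (threshold is an assumption)."""
    return (np.asarray(gray) < threshold).astype(np.uint8)


def split_columns(ink):
    """Return (start, end) column ranges of ink separated by fully blank columns; end is exclusive."""
    has_ink = ink.any(axis=0)  # True for every column that contains at least one ink pixel
    spans, start = [], None
    for x, on in enumerate(has_ink):
        if on and start is None:
            start = x  # entering a run of inked columns
        elif not on and start is not None:
            spans.append((start, x))  # leaving a run: record it
            start = None
    if start is not None:  # ink touches the right edge of the image
        spans.append((start, len(has_ink)))
    return spans


def segment_word(gray, threshold=128):
    """Cut a word image into per-letter crops, tightly trimmed top and bottom."""
    gray = np.asarray(gray)
    ink = binarize(gray, threshold)
    crops = []
    for x0, x1 in split_columns(ink):
        rows = np.flatnonzero(ink[:, x0:x1].any(axis=1))  # rows with ink in this slice
        crops.append(gray[rows[0]:rows[-1] + 1, x0:x1])
    # The image is scanned left to right, but Hebrew is read right to left,
    # so reverse the crops to get them in reading order.
    return crops[::-1]
```

Each crop can then be resized to a fixed size and passed to whatever classifier you train.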

 

Custom Dataset

To make a dataset like this, you'll probably want A LOT of images to capture the variation in how each character is written; without that, it won't work with any meaningful accuracy. You'll want images of the individual characters with labels (typically the file name) to train the classification algorithm. For example, a picture of the letter 'A' might be saved as A-1.png. You load the images into an array and store the trimmed-down file name ('A-1' becomes 'A') as the label; with Hebrew characters as labels, UTF-8 encoding will likely be necessary. The classic example of this is the MNIST dataset, which is handwritten digits 0-9. Then you train a classification algorithm on your custom dataset and use it to predict letters for your use case. Look at some of the many examples that use the MNIST dataset, and hopefully they'll set you on the right path.
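Here's a minimal sketch of that pipeline, assuming the images are already loaded as same-sized grayscale arrays. The helper names (label_from_filename, build_dataset, knn_predict) are made up for illustration. The classifier is a plain k-nearest-neighbours vote written in NumPy, used here as a simple stand-in; many of the MNIST tutorials use a neural network instead, and you'd likely want one for real accuracy.

```python
from pathlib import Path

import numpy as np


def label_from_filename(path):
    """Trim a file name down to its label: 'A-1.png' -> 'A', 'א-17.png' -> 'א'."""
    return Path(path).stem.rsplit("-", 1)[0]


def build_dataset(samples):
    """Turn (file name, grayscale image) pairs into training arrays.

    All images are assumed to be 2-D arrays of the same size (e.g. 28x28 like MNIST).
    Returns X with one flattened, 0-1 scaled image per row, y with integer class ids,
    and the list of class labels so an id can be mapped back to its character.
    """
    names, images = zip(*samples)
    labels = [label_from_filename(name) for name in names]
    classes = sorted(set(labels))  # Python 3 strings are Unicode, so Hebrew letters work as-is
    class_id = {label: i for i, label in enumerate(classes)}
    X = np.stack([np.asarray(img, dtype=np.float32).ravel() / 255.0 for img in images])
    y = np.array([class_id[label] for label in labels])
    return X, y, classes


def knn_predict(X_train, y_train, X_query, k=3):
    """Predict a class id for each row of X_query by majority vote of its k nearest training images."""
    # Squared Euclidean distance from every query image to every training image
    dists = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    votes = y_train[nearest]
    # np.bincount counts the votes per class id; ties go to the lower id
    return np.array([np.bincount(row).argmax() for row in votes])
```

To fill samples from disk, something like [(p.name, cv2.imread(str(p), cv2.IMREAD_GRAYSCALE)) for p in Path("letters").glob("*.png")] should work, where "letters" is a hypothetical folder of labeled PNGs; resize them to a common size first (cv2.resize). If you save classes to a text file, open it with encoding="utf-8" so the Hebrew characters survive.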

 

I did see websites where you can upload images of Hebrew text and they will perform OCR on them, but that isn't as fun as a DIY solution.

Here's an old Stack Overflow post that seems to describe some challenges the poster ran into with what you're trying to do. Good luck!

Edited by MaterialWolf
Found GitHub example