Languages Guide for leading OCR products

When you scan a document that contains text or numeric data, you can read and understand what is written in the scanned image. To a computer, however, the resulting image file is just as meaningless an assortment of pixels as a landscape photo. To transform this information into an editable format that you can search, copy, and modify without retyping it manually, you need Optical Character Recognition (OCR) software.

There is a wide variety of OCR software available. While all of these programs can convert images of machine-printed (not handwritten) text or numbers into an editable format, they often differ in features, accuracy, price, and language options.

Our OCR Software Guide and Comparison Chart explain the differences between the available programs and offer our recommendation for the best overall software for converting English documents. However, the programs also differ in the number and selection of languages they can convert. Below, you will find the languages that our top three choices in desktop OCR software are able to convert, with the languages that have dictionary support marked in italics.
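One way to see how language selection differs between two products is to treat each list as a set and compare them. The sketch below is only an illustration: the function names `normalize` and `compare` are hypothetical, and the two short lists are excerpts from the lists in this guide, not complete product data. Because vendors spell the same language differently (for example "Chinese Simplified" and "Chinese (Simplified)"), names are normalized before comparison.

```python
import re


def normalize(name):
    """Lower-case a name and drop punctuation so spelling variants match.

    Only ASCII letters and digits are kept, which is enough for the
    lists in this guide but would mangle names with accented letters.
    """
    return " ".join(re.sub(r"[^a-z0-9]+", " ", name.lower()).split())


def compare(list_a, list_b):
    """Return (shared, only_a, only_b) as sorted lists of normalized names."""
    a = {normalize(n) for n in list_a}
    b = {normalize(n) for n in list_b}
    return sorted(a & b), sorted(a - b), sorted(b - a)


# Short excerpts from two of the lists in this guide (not complete lists).
list_a = ["Abkhaz", "Afrikaans", "Chinese Simplified", "Welsh"]
list_b = ["Afaan Oromo", "Afrikaans", "Chinese (Simplified)", "Welsh"]

shared, only_a, only_b = compare(list_a, list_b)
print("Both:", shared)
print("First only:", only_a)
print("Second only:", only_b)
```

Normalization only removes punctuation and case differences. Genuinely different names for the same language, such as "Byelorussian" and "Belarusian", would still need a hand-made alias table.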

Some language groups are more recent additions to the OCR scene. Among these are right-to-left scripts, such as Arabic and Hebrew, and Asian character sets, such as Chinese. While not all software supports them out of the box, they are gradually being integrated, first as add-ons to the base software and eventually as part of the default language selection.

SimpleSoftware OCR engines use two different systems for language support. The languages your OCR supports depend on the base version of SimpleIndex you have installed; add-ons (SimpleIndex Server, SimpleCoversheet, and so on) do not add any language support.

All SimpleSoftware products support the Tesseract 5 OCR languages. You can learn more about it and download additional language libraries here. You can check which Tesseract language libraries are installed on your station, and add more, here:

C:\Program Files (x86)\SimpleIndex\Tesseract\v5.3.0\
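To check the installed Tesseract libraries without browsing the folder by hand, a short script can list them. This is a minimal sketch, not part of SimpleIndex: the function name `installed_tesseract_languages` is hypothetical, and it assumes the libraries are stored as standard Tesseract `.traineddata` files somewhere under the folder named above (possibly in a subfolder).

```python
from pathlib import Path

# Folder taken from the text above; whether the language files sit directly
# in it or in a subfolder is an assumption, so subfolders are searched too.
TESSERACT_DIR = Path(r"C:\Program Files (x86)\SimpleIndex\Tesseract\v5.3.0")


def installed_tesseract_languages(root):
    """Return sorted library names for every *.traineddata file under root."""
    root = Path(root)
    if not root.is_dir():
        return []
    # The file name without its extension is the library name, e.g. "eng".
    return sorted({p.stem for p in root.rglob("*.traineddata")})


if __name__ == "__main__":
    for code in installed_tesseract_languages(TESSERACT_DIR):
        print(code)
```

The names printed are Tesseract's short codes (for example `eng` or `chi_sim`) rather than full language names. Entries such as `osd` (orientation and script detection) are helper libraries, not languages.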

SimpleIndex Pro and SimpleIndex OCR use the FineReader engine, which has one of the largest libraries of supported OCR languages. You can check which languages FineReader supports on your station here:

C:\Program Files (x86)\SimpleIndex\OCRLanguages.txt
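The same check can be scripted for the FineReader list. The sketch below assumes that `OCRLanguages.txt` is a plain text file with one language name per line; the actual layout of the file is not described here and may differ. The helper names `read_language_list` and `is_supported` are hypothetical.

```python
from pathlib import Path

# File path taken from the text above.
LANGUAGE_FILE = Path(r"C:\Program Files (x86)\SimpleIndex\OCRLanguages.txt")


def read_language_list(path):
    """Return the non-empty lines of a text file, stripped of whitespace.

    Assumes one language name per line; the real file layout may differ.
    """
    path = Path(path)
    if not path.is_file():
        return []
    # utf-8-sig also accepts files that start with a byte-order mark;
    # errors="replace" keeps the script running on unexpected encodings.
    text = path.read_text(encoding="utf-8-sig", errors="replace")
    return [line.strip() for line in text.splitlines() if line.strip()]


def is_supported(language, languages):
    """Case-insensitive check for a language name in the list."""
    wanted = language.strip().lower()
    return any(wanted == name.lower() for name in languages)


if __name__ == "__main__":
    names = read_language_list(LANGUAGE_FILE)
    print(len(names), "languages listed")
    print("Welsh supported:", is_supported("Welsh", names))
```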

Abkhaz
Adyghe
Afrikaans
Agul
Albanian
Altaic
Arabic (Saudi Arabia)
Armenian (Eastern)
Armenian (Grabar)
Armenian (Western)
Avar
Aymara
Azeri (Cyrillic)
Azeri (Latin)
Bashkir
Basic
Basque
Belarusian
Bemba
Blackfoot
Breton
Bugotu
Bulgarian
Buryat
C/C++
Catalan
Cebuano
Chamorro
Chechen
Chinese Simplified
Chinese Traditional
Chukchee
Chuvash
COBOL
Corsican
Crimean Tatar
Croatian
Crow
Czech
Dakota
Danish
Dargwa
Dungan
Dutch (Belgian)
Dutch
English
Eskimo (Cyrillic)
Eskimo (Latin)
Esperanto
Estonian
Even
Evenki
Faroese
Fijian
Finnish
Fortran
French
Frisian
Friulian
Gagauz
Galician
Ganda
German (Luxembourg)
German (new spelling)
German
Greek
Guarani
Hani
Hausa
Hawaiian
Hebrew
Hungarian
Icelandic
Ido
Indonesian
Ingush
Interlingua
Irish
Italian
Japanese
Java
Jingpo
Kabardian
Kalmyk
Karachay-Balkar
Karakalpak
Kasub
Kawa
Kazakh
Khakass
Khanty
Kikuyu
Kirghiz
Kongo
Korean (Hangul)
Korean
Koryak
Kpelle
Kumyk
Kurdish
Lak
Latin
Latvian
Lezgi
Lithuanian
Luba
Macedonian
Malagasy
Malay
Malinke
Maltese
Mansi
Maori
Mari
Maya
Miao
Minangkabau
Mohawk
Moldavian
Mongol
Mordvin
Nahuatl
Nenets
Nivkh
Nogay
Norwegian (Bokmal)
Norwegian (Nynorsk)
Nyanja
Occidental
Occitan
Ojibway
Ossetian
Papiamento
Pascal
Polish
Portuguese (Brazil)
Portuguese
Quechua
Rhaeto-Romance
Romanian
Romany
Rundi
Russian (old spelling)
Russian
Russian with accents
Rwanda
Sami (Lappish)
Samoan
Scottish Gaelic
Selkup
Serbian (Cyrillic, Latin)
Shona
Simple chemical formulas
Slovak
Slovenian
Somali
Sorbian
Sotho
Spanish
Sunda
Swahili
Swazi
Swedish
Tabasaran
Tagalog
Tahitian
Tajik
Tatar
Thai
Tok Pisin
Tongan
Tswana
Tun
Turkish
Turkmen (Cyrillic)
Turkmen (Latin)
Tuvinian
Udmurt
Uighur (Cyrillic, Latin)
Ukrainian
Uzbek (Cyrillic, Latin)
Vietnamese
Welsh
Wolof
Xhosa
Yakut
Yiddish
Zapotec
Zulu

Italics signify dictionary support.

Afaan Oromo
Afrikaans
Albanian
Arabic (PC only)
Asturian
Aymara
Azeri (Latin)
Balinese
Basque
Bemba
Bikol
Bislama
Bosnian (Cyrillic)
Bosnian (Latin)
Brazilian
Breton
Bulgarian
Bulgarian-English
Byelorussian
Byelorussian-English
Catalan
Cebuano
Chamorro
Chinese (Simplified)
Chinese (Traditional)
Corsican
Croatian
Czech
Danish
Dutch
English (UK)
English (USA)
Esperanto
Estonian
Faroese
Farsi (PC only)
Fijian
Finnish
French
Frisian
Friulian
Galician
Ganda
German
German (Switzerland)
Greek
Greek-English
Greenlandic
Haitian Creole
Hani
Hebrew
Hiligaynon
Hungarian
Icelandic
Ido
Ilocano
Indonesian
Interlingua
Irish (Gaelic)
Italian
Japanese
Javanese
Kapampangan
Kazakh (PC only)
Kicongo
Kinyarwanda
Korean
Kurdish
Latin
Latvian
Lithuanian
Luba
Luxemburg
Macedonian
Macedonian-English
Madurese
Malagasy
Malay
Manx (Gaelic)
Maori
Mayan
Mexican
Minangkabau
Moldovan
Mongolian (Cyrillic) (PC only)
Nahuatl
Norwegian
Numeric
Nyanja
Nynorsk
Occitan
Papiamento
Pidgin English (Nigeria)
Polish
Portuguese
Quechua
Rhaeto-Roman
Romanian
Rundi
Russian
Russian-English
Samoan
Sardinian
Scottish (Gaelic)
Serbian
Serbian (Latin)
Serbian-English
Shona
Slovak
Slovenian
Somali
Sotho
Spanish
Sundanese
Swahili
Swedish
Tagalog
Tahitian
Tatar (Latin)
Tetum
Tok Pisin
Tonga
Tswana
Turkish
Turkmen (Latin)
Ukrainian
Ukrainian-English
Uzbek
Waray
Welsh
Wolof
Xhosa
Zapotec
Zulu

Afrikaans
Albanian
Aymara
Basque
Bemba
Blackfoot
Breton
Bugotu
Bulgarian
Byelorussian
Catalan
Chamorro
Chechen
Chinese (Simplified)
Chinese (Traditional)
Corsican
Croatian
Crow
Czech
Danish
Dutch
English
Esperanto
Estonian
Faroese
Fijian
Finnish
French
Frisian
Friulian
Gaelic (Irish)
Gaelic (Scottish)
Galician
Ganda/Luganda
German
Greek
Guarani
Hani
Hawaiian
Hungarian
Icelandic
Ido
Indonesian
Interlingua
Inuit
Italian
Japanese
Kabardian
Kasub
Kikuyu
Kongo
Korean
Kpelle
Kurdish
Latin
Latvian
Lithuanian
Luba
Luxembourgian
Macedonian
Malagasy
Malay
Malinke
Maltese
Maori
Mayan
Miao
Minangkabau
Mohawk
Moldavian
Nahuatl
Norwegian
Nyanja
Occidental
Ojibway
Papiamento
Pidgin English
Polish
Portuguese
Portuguese (Brazilian)
Provencal
Quechua
Rhaetic
Romanian
Romany
Ruanda
Rundi
Russian
Sami
Sami Lule
Sami Northern
Sami Southern
Samoan
Sardinian
Serbian (Cyrillic)
Serbian (Latin)
Shona
Sioux
Slovak
Slovenian
Somali
Sorbian
Sotho
Spanish
Sundanese
Swahili
Swazi
Swedish
Tagalog
Tahitian
Tongan
Tswana
Tun
Turkish
Ukrainian
Visayan
Wa
Welsh
Wolof
Xhosa
Zapotec
Zulu

Italics signify dictionary support.

Afrikaans
Amharic
Arabic
Assamese
Azerbaijani
Azerbaijani – Cyrillic
Belarusian
Bengali
Tibetan
Bosnian
Bulgarian
Catalan; Valencian
Cebuano
Czech
Chinese – Simplified
Chinese – Traditional
Cherokee
Welsh
Danish
German
Dzongkha
Greek, Modern (1453-)
English
English, Middle (1100-1500)
Esperanto
Estonian
Basque
Persian
Finnish
French
German Fraktur
French, Middle (ca. 1400-1600)
Irish
Galician
Greek, Ancient (-1453)
Gujarati
Haitian; Haitian Creole
Hebrew
Hindi
Croatian
Hungarian
Inuktitut
Indonesian
Icelandic
Italian
Italian – Old
Javanese
Japanese
Kannada
Georgian
Georgian – Old
Kazakh
Central Khmer
Kirghiz; Kyrgyz
Korean
Kurdish
Lao
Latin
Latvian
Lithuanian
Malayalam
Marathi
Macedonian
Maltese
Malay
Burmese
Nepali
Dutch; Flemish
Norwegian
Oriya
Panjabi; Punjabi
Polish
Portuguese
Pushto; Pashto
Romanian; Moldavian; Moldovan
Russian
Sanskrit
Sinhala; Sinhalese
Slovak
Slovenian
Spanish; Castilian
Spanish; Castilian – Old
Albanian
Serbian
Serbian – Latin
Swahili
Swedish
Syriac
Tamil
Telugu
Tajik
Tagalog
Thai
Tigrinya
Turkish
Uighur; Uyghur
Ukrainian
Urdu
Uzbek
Uzbek – Cyrillic
Vietnamese
Yiddish
