For text-based PDFs, we pull the text directly from the file, and many languages are supported. You can select your preferred language when creating your layout parser, or later under the "Settings" for that layout parser, as seen below. Please note that you can set a different language for each layout parser if desired.
For OCR, we currently support the following languages:
Afrikaans
Albanian
Basque
Brazilian
Bulgarian
Byelorussian
Catalan
Chinese Simplified
Chinese Traditional
Croatian
Czech
Danish
Dutch
Esperanto
Estonian
Finnish
French
Galician
German
Greek
Hungarian
Icelandic
Indonesian
Italian
Japanese
Korean
Latin
Latvian
Lithuanian
Macedonian
Malay
Moldavian
Norwegian
Polish
Portuguese
Russian
Serbian
Slovak
Slovenian
Spanish
Swedish
Tagalog
Turkish
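If you keep track of document languages in your own scripts, it can help to check a language name against the OCR list above before picking the setting for a layout parser. The following is only a minimal sketch: the names `SUPPORTED_OCR_LANGUAGES` and `is_supported_ocr_language` are hypothetical helpers made up for illustration and are not part of any product API. The language names are copied from the list above.

```python
# Hypothetical helper, not part of any product API.
# The names below are copied verbatim from the OCR language list above.
SUPPORTED_OCR_LANGUAGES = (
    "Afrikaans", "Albanian", "Basque", "Brazilian", "Bulgarian",
    "Byelorussian", "Catalan", "Chinese Simplified", "Chinese Traditional",
    "Croatian", "Czech", "Danish", "Dutch", "Esperanto", "Estonian",
    "Finnish", "French", "Galician", "German", "Greek", "Hungarian",
    "Icelandic", "Indonesian", "Italian", "Japanese", "Korean", "Latin",
    "Latvian", "Lithuanian", "Macedonian", "Malay", "Moldavian",
    "Norwegian", "Polish", "Portuguese", "Russian", "Serbian", "Slovak",
    "Slovenian", "Spanish", "Swedish", "Tagalog", "Turkish",
)


def _normalize(name: str) -> str:
    """Collapse runs of whitespace and ignore letter case."""
    return " ".join(name.split()).casefold()


# Pre-computed lookup set so each check is a single membership test.
_NORMALIZED_LANGUAGES = frozenset(_normalize(n) for n in SUPPORTED_OCR_LANGUAGES)


def is_supported_ocr_language(name: str) -> bool:
    """Return True if `name` matches an entry in the OCR list above."""
    return _normalize(name) in _NORMALIZED_LANGUAGES


if __name__ == "__main__":
    for candidate in ("German", "chinese  simplified", "Klingon"):
        print(candidate, "->", is_supported_ocr_language(candidate))
```

The check is an exact match on the listed names (ignoring case and extra spaces), so a name that does not appear in the list in that form, such as "Belarusian", would not match "Byelorussian".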