Me and Regex
What is regex?
Regex, short for regular expressions, is not a programming language but a notation for describing text patterns. For translators, one of its main uses is quality checking translated texts and other documents. According to Riccardo Schiaffino, RegEx "is a search-and-replace function on steroids. Regular expressions can assist our translation work by allowing us to search, replace, and filter text in ways that would otherwise be impossible in our software tools." (https://www.ata-chronicle.online/highlights/regular-expressions-an-introduction-for-translators/)
If you are a linguist or have some affinity for languages, you will pick up regex quickly, after some trial and error. :)
For us translators, regex is important because CAT tools use regular expressions to define segmentation and auto-translation rules.
Below are my first attempts at creating some basic rules for Hungarian translations.
RegEx for English to Hungarian Translations
Example 1: Hungarian (or other) names with more than one space between them
Regular Expression: [a-záéíóöőúüű.](\s\s+)[A-ZÁÉÍÓÖŐÚÜŰ]
Explanation: This regex looks for two or more whitespace characters (\s\s+) that follow a lowercase letter or a period and precede a capital letter, covering both Hungarian accented letters and common Latin letters. It is designed particularly for checking Hungarian and English proper names that contain two or more components. Note that extra spaces between ordinary lowercase words are not picked up, because the character after the spaces must be a capital letter.
As you can see, it picked up all the extra spaces between the names, regardless of whether they contained two or more elements or a period between them. (I just realized that this regex can also be used to check for an extra space between sentences, where one sentence ends with a period and the next one starts with a capital letter, including Hungarian letters. This is super helpful and definitely broadens its usage!)
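To try the Example 1 expression outside a CAT tool, here is a minimal Python sketch. Python and the helper names find_extra_spaces and fix_extra_spaces are my own illustrative choices, not part of any CAT tool. Unlike the expression above, it also captures the letters on both sides of the spaces so a fix can keep them, and its character classes include í/Í and ű/Ű, which Hungarian needs.

```python
import re

# Lowercase letters (plus a period) allowed before the spaces, and capitals
# allowed after them. Both cover every Hungarian accented vowel:
# á é í ó ö ő ú ü ű (and their capitals). A period inside [] is literal.
LOWER = "a-záéíóöőúüű."
UPPER = "A-ZÁÉÍÓÖŐÚÜŰ"

# Group 1: letter or period, group 2: two or more whitespace characters,
# group 3: capital letter. Capturing 1 and 3 lets the fix keep them.
EXTRA_SPACE = re.compile(rf"([{LOWER}])(\s\s+)([{UPPER}])")


def find_extra_spaces(text: str) -> list[str]:
    """Return each matched snippet (letter + extra spaces + capital) for review."""
    return [m.group(0) for m in EXTRA_SPACE.finditer(text)]


def fix_extra_spaces(text: str) -> str:
    """Collapse each matched run of whitespace to a single space."""
    return EXTRA_SPACE.sub(r"\1 \3", text)


sample = "Kovács  János és Nagy   Péter. Ez  egy mondat.  Íme a következő."
print(find_extra_spaces(sample))  # ['s  J', 'y   P', '.  Í']
print(fix_extra_spaces(sample))   # Kovács János és Nagy Péter. Ez  egy mondat. Íme a következő.
```

The double space in "Ez  egy" is left alone because "e" is lowercase, just as described above. Note that \s also matches tabs and line breaks, so fix_extra_spaces can join lines; reviewing the find_extra_spaces output first is the safer route.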
Example 2: English and other quotation marks replaced with Hungarian (lower and upper) quotation marks
Regular Expression: ("|'|<|>|‘|“)(.*?)("|'|<|>|’|”)
Explanation: English upper quotation marks are often left in translated texts, simply because there is no direct way to type the Hungarian ones, but they are considered incorrect in Hungarian. This expression looks for text that opens with a non-Hungarian mark (", ', ‘, “, < or >) and closes with ", ', ’, ”, < or >. The replacement changes the opening mark to the Hungarian lower quotation mark („) and the closing mark to the upper quotation mark (”).
Note: French quotation marks (« ») were not included because Hungarian uses them, too.
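The Example 2 search and replace can also be sketched in Python. The function name to_hungarian_quotes is hypothetical, and the replacement string „\2” is my assumption of what the CAT tool replacement does; regex engines differ in back-reference syntax (for example, .NET uses $2 while Python uses \2). The middle group here is lazy (.*?), so two quoted passages on one line are replaced separately; a greedy .* would merge them into one match running from the first opening mark to the last closing mark.

```python
import re

# Opening marks: " ' < > ‘ “    Closing marks: " ' < > ’ ”
# The middle group is lazy (.*?) so each quoted passage is matched on its own.
NON_HU_QUOTES = re.compile(r"(\"|'|<|>|‘|“)(.*?)(\"|'|<|>|’|”)")


def to_hungarian_quotes(text: str) -> str:
    """Wrap each matched passage in the Hungarian lower („) and upper (”) marks."""
    # \2 is the quoted text; the original outer marks are dropped.
    return NON_HU_QUOTES.sub(r"„\2”", text)


print(to_hungarian_quotes('He said "yes" and ‘no’.'))  # He said „yes” and „no”.
print(to_hungarian_quotes("A “quoted” word"))          # A „quoted” word
```

Because ' is in both sets, apostrophes (as in don't) can be mistaken for quotation marks, so it is safer to review each match rather than replacing all of them at once.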