Identifying multibyte UTF-8 characters in PostgreSQL

This afternoon I had to find a quick way to identify which rows in a PostgreSQL table had multibyte UTF-8 characters in it. Luckily PostgreSQL supports a number of string functions one of which is char_length, which returns the number of characters in a string. Another one is octet_length which returns the number of bytes in a string. For standard ASCII strings these will be the same but for any strings containing multibyte UTF-8 characters, these will differ. Using these functions I ended up with some SQL based on the following query

SELECT id, text_value FROM metadatavalue WHERE char_length(text_value)!=octet_length(text_value)

Middleware