{"id":555,"date":"2014-11-14T16:36:26","date_gmt":"2014-11-14T16:36:26","guid":{"rendered":"https:\/\/copyright.lboro.ac.uk\/middleware\/?p=555"},"modified":"2014-11-14T16:40:50","modified_gmt":"2014-11-14T16:40:50","slug":"identifying-multibyte-utf-8-characters-in-postgresql","status":"publish","type":"post","link":"https:\/\/blog.lboro.ac.uk\/middleware\/blog\/databases\/identifying-multibyte-utf-8-characters-in-postgresql","title":{"rendered":"Identifying multibyte UTF-8 characters in PostgreSQL"},"content":{"rendered":"<p>This afternoon I had to find a quick way to identify which rows in a PostgreSQL table contained multibyte UTF-8 characters. Luckily PostgreSQL supports a number of <a href=\"http:\/\/www.postgresql.org\/docs\/9.3\/static\/functions-string.html\">string functions<\/a>, one of which is char_length, which returns the number of characters in a string. Another is octet_length, which returns the number of bytes in a string. For pure ASCII strings these will be the same, but for any string containing multibyte UTF-8 characters they will differ. Using these functions I ended up with some SQL based on the following query:<\/p>\n<p><code>SELECT id, text_value FROM metadatavalue WHERE char_length(text_value) != octet_length(text_value);<\/code><\/p>\n","protected":false},"excerpt":{"rendered":"<p>This afternoon I had to find a quick way to identify which rows in a PostgreSQL table contained multibyte UTF-8 characters. 
Luckily PostgreSQL supports a number of string functions, one of which is char_length, which returns the number of &hellip; <a href=\"https:\/\/blog.lboro.ac.uk\/middleware\/blog\/databases\/identifying-multibyte-utf-8-characters-in-postgresql\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":"","_links_to":"","_links_to_target":""},"categories":[38],"tags":[39,40],"class_list":["post-555","post","type-post","status-publish","format-standard","hentry","category-databases","tag-postgresql","tag-utf-8-characters"],"_links":{"self":[{"href":"https:\/\/blog.lboro.ac.uk\/middleware\/wp-json\/wp\/v2\/posts\/555","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.lboro.ac.uk\/middleware\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.lboro.ac.uk\/middleware\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.lboro.ac.uk\/middleware\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.lboro.ac.uk\/middleware\/wp-json\/wp\/v2\/comments?post=555"}],"version-history":[{"count":2,"href":"https:\/\/blog.lboro.ac.uk\/middleware\/wp-json\/wp\/v2\/posts\/555\/revisions"}],"predecessor-version":[{"id":557,"href":"https:\/\/blog.lboro.ac.uk\/middleware\/wp-json\/wp\/v2\/posts\/555\/revisions\/557"}],"wp:attachment":[{"href":"https:\/\/blog.lboro.ac.uk\/middleware\/wp-json\/wp\/v2\/media?parent=555"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.lboro.ac.uk\/middleware\/wp-json\/wp\/v2\/categories?post=555"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.lboro.ac.uk\/middleware\/wp-json\/wp\/v2\/tags?post=555"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}