Python Regex strange behavior with accented characters
I was experimenting with some Python (2.7.3) regex and came across behavior
I did not expect. The block of code below returns False when checked against
the "ß" character, but it also returns False for "ø" and for other accented
characters such as "å", "Å", "ç", "Ç", "Â", "Í", etc.
Case in point, I'm not sure why accented characters behave differently from
other characters like "¥", which it has no problem with. They all have
different Unicode/UTF-8 values (UTF-8 is what my encoding is set to), so I'm
not sure where the difference lies.
# -*- coding: utf-8 -*-
import re

def regex_check(name):
    # Intended: match any single character other than ß
    pattern = '[^ß]'
    if re.match(pattern, str(name), re.IGNORECASE):
        return True
    else:
        return False

print regex_check("ø")
Am I missing something obvious? Thanks for the help.
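One way to narrow this down is to look at the raw UTF-8 bytes behind each character and to repeat the check once on byte strings and once on Unicode strings. The sketch below assumes the literals in the code above end up as UTF-8 encoded byte strings, which is what a Python 2 str holds when the source file is saved as UTF-8. The helper names utf8_hex, byte_level_check and unicode_check are made up for illustration, and the characters are written as Unicode escapes so the snippet does not depend on the file encoding.

```python
# -*- coding: utf-8 -*-
import re

# Characters from the question, as Unicode escapes.
chars = {
    "sharp s": u"\u00df",  # ß
    "o slash": u"\u00f8",  # ø
    "a ring": u"\u00e5",   # å
    "yen": u"\u00a5",      # ¥
}

def utf8_hex(ch):
    """Return the UTF-8 encoding of ch as a list of two-digit hex strings."""
    return ["%02x" % b for b in bytearray(ch.encode("utf-8"))]

def byte_level_check(ch):
    """Repeat the original check on UTF-8 byte strings.

    Assumption: this mirrors what happens with a plain Python 2 str,
    where the pattern is really the byte class [^\\xc3\\x9f].
    """
    pattern = u"[^\u00df]".encode("utf-8")
    return re.match(pattern, ch.encode("utf-8"), re.IGNORECASE) is not None

def unicode_check(ch):
    """Repeat the check on Unicode strings, so ß is one character."""
    pattern = u"[^\u00df]"
    return re.match(pattern, ch, re.IGNORECASE | re.UNICODE) is not None

for name, ch in sorted(chars.items()):
    print("%-8s %-6s bytes=%-5s unicode=%s" % (
        name, " ".join(utf8_hex(ch)), byte_level_check(ch), unicode_check(ch)))
```

In this sketch "ß", "ø" and "å" all encode to two bytes starting with c3, while "¥" starts with c2. On byte strings the class built from "ß" excludes the bytes c3 and 9f individually, so any input whose first byte is c3 fails to match; on Unicode strings only "ß" itself is excluded. That would be consistent with the results described above, though I have only checked it against these few characters.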