Why does `isAlpha` accept some non-alphabetic Unicode characters, such as Chinese ones?

WendaoLee · December 3, 2023, 6:34am

The documentation for Data.Char says:

isAlpha :: Char → Bool
Selects alphabetic Unicode characters (lower-case, upper-case and title-case letters, plus letters of caseless scripts and modifiers letters).

But when I test it, it returns True for non-alphabetic characters such as Chinese ones.

isAlpha '你'
True

As I understand it, “alphabetic characters” refers to languages that use an alphabetic writing system, whereas Chinese uses a logographic writing system. Does “alphabetic” here just mean any character that belongs to a language registered in the Unicode Standard?

(I’m sorry, this seems like more of a linguistics question.)

ocramz · December 3, 2023, 6:45am

On the contrary, that’s a great question. The documentation should be more specific about this.

Probie · December 3, 2023, 7:25am

I think the documentation here needs some work: this function checks if something is a “letter”, and the use of the word “alphabetic” is somewhere between slightly unhelpful and outright incorrect. In colloquial English, it’s not uncommon to refer to a writing system as an alphabet, even if it’s not actually an alphabet, which is probably why this isn’t considered incorrect.

The important part of the documentation is the bit in parentheses, although it does require some familiarity with Unicode to recognise what it is saying.

These are the Unicode general categories Ll, Lu, Lt, Lo and Lm (in that order). “你” belongs to the Lo category, which is why isAlpha '你' is True.
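To check this yourself, here is a minimal sketch that uses `generalCategory` from Data.Char to look up a character's category. The names `inLetterCategory`, `letterCategories` and `samples` are made up for illustration and are not part of base. I'm also assuming `isAlpha` agrees with the five letter categories for every character, which is what the documentation describes.

```haskell
import Data.Char (GeneralCategory (..), generalCategory, isAlpha)

-- | The five letter categories, in the order the isAlpha documentation
-- lists them.
letterCategories :: [GeneralCategory]
letterCategories =
  [ LowercaseLetter -- Ll
  , UppercaseLetter -- Lu
  , TitlecaseLetter -- Lt
  , OtherLetter     -- Lo: letters of caseless scripts, e.g. '你'
  , ModifierLetter  -- Lm
  ]

-- | Hypothetical helper (not in base): True when the character's general
-- category is one of the five letter categories above.
inLetterCategory :: Char -> Bool
inLetterCategory c = generalCategory c `elem` letterCategories

-- | Sample characters paired with their category and their isAlpha result.
-- '\x01C5' is a title-case letter and '\x02B0' is a modifier letter.
samples :: [(Char, GeneralCategory, Bool)]
samples =
  [ (c, generalCategory c, isAlpha c)
  | c <- ['a', 'A', '\x01C5', '你', '\x02B0', '1']
  ]
```

Loading this in GHCi, `generalCategory '你'` evaluates to `OtherLetter`, while `generalCategory '1'` evaluates to `DecimalNumber`, so `isAlpha '1'` is False.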

WendaoLee · December 3, 2023, 7:29am

Thanks for your reply!

It helps a lot.

WendaoLee · December 3, 2023, 8:45am

Well, I want to submit an issue to the GHC team, but my account is still pending approval. Could someone else submit it for me?

Also, I have found a typo in Text.ParserCombinators.hs, line 272:
Succeeds iff we are at the end of input

Thanks.

tomjaguarpaw · December 3, 2023, 8:49am

I have found a typo

Do you mean “iff”? That is a word that is short for “if and only if”.

my account is still pending approval

@chreekat @bgamari

WendaoLee · December 3, 2023, 9:04am

Ah, I’m sorry. I didn’t know that.

tomjaguarpaw · December 3, 2023, 9:05am

That’s OK! No problem at all.

chreekat · December 4, 2023, 8:58am

Thanks for the ping! @WendaoLee’s account has been approved now.
