View change log entry

Change log entry 75134
Processed by: richwarm (2022-07-04 01:03:18 UTC)
Comment: << review queue entry 71043 - submitted by 'cws' >>
Proposing a syntax that signifies a word can also be read in its non-combined form

Generally, a phrase such as 都会 often occurs in sentences as 'all will', not as 'metropolis'

中的 is mostly referring to China's

把头 mostly refers to someone's head, rather than 'labor contractor' or 'gangmaster'

https://context.reverso.net/翻译/中文-英语/把头

Most of the examples of 把头 on Reverso's list are about someone's head rather than 'labor contractor' or 'gangmaster'. Out of its hundred examples, which list 把头 as someone's head, I found only two on Reverso that refer to 把头 as gangmaster, e.g.,

他们不会把头和小兵都抓起来
'They don't arrest the kingpin and his underlings.'
我们都把头剃了 看着会像光头党的
'When you get out of here, we'll all shave our heads and look like a gang of skinheads.'

I will add 'kingpin' to the 把头 entry in CC-CEDICT later

Dictionary software should optimize for common use

With a mechanism in the CC-CEDICT database that signifies a word can be read in its non-combined form, dictionary software can give the language learner the list of possible interpretations of the non-combined form, or simply include the meaning of each hanzi of the combined word

I would rather see 都会 shown as this list (with metropolis/city listed last)..

都 dōu
all; both; entirely; (used for emphasis) even; already; (not) at all
都 dū
capital city; metropolis
会 huì
can (i.e. have the skill, know how to); likely to; sure to; to meet; to get together; meeting; gathering; union;
group; association; a moment (Taiwan pr. for this sense is [hui3])
会 kuài
to balance an account; accountancy; accounting
都会 都會 dū huì
city; metropolis

..than this (most browser extensions, e.g., Zhongwen):
都会 都會 dū huì
city; metropolis
都 Dū
surname Du
都 dōu
all; both; entirely; (used for emphasis) even; already; (not) at all
都 dū
capital city; metropolis


The advantage of just appending a slash is that it will not badly affect existing dictionary software (e.g., the Zhongwen extension). With no changes to Zhongwen's code, it will render the additional slash on 都会 as another semicolon:

都会 都會 dū huì
city; metropolis;

For dictionary software that makes use of the additional information that a combined word can be read in its non-combined form, it can surely help learners of Chinese optimize how they read and learn Chinese
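
A minimal sketch of how dictionary software might read the proposed marker, assuming the syntax shown in the Diff at the end of this entry (an empty definition, i.e. a doubled slash, closing the entry). The names `parse_entry` and `display_order` are hypothetical and are not taken from Zhongwen or any other existing tool, and the syntax itself is only a proposal:

```python
import re

# CC-CEDICT line layout: TRADITIONAL SIMPLIFIED [pin1 yin1] /def 1/def 2/
ENTRY_RE = re.compile(r"^(\S+) (\S+) \[([^\]]*)\] /(.*)/$")


def parse_entry(line):
    """Parse one CC-CEDICT line into a dict (hypothetical helper)."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        raise ValueError("not a CC-CEDICT entry: %r" % line)
    traditional, simplified, pinyin, body = match.groups()
    parts = body.split("/")
    # Assumption: under the proposed syntax, an empty trailing definition
    # means "this word is often read as its separate characters".
    splittable = len(parts) > 1 and parts[-1] == ""
    return {
        "traditional": traditional,
        "simplified": simplified,
        "pinyin": pinyin,
        "definitions": [p for p in parts if p],
        "splittable": splittable,
    }


def display_order(entry):
    """Return the headwords to look up, in the order they would be shown."""
    word = entry["simplified"]
    characters = list(word)
    if entry["splittable"]:
        # Single characters first, the combined word last.
        return characters + [word]
    # Usual behaviour: the combined word first.
    return [word] + characters
```

With the marked entry `都會 都会 [du1 hui4] /city/metropolis//`, `display_order` would list 都 and 会 before 都会; with the unmarked entry the combined word stays first.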

When the syntax is accepted, I'll help add that information to CC-CEDICT's vocabulary; it's just adding a slash anyway

Here are other words I collected that can be interpreted in their non-combined form:
米高
到了
家的
都会
都會
中的
美的
得了
大树
大樹
面的
把門
把门
的话
的話
有了
那是
把头
把頭
才能

As an aside, I asked my wife (who is Chinese) if she knows the word 家的 (defined as 'old wife' in CC-CEDICT); she said she doesn't know 家的 as 'old wife'. Even my OS's pinyin input method does not list 家的 for jiade; it shows five other words that match jiade (e.g., 假的, 架得). If the syntax is accepted, I will surely add a slash to 家的; it mostly refers to *family's* anyway. In dictionary software that recognizes the additional syntax, 家的 will be rendered as:

家 jiā
home; family; (polite) my (sister, uncle etc); classifier for families or businesses; refers to the philosophical
schools of pre-Han China; noun suffix for a specialist in some activity, such as a musician or revolutionary,
corresponding to English -ist, -er, -ary or -ian;
的 de
of; ~'s (possessive particle); (used after an attribute); (used to form a nominal expression); (used at the end of
a declarative sentence for emphasis); also pr. [di4] or [di5] in poetry and songs
的 dī
see 的士 [di1 shi4]
的 dí
really and truly
的 dì
aim; clear
家的 jiā de
(old) wife

Listing *(old wife)* last
-----------------------------------------------

Editor:
1) Please note that if you want to contact us, you don't need to put in a submission. You can send an email by clicking on the envelope icon next to an editor's username.

2) > "中的 is mostly referring to China's"
No, that's incorrect.

3) I checked sentences in a corpus of magazine articles. Admittedly, I only looked at the first 25 instances of 都會, but I found that it meant "city" in the following 9 cases.

- 立足台北都會區、
- 甚至為了衝浪,遠離都會、搬到近海的頭城、
- 高雄都會公園
- 毋寧更像是一瓶帶著都會雅痞氣息的「海尼根」啤酒。
- 例如,「蜘蛛人」用手腕吐絲,在都會高樓裡側身擺盪移動,到處主持正義;
- 早年原住民行走於山野中的路徑,不是現代都會人所能想像的,
- 都遠比都會郊山的植栽展現更巨大的身形,
- 美國大都會博物館、
- (如肯邦國際、證券櫃台買賣中心、大都會人壽等)


4) When you look up "shiji" in Pleco, there are dozens of results. Naturally, users want the most common words to be displayed before the rare ones (just as you want to display the entries for 把 and 头 before the entry for 把头).

So, for example, the first result shown in Pleco is 实际 (reality), while 石鸡 (partridge) appears well down the list.

In order to display results in this way, does Pleco rely on data in the user's installed dictionaries? I presume the answer is no. I would assume that Pleco uses an auxiliary file that indicates the frequency of words, and that this file ranks 实际 higher than 石鸡.

Similarly, for your app, you can create an auxiliary file that indicates the frequency with which a 2-gram should be interpreted as a word.

For example, you could have a file containing a line such as
都會 0.36
(indicating that 都會 means "city" 36% of the time, or whatever figure you determine by examining a corpus)

If you did that, then
- You would not need to rely on CC-CEDICT for this information.
- We would not need to be involved with a matter that is outside our scope.
- We would not have to deal with users contacting us about a "superfluous slash at the end of the entry".
- Apps that use CC-CEDICT would not need to be modified to handle the new format.
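
The auxiliary-file idea above can be sketched as follows. This is only an illustration under stated assumptions: the file layout (one `word ratio` pair per line, separated by whitespace) follows the `都會 0.36` example, while the function names, the `#` comment convention, the 0.5 threshold, and the default for unlisted words are all hypothetical choices for an app to make, not part of CC-CEDICT or Pleco:

```python
def load_word_ratios(text):
    """Parse lines such as '都會 0.36' into a {word: ratio} dict."""
    ratios = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and (assumed) comment lines
        word, value = line.split()
        ratios[word] = float(value)
    return ratios


def compound_first(word, ratios, threshold=0.5):
    """Decide whether to show the combined word before its characters.

    Assumption: a word missing from the file is treated as always being
    a real word (ratio 1.0), so existing behaviour is unchanged for it.
    """
    return ratios.get(word, 1.0) >= threshold


ratios = load_word_ratios("都會 0.36\n")
print(compound_first("都會", ratios))  # below the threshold: characters first
print(compound_first("實際", ratios))  # not listed: combined word first
```

The 0.36 figure happens to match the count in point 3 (9 of the first 25 instances), but any ratio measured on a larger corpus could be stored in the same way.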
Diff:
# - 都會 都会 [du1 hui4] /city/metropolis/
# + 都會 都会 [du1 hui4] /city/metropolis//