These are guidelines on what CEDICT entries should look like. CEDICT still has many old entries which do not yet comply with these rules.
The basic format of a CEDICT entry is:
Traditional Simplified [pin1 yin1] /English equivalent 1/equivalent 2/
中國 中国 [Zhong1 guo2] /China/Middle Kingdom/
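As an illustration of the format above, here is a minimal Python sketch that splits one entry line into its four parts. The regular expression, the `Entry` type and the `parse_entry` function are hypothetical helpers written for this page, not part of any official CEDICT tool. The sketch assumes that the headwords contain no spaces and that the definitions themselves contain no slash.

```python
import re
from typing import List, NamedTuple


class Entry(NamedTuple):
    traditional: str
    simplified: str
    pinyin: str
    definitions: List[str]


# Assumed layout: "Traditional Simplified [pinyin] /def 1/def 2/".
# Headwords are taken to be free of spaces; pinyin sits inside [...].
ENTRY_RE = re.compile(r"^(\S+) (\S+) \[([^\]]*)\] /(.*)/$")


def parse_entry(line: str) -> Entry:
    """Split one CEDICT-style line into headwords, pinyin and definitions."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a CEDICT-style entry: {line!r}")
    traditional, simplified, pinyin, body = match.groups()
    # The definitions are separated by slashes inside the outer pair.
    return Entry(traditional, simplified, pinyin, body.split("/"))


entry = parse_entry("中國 中国 [Zhong1 guo2] /China/Middle Kingdom/")
print(entry.traditional, entry.simplified, entry.pinyin, entry.definitions)
```

A line that does not fit the assumed layout raises `ValueError`, which may be a convenient way to spot old entries that do not follow the format.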
Middle dots are often used for separating western names:
珍・奧斯汀 珍・奥斯汀 [Zhen1 · Ao4 si1 ting1] /Jane Austen (1775-1817), English novelist/
A double-width middle dot is used in the Chinese; a single-width middle dot, padded with a space on each side, is used in the pinyin.
Commas are sometimes used in Chinese proverbs:
人為財死，鳥為食亡 人为财死，鸟为食亡 [ren2 wei4 cai2 si3 , niao3 wei4 shi2 wang2] /Human beings die in pursuit of wealth, and birds die in pursuit of food/…/
A double-width comma is used in the Chinese; a single-width comma, padded with a space on each side, is used in the pinyin.
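One consequence of padding the punctuation with spaces is that, in the two examples above, the pinyin has exactly one space-separated token per character of the headword. The following Python sketch checks for that. The function name `punctuation_aligned` and the `PUNCTUATION` table are hypothetical, and the one-token-per-character rule is an assumption drawn from these examples only; it may not hold for every kind of entry.

```python
# Assumed mapping from headword punctuation to its pinyin counterpart.
# Both U+30FB and U+00B7 are accepted as a middle dot in the headword.
PUNCTUATION = {
    "\u30fb": "\u00b7",  # double-width middle dot -> single-width middle dot
    "\u00b7": "\u00b7",  # middle dot -> middle dot
    "\uff0c": ",",       # double-width comma -> single-width comma
}


def punctuation_aligned(headword: str, pinyin: str) -> bool:
    """Check that each headword character has one matching pinyin token."""
    tokens = pinyin.split()
    if len(tokens) != len(headword):
        # Typically a punctuation mark that is not padded with spaces.
        return False
    for char, token in zip(headword, tokens):
        expected = PUNCTUATION.get(char)
        if expected is not None and token != expected:
            return False  # punctuation in the headword, syllable in pinyin
        if expected is None and token in PUNCTUATION.values():
            return False  # punctuation in the pinyin, character in headword
    return True


proverb = "人為財死\uff0c鳥為食亡"
print(punctuation_aligned(proverb, "ren2 wei4 cai2 si3 , niao3 wei4 shi2 wang2"))
print(punctuation_aligned(proverb, "ren2 wei4 cai2 si3, niao3 wei4 shi2 wang2"))
```

The first call uses the padded comma from the example entry; the second omits the space before the comma, so the token count no longer matches the headword.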
There are 3 kinds of R-ised words that use the 兒/儿 character:
These 3 cases should be formatted as follows:
Various trivial style things:
The current CEDICT database contains a considerable number of infelicities, inaccuracies, omissions, and actual errors. Ideally, new entries should be checked against 2 or 3 different sources (e.g. online and paper dictionaries). Care is needed, since dictionaries copy from one another: an entirely bogus entry in CEDICT is copied uncritically onto thousands of websites within a few months.
A Chinese word for which a Google query with the following syntax results in many thousands of hits should probably be added to CEDICT, with translations corresponding to the main usages.
+"combination of characters"
(the +"" combination forces Google both to match the whole word and to ignore variants)
The English should be meaningful, not horribly ugly, and bear a close relation to the Chinese meaning. It should correspond to something that could be used naturally by an English speaker (I think Arthur Waley has some advice saying that just because a text is about magnetohydrodynamics, it doesn't follow that it has to be horribly ugly).
On the other hand, a translation always loses something, and the translator can compensate by substituting an English equivalent (e.g. a biblical or Shakespearian allusion in place of a Confucian idiom).
Entries for names of people should give dates if possible, the reason the person is of interest (writer, general, pop star etc.), and brief indications of their CV (e.g. took part in a revolution, was murdered, wrote a famous book etc.).
Names of plants, animals, and musical instruments should give the common name and, when appropriate, the scientific name. There is a particular problem of how specific the word is: a plant name may mean a minor variety within a species, or may refer to an entire taxonomic family. Different writers will use it to mean the common family, or the particular item of salad on their plate at present.
Most words have more than one meaning, and more than one grammatical function. Care is needed not to concentrate only on a specific occurrence to the exclusion of others. For example, the actual occurrence may be a verb in the past participle (say “overthrown”), whereas the word may also mean “destruction”, “to topple” etc.
There are 20,000 Chinese characters in the more advanced dictionaries, of which many are obscure, never used, and will not have correct definitions in online or paper dictionaries. This is the boundary of knowledge. (Exactly the same applies to big English dictionaries.) These obscure characters appear on modern websites, and one sometimes needs to give a definition. It is reasonable to admit (precise meaning unknown), and give an indication of what one can deduce.
Many characters have variants, sometimes more than one, sometimes with identical meaning or quite different meanings. Some choice of variants found in texts on websites will arise because of the different input methods, and the user may have had no intention of using the variant. It often happens that Google tells you that +“Xx” occurs 200 times more frequently than +“XX”, in which case Xx should be in CEDICT as a regular entry, and XX only as “variant of Xx”.
When there are alternative forms of the same expression, and the less common form is at most 5 times less common, the less common entry should have /also written ../ referring to the more common form, e.g. 撐竿跳高 撑竿跳高 [cheng1 gan1 tiao4 gao1] /pole-vaulting/also written 撐杆跳高|撑杆跳高/.
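The two frequency rules above can be sketched as a small Python function. The name `cross_reference` and its parameters are hypothetical. The threshold of 5 comes from the "also written" rule; the threshold of 200 is only the figure used in the "variant of" example and is treated here as an assumption, not as a fixed rule. The text does not say what to do between the two, so the sketch returns `None` there and leaves the decision to the editor.

```python
from typing import Optional

ALSO_WRITTEN_MAX_RATIO = 5   # from the "also written" rule above
VARIANT_MIN_RATIO = 200      # assumption: taken from the "variant of" example


def cross_reference(common_hits: int, rare_hits: int, common_form: str) -> Optional[str]:
    """Suggest the definition text for the less common of two forms.

    common_hits and rare_hits are search hit counts for the more common
    and the less common form; common_form is the text to refer to.
    """
    if common_hits < rare_hits:
        raise ValueError("common_hits must not be smaller than rare_hits")
    # A form with no hits at all is treated as infinitely less common.
    ratio = common_hits / rare_hits if rare_hits > 0 else float("inf")
    if ratio <= ALSO_WRITTEN_MAX_RATIO:
        return f"also written {common_form}"
    if ratio >= VARIANT_MIN_RATIO:
        return f"variant of {common_form}"
    return None  # not covered by the guidelines; needs an editor's judgment


# Hit counts below are made up for illustration only.
print(cross_reference(1000, 400, "撐杆跳高|撑杆跳高"))
print(cross_reference(20000, 100, "Xx"))
print(cross_reference(1000, 50, "Xx"))
```

The hit counts in the usage lines are invented for illustration and are not real search results.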