format:syntax
Differences
This shows you the differences between two versions of the page.
format:syntax [2007/10/03 17:53] – dennis | format:syntax [2023/10/23 09:20] (current) – Clarify Taiwan neutral tone paragraph. skypher437 | ||
---|---|---|---|
Line 1: | Line 1: | ||
====== Syntax ====== | ====== Syntax ====== | ||
- | The basic format of a CEDICT entry is: | + | //These are guidelines on what CC-CEDICT entries **should** look like. CC-CEDICT still has many old entries which do not yet comply with these rules.// |
+ | |||
+ | ===== Basic format ===== | ||
+ | |||
+ | The basic format of a CC-CEDICT entry is: | ||
< | < | ||
- | Traditional Simplified [pin1 yin1] /English equivalent 1/equivalent 2/ | + | Traditional Simplified [pin1 yin1] /gloss; gloss; .../gloss; gloss; .../ |
</ | </ | ||
For example: | For example: | ||
< | < | ||
- | 中國 中国 | + | 皮實 皮实 |
</ | </ | ||
- | Additionally: | + | ==== Semicolons ==== |
+ | |||
+ | Note that senses are separated by a slash, while glosses for the same sense are separated by a semicolon. | ||
+ | |||
+ | The semicolon was used for this purpose in a small number of entries prior to 2022, but in most entries the slash has been used to separate both senses and glosses. However, as of April 2022, the intention is to reformat definitions, | ||
+ | |||
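The basic format and the slash/semicolon convention can be illustrated with a small parser. This is only a sketch, not an official CC-CEDICT tool: the names `ENTRY_RE` and `parse_entry` are made up for this example, and it assumes that a semicolon never occurs inside a gloss.

```python
import re

# Basic format: Traditional Simplified [pin1 yin1] /sense/sense/
ENTRY_RE = re.compile(r"^(\S+) (\S+) \[([^\]]*)\] /(.*)/$")


def parse_entry(line):
    """Split one entry line into its fields; return None if it does not match."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        return None
    traditional, simplified, pinyin, body = match.groups()
    # Senses are separated by slashes, glosses of one sense by semicolons.
    senses = [[gloss.strip() for gloss in sense.split(";")]
              for sense in body.split("/")]
    return {
        "traditional": traditional,
        "simplified": simplified,
        "pinyin": pinyin.split(),
        "senses": senses,
    }


entry = parse_entry("首都 首都 [shou3 du1] /capital (city)/")
print(entry["pinyin"], entry["senses"])
```

Older entries that still use the slash for both purposes will simply come out as one gloss per sense.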
+ | ==== In addition: ==== | ||
* The Chinese word should consist of one or more Chinese characters, without any spaces in it | * The Chinese word should consist of one or more Chinese characters, without any spaces in it | ||
Line 26: | Line 36: | ||
* American English should be used for the English definitions | * American English should be used for the English definitions | ||
* Do not add definite or indefinite articles (e.g. " | * Do not add definite or indefinite articles (e.g. " | ||
+ | |||
+ | ===== Punctuation ===== | ||
+ | |||
+ | |||
+ | ==== Middle dot ==== | ||
+ | |||
+ | Middle dots are often used for separating western names: | ||
+ | 珍・奧斯汀 珍・奥斯汀 [Zhen1 · Ao4 si1 ting1] /Jane Austen (1775-1817), | ||
+ | |||
+ | A double width middle dot is used in the Chinese, a single width middle dot padded with spaces on both sides is used in the pinyin. | ||
+ | |||
+ | |||
+ | ==== Comma ==== | ||
+ | |||
+ | Commas are sometimes used in Chinese proverbs: | ||
+ | 人為財死，鳥為食亡 人为财死，鸟为食亡 [ren2 wei4 cai2 si3 , niao3 wei4 shi2 wang2] /Human beings die in pursuit of wealth, and birds die in pursuit of food/.../ | |||
+ | |||
+ | A double width comma is used in the Chinese, a single width comma padded with spaces on both sides is used in the pinyin. | ||
+ | |||
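The two punctuation rules imply that the pinyin has exactly one space-separated token per character of the Chinese word, punctuation included. A rough check could look like the sketch below. The function name `syllables_align` is hypothetical, and the code assumes that the wide middle dot is U+30FB and the wide comma is U+FF0C; adjust the table if entries use other code points.

```python
# Wide punctuation in the Chinese word -> single-width mark expected in the pinyin.
# The code points are an assumption made for this sketch.
WIDE_TO_PINYIN = {
    "\u30fb": "\u00b7",  # wide middle dot -> middle dot
    "\uff0c": ",",       # wide comma -> ASCII comma
}
PINYIN_MARKS = set(WIDE_TO_PINYIN.values())


def syllables_align(hanzi, pinyin):
    """Return True if every character has a matching pinyin token."""
    tokens = pinyin.split()
    if len(tokens) != len(hanzi):
        return False
    for char, token in zip(hanzi, tokens):
        if char in WIDE_TO_PINYIN:
            # Punctuation must map to its single-width counterpart.
            if token != WIDE_TO_PINYIN[char]:
                return False
        elif token in PINYIN_MARKS:
            # A punctuation token must not stand in for a character.
            return False
    return True


print(syllables_align("珍\u30fb奧斯汀", "Zhen1 \u00b7 Ao4 si1 ting1"))
```

Because the pinyin is split on whitespace, a comma that is not padded with spaces on both sides changes the token count and the check fails.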
+ | |||
+ | ===== Retroflex finals ===== | ||
+ | |||
+ | There are 3 kinds of R-ised words that use the 兒/儿 character: | ||
+ | - 兒/儿 is not optional because it's its own syllable (usually meaning " |
+ | - 兒/儿 is not optional because it changes the definition of the word and is tacked on to the preceding syllable - 头兒/ |
+ | - 兒/儿 is an optional northern pronunciation (er2hua4) and is tacked on to the preceding syllable - 花兒/ | ||
+ | |||
+ | These 3 cases should be formatted as follows: | ||
+ | - 女兒 女儿 [nu:3 er2] /daughter/ | ||
+ | - 頭兒 头儿 [tou2 r5] /leader/ | ||
+ | - 花兒 花儿 [hua1 r5] /erhua variant of 花/flower/ | ||
+ | |||
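In the formatted examples above, the pinyin alone shows whether a final 兒/儿 is its own syllable (`er2`) or is tacked on to the preceding syllable (`r5`). A minimal sketch of that distinction follows; the function name and the returned labels are invented for illustration. It cannot tell case 2 from case 3, since both are written `r5`.

```python
def classify_er_final(word, pinyin):
    """Classify a final 兒/儿 by looking at the last pinyin token."""
    syllables = pinyin.split()
    if not syllables or not word.endswith(("兒", "儿")):
        return "no_final_er"
    if syllables[-1] == "r5":
        # Cases 2 and 3: attached to the preceding syllable.
        return "attached"
    # Case 1: pronounced as a separate syllable, e.g. er2.
    return "own_syllable"


for word, pinyin in [("女兒", "nu:3 er2"), ("頭兒", "tou2 r5"), ("花兒", "hua1 r5")]:
    print(word, classify_er_final(word, pinyin))
```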
+ | //Please note: words ending with ' | ||
+ | |||
+ | ===== Taiwanese pronunciation ===== | ||
+ | |||
+ | CC-CEDICT follows " | ||
+ | 叔叔 叔叔 [shu1 shu5] /(informal) father' | ||
+ | |||
+ | Taiwanese GuoYu sometimes prefers not to use the neutral tone, so we do not list Taiwan pronunciations when they consist only of saying " | ||
+ | |||
===== General principles ===== | ===== General principles ===== | ||
Various trivial style things: | Various trivial style things: | ||
- | * Don't use parts of speech. Instead try to give an indication of grammatical usage within the English definition. | + | * Don't use parts of speech. Instead try to give an indication of grammatical usage within the English definition. CC-CEDICT is a human readable descriptive dictionary, not a resource intended for machine processing. |
* Abbreviations such as etc, cf, e.g. and i.e. do not need any further punctuation. | * Abbreviations such as etc, cf, e.g. and i.e. do not need any further punctuation. | ||
* Extended meanings are indicated by a lit. .. fig. combination when appropriate. When a common expression refers back to a classical incident or chengyu, one can refer to it with cf (incident in Records of the Historian). | * Extended meanings are indicated by a lit. .. fig. combination when appropriate. When a common expression refers back to a classical incident or chengyu, one can refer to it with cf (incident in Records of the Historian). | ||
Line 36: | Line 87: | ||
===== Choice of entries and translations ===== | ===== Choice of entries and translations ===== | ||
- | The current | + | The current |
- | A Chinese word for which a Google query with the following syntax results in many thousands of hits should probably be added to CEdict, with translations corresponding to the main usages. | + | A Chinese word for which a Google query with the following syntax results in many thousands of hits should probably be added to CC-CEDICT, with translations corresponding to the main usages. |
< | < | ||
+" | +" | ||
</ | </ | ||
//(the +"" | //(the +"" | ||
+ | |||
===== General principles of translation ===== | ===== General principles of translation ===== | ||
Line 50: | Line 102: | ||
On the other hand, a translation always loses something, and the translator can compensate by substituting an English equivalent (e.g. a biblical or Shakespearian allusion in place of a Confucian idiom). | On the other hand, a translation always loses something, and the translator can compensate by substituting an English equivalent (e.g. a biblical or Shakespearian allusion in place of a Confucian idiom). | ||
- | Name of person | + | Names of persons |
Names of plants, animals, musical instruments should give common name and scientific name when appropriate; | Names of plants, animals, musical instruments should give common name and scientific name when appropriate; | ||
Line 57: | Line 109: | ||
There are 20,000 Chinese characters in the more advanced dictionaries, | There are 20,000 Chinese characters in the more advanced dictionaries, | ||
+ | |||
+ | ===== Ambiguity due to homonyms ===== | ||
+ | |||
+ | Sometimes words used in the English definitions can have multiple meanings. If the Chinese word does not have these additional meanings, additional information should be provided to prevent ambiguity: | ||
+ | 首都 首都 [shou3 du1] /capital (city)/ | ||
+ | |||
+ | The text between the parentheses is " | ||
+ | |||
+ | ===== References ===== | ||
+ | |||
+ | The English definitions can contain references to other Chinese words. These should be noted as follows: | ||
+ | 漢字|汉字[Han4 zi4] | ||
+ | |||
+ | For example: | ||
+ | 股指 股指 [gu3 zhi3] /stock market index/share price index/abbr. for 股票指數|股票指数[gu3 piao4 zhi3 shu4]/ | ||
+ | |||
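References written in this notation can be picked out of a definition with a regular expression. The sketch below is one possible approach rather than an official parser. `REFERENCE_RE` and `find_references` are hypothetical names, and the pattern only covers the two-form notation shown above (traditional, vertical bar, simplified, bracketed pinyin).

```python
import re

# Traditional|Simplified[pin1 yin1], with no spaces inside the Chinese forms.
REFERENCE_RE = re.compile(r"([^\s|\[\]/]+)\|([^\s|\[\]/]+)\[([^\]]+)\]")


def find_references(definition):
    """Return (traditional, simplified, pinyin) tuples found in a definition."""
    return [match.groups() for match in REFERENCE_RE.finditer(definition)]


print(find_references("abbr. for 股票指數|股票指数[gu3 piao4 zhi3 shu4]"))
```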
+ | ===== Classifiers ===== | ||
+ | |||
+ | Classifiers (also called " | ||
+ | 避風港 避风港 [bi4 feng1 gang3] / | ||
+ | |||
+ | Classifiers follow the ' | ||
+ | |||
+ | A classifier word itself can be described using: | |||
+ | /classifier for small round things (peas, bullets, peanuts, pills, grains etc)/ | ||
===== Variants ===== | ===== Variants ===== | ||
- | Many characters have variants, sometimes more than one, sometimes with identical meaning or quite different meanings. Some of the variants found in texts on websites arise from differences between input methods, and the user may have had no intention of using the variant. It often happens that Google tells you that +" | + | Many characters have variants, sometimes more than one, sometimes with identical meaning or quite different meanings. Some of the variants found in texts on websites arise from differences between input methods, and the user may have had no intention of using the variant. |
+ | |||
+ | You can get rough usage frequency information by searching the alternative word forms in Google. Please use this syntax to make sure that Google doesn' | ||
+ | < | ||
+ | |||
+ | Additionally you can use Google' | ||
+ | 789 Chinese (Traditional) pages for +" | ||
+ | 17,700 Chinese (Simplified) pages for +" | ||
+ | 1,750 Chinese (Traditional) pages for +" | ||
+ | 66,900 Chinese (Simplified) pages for +" | ||
+ | |||
+ | It often happens that Google tells you that +" | ||
+ | |||
+ | When there are alternative forms of the same expression, and the less common form is at most 5 times less common, the less common entry should have /also written ../ referring to the more common form, e.g. 撐竿跳高 撑竿跳高 [cheng1 gan1 tiao4 gao1] / | ||
+ | |||
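The "at most 5 times less common" rule is a simple ratio test on the hit counts of the two forms. The following sketch shows the arithmetic. The function name is hypothetical and the hit counts in the usage line are made-up numbers, not real search results.

```python
def should_add_also_written(common_hits, rare_hits, max_ratio=5):
    """Return True if the rarer form is at most max_ratio times less common."""
    if rare_hits <= 0:
        # A form with no hits cannot satisfy the ratio.
        return False
    return common_hits <= max_ratio * rare_hits


# Hypothetical counts: 4,000 hits for the common form, 1,000 for the rarer one.
print(should_add_also_written(4000, 1000))
```

Search engine counts are rough, so a ratio close to the threshold is better treated as a judgment call than as a hard cut-off.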
+ | ===== Romanization of foreign languages ===== | ||
+ | When transcribing foreign words in definitions, | ||
+ | * Japanese: [[http:// | ||
+ | * Korean: [[http:// | ||
+ | If an alternative romanization method is more popular for a certain word, that version can be added as an additional translation. |
format/syntax.1191434015.txt.gz · Last modified: 2008/06/10 18:00 (external edit)