format:syntax
Differences
Previous revision | |||
format:syntax [2008/01/01 15:54] – dennis | — | ||
---|---|---|---|
Line 1: | Line 1: | ||
- | ====== Syntax ====== | ||
- | //These are guidelines on what CEDICT entries **should** look like. CEDICT still has many old entries which do not yet comply with these rules.// | ||
- | |||
- | ===== Basic format ===== | ||
- | |||
- | The basic format of a CEDICT entry is: | ||
- | < | ||
- | Traditional Simplified [pin1 yin1] /English equivalent 1/ | ||
- | </ | ||
- | |||
- | For example: | ||
- | < | ||
- | 中國 中国 [Zhong1 guo2] / | ||
- | </ | ||
- | |||
- | Additionally: | ||
- | |||
- | * The Chinese word should consist of one or more Chinese characters, without any spaces in it | ||
- | * The Mandarin pinyin should follow in the format below: | ||
- | * It should have a space between each pinyin syllable | ||
- |     * Each pinyin syllable should have a tone number. Use 5 for the neutral tone (e.g. ni3 hao3 ma5) | ||
- | * Raw tones should be used: | ||
- | * Tone sandhi is **not** indicated (e.g., ni3 hao3 is not changed to ni2 hao3) | ||
- | * Although " | ||
- | * Word-related changes to neutral tone, however, **are** indicated. These are especially common with reduplicated forms (e.g., use ma1 ma5, not ma1 ma1; ba4 ba5, not ba4 ba4; kan4 kan5, not kan4 kan4; xiang3 xiang5 ("take under consideration" | ||
- | * For pinyin that uses the ü, represent it with a u followed by a colon (e.g. nu:3 ren2) | ||
- | * Capitalize pinyin for proper nouns (e.g. **B**ei3 jing1) | ||
- | * The English definitions should be separated with the '/' | ||
- | * American English should be used for the English definitions | ||
- | * Do not add definite or indefinite articles (e.g. " | ||
- | |||
- | ===== Punctuation ===== | ||
- | |||
- | |||
- | ==== Middle dot ==== | ||
- | |||
- | Middle dots are often used for separating western names: | ||
- | 珍・奧斯汀 珍・奥斯汀 [Zhen1 · Ao4 si1 ting1] /Jane Austen (1775-1817), | ||
- | |||
- | A double-width middle dot is used in the Chinese; a single-width middle dot padded with spaces on both sides is used in the pinyin. | ||
- | |||
- | |||
- | ==== Comma ==== | ||
- | |||
- | Commas are sometimes used in Chinese proverbs: | ||
- | 人為財死,鳥為食亡 人为财死,鸟为食亡 [ren2 wei4 cai2 si3 , niao3 wei4 shi2 wang2] /Human beings die in pursuit of wealth, and birds die in pursuit of food/.../ | ||
- | |||
- | A double-width comma is used in the Chinese; a single-width comma padded with spaces on both sides is used in the pinyin. | ||
- | |||
- | ===== Retroflex finals ===== | ||
- | |||
- | There are 3 kinds of R-ised words that use the 兒/儿 character: | ||
- | - 兒/儿 is not optional because it is its own syllable (usually meaning " | ||
- | - 兒/儿 is not optional because it changes the definition of the word and is tacked on to the preceding syllable - 头兒/ | ||
- | - 兒/儿 is an optional northern pronunciation (er2hua4) and is tacked on to the preceding syllable - 花兒/ | ||
- | |||
- | These 3 cases should be formatted as follows: | ||
- | - 女兒 女儿 [nu:3 er2] /daughter/ | ||
- | - 頭兒 头儿 [tou2 r5] /leader/ | ||
- | - 花兒 花儿 [hua1 r5] /erhua variant of 花/flower/ | ||
- | |||
- | ===== General principles ===== | ||
- | |||
- | Various minor style points: | ||
- |   * Don't use parts of speech. Instead, try to give an indication of grammatical usage within the English definition. CEDICT is a human-readable descriptive dictionary, not a resource intended for machine processing. | ||
- |   * Abbreviations such as "etc", "cf", "e.g." and "i.e." do not need any further punctuation. | ||
- |   * Extended meanings are indicated by a lit. .. fig. combination when appropriate. When a common expression refers back to a classical incident or chengyu, one can refer to it with cf (incident in Records of the Historian). | ||
- | |||
- | ===== Choice of entries and translations ===== | ||
- | |||
- | The current CEDICT database contains a considerable number of infelicities, | ||
- | |||
- | A Chinese word for which a Google query with the following syntax returns many thousands of hits should probably be added to CEDICT, with translations corresponding to the main usages. | ||
- | < | ||
- | +" | ||
- | </ | ||
- | //(the +"" | ||
- | |||
- | ===== General principles of translation ===== | ||
- | |||
- | The English should be meaningful, not horribly ugly, and bear a close relation to the Chinese meaning. It should correspond to something that could be used naturally by an English speaker (I think Arthur Waley has some advice saying that just because a text is about magnetohydrodynamics, | ||
- | |||
- | On the other hand, a translation always loses something, and the translator can compensate by substituting an English equivalent (e.g. a biblical or Shakespearian allusion in place of a Confucian idiom). | ||
- | |||
- | Entries for names of people should give dates if possible, why the person is of interest (writer, general, pop star etc), and brief biographical details (e.g. took part in a revolution, was murdered, wrote a famous book etc). | ||
- | |||
- | Names of plants, animals and musical instruments should give the common name and the scientific name when appropriate; | ||
- | |||
- | Most words have more than one meaning, and more than one grammatical function. Care is needed not to concentrate only on a specific occurrence to the exclusion of others, e.g. the actual occurrence may be a verb in the past participle (say " | ||
- | |||
- | There are 20,000 Chinese characters in the more advanced dictionaries, | ||
- | |||
- | ===== Variants ===== | ||
- | |||
- | Many characters have variants, sometimes more than one, sometimes with identical meaning or quite different meanings. Some choice of variants found in texts on websites will arise because of the different input methods, and the user may have had no intention of using the variant. It often happens that Google tells you that +" | ||
- | |||
- | ===== Romanization of foreign languages ===== | ||
- | |||
- | When transcribing foreign words in definitions, | ||
- | * Japanese: [[http:// | ||
- | * Korean: [[http:// | ||
- | |||
- | If an alternative romanization method is more popular for a certain word, that version can be added as an additional translation. |
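
The basic format and pinyin rules described above can be sketched as a small checker. This is only an illustration under stated assumptions, not an official CEDICT tool: the names `ENTRY_RE`, `SYLLABLE_RE`, `parse_entry` and `pinyin_problems` are hypothetical, and the regular expressions cover only the rules stated on this page (space-separated syllables, tone numbers 1 to 5, `u:` for ü, `r5` for erhua, and a spaced single-width middle dot or comma as punctuation).

```python
import re

# Assumed shape of one entry line:
#   Traditional Simplified [pin1 yin1] /definition 1/definition 2/
ENTRY_RE = re.compile(r"^(\S+) (\S+) \[([^\]]*)\] /(.+)/$")

# Assumed shape of one syllable: letters (with "u:" standing for ü)
# followed by a tone number 1-5; this also accepts the erhua token "r5".
SYLLABLE_RE = re.compile(r"^(?:[Uu]:|[A-Za-z])+[1-5]$")

# Single-width punctuation allowed as its own token inside the pinyin
# (middle dot U+00B7 and comma), per the Punctuation section.
PUNCTUATION = {"\u00b7", ","}


def parse_entry(line):
    """Split one entry line into its four parts (hypothetical helper)."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        raise ValueError("not in the basic entry format: %r" % line)
    traditional, simplified, pinyin, definitions = match.groups()
    return {
        "traditional": traditional,
        "simplified": simplified,
        "pinyin": pinyin.split(" "),
        "definitions": definitions.split("/"),
    }


def pinyin_problems(tokens):
    """Return the pinyin tokens that break the syllable rules sketched above."""
    return [
        token
        for token in tokens
        if token not in PUNCTUATION and not SYLLABLE_RE.match(token)
    ]


entry = parse_entry("花兒 花儿 [hua1 r5] /erhua variant of 花/flower/")
print(entry["definitions"])
print(pinyin_problems(entry["pinyin"]))
```

Because many old entries do not yet follow these guidelines, a checker like this is better used to flag entries for review than to reject them outright.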
format/syntax.txt · Last modified: 2023/10/23 09:20 by skypher437