format:syntax

Differences

This shows you the differences between two versions of the page.

format:syntax [2008/01/01 15:54]
dennis
format:syntax [2012/01/28 23:00] (current)
Line 1: Line 1:
 ====== Syntax ====== ====== Syntax ======
  
-//These are guidelines on what CEDICT entries **should** look like. CEDICT still has many old entries which do not comply with these rules yet.//+//These are guidelines on what CC-CEDICT entries **should** look like. CC-CEDICT still has many old entries which do not comply with these rules yet.//
  
 ===== Basic format ===== ===== Basic format =====
  
-The basic format of a CEDICT entry is:+The basic format of a CC-CEDICT entry is:
 <code> <code>
 Traditional Simplified [pin1 yin1] /English equivalent 1/equivalent 2/ Traditional Simplified [pin1 yin1] /English equivalent 1/equivalent 2/
Line 48: Line 48:
  
 A double width comma is used in the Chinese, a single width comma padded with spaces on both sides is used in the pinyin. A double width comma is used in the Chinese, a single width comma padded with spaces on both sides is used in the pinyin.
 +
  
 ===== Retroflex finals ===== ===== Retroflex finals =====
Line 60: Line 61:
   - 頭兒 头儿 [tou2 r5] /leader/   - 頭兒 头儿 [tou2 r5] /leader/
   - 花兒 花儿 [hua1 r5] /erhua variant of 花/flower/   - 花兒 花儿 [hua1 r5] /erhua variant of 花/flower/
 +
 +//Please note: words ending with 'r5' (such as 'hua1 r5') are presented as a -r joined with the previous syllable (eg. 'huar1') in some dictionaries using CC-CEDICT, such as the [[http://www.mdbg.net/chindict/chindict.php|MDBG Chinese-English dictionary]].//
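The joined presentation mentioned in the note above can be sketched as follows. This is only an illustration: the function name ''join_erhua'' is hypothetical, and the rule it applies (move the ''r'' in front of the previous syllable's tone number) is inferred from the single 'hua1 r5' to 'huar1' example, not taken from any dictionary's actual code.

```python
import re


def join_erhua(pinyin: str) -> str:
    """Merge each 'r5' syllable into the syllable before it.

    'hua1 r5' becomes 'huar1': the 'r' is placed before the tone
    number of the preceding syllable. Other syllables are unchanged.
    """
    joined = []
    for syllable in pinyin.split():
        if syllable == "r5" and joined:
            previous = joined.pop()
            # Split the previous syllable into letters and tone digit.
            match = re.fullmatch(r"(.+?)([1-5])", previous)
            if match:
                joined.append(f"{match.group(1)}r{match.group(2)}")
            else:
                # No tone digit found: simply append the 'r'.
                joined.append(previous + "r")
        else:
            joined.append(syllable)
    return " ".join(joined)


print(join_erhua("hua1 r5"))
```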
 +
 +===== Taiwanese pronunciation =====
 +
 +CC-CEDICT follows "standard Mandarin" as used in the P.R. China. Mandarin as used in Taiwan sometimes has slight variations in pronunciation; these can be listed as follows:\\ 
 +叔叔 叔叔 [shu1 shu5] /(informal) father's younger brother/uncle/Taiwan pr. shu2 shu5/
 +
 +Taiwan doesn't use the light tone, so we do not list Taiwan pronunciations when they consist only of saying "don't use the light tone". When a character has a "Taiwan pr." notice, its compounds need not repeat it.  
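As a rough illustration of how a reader of the file might pick out such a notice, the sketch below splits an entry in the basic format into its fields and looks for a definition starting with "Taiwan pr.". The names ''parse_entry'' and ''taiwan_pronunciation'' are hypothetical, and the regular expression assumes a well-formed line with no whitespace inside the Chinese fields.

```python
import re

# Traditional Simplified [pinyin] /definition 1/definition 2/
ENTRY_RE = re.compile(r"^(\S+) (\S+) \[([^\]]*)\] /(.*)/$")


def parse_entry(line):
    """Split one entry line into its four fields."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a well-formed entry: {line!r}")
    traditional, simplified, pinyin, body = match.groups()
    return {
        "traditional": traditional,
        "simplified": simplified,
        "pinyin": pinyin,
        "definitions": body.split("/"),
    }


def taiwan_pronunciation(entry):
    """Return the pinyin given in a 'Taiwan pr.' definition, or None."""
    prefix = "Taiwan pr. "
    for definition in entry["definitions"]:
        if definition.startswith(prefix):
            return definition[len(prefix):]
    return None


entry = parse_entry(
    "叔叔 叔叔 [shu1 shu5] "
    "/(informal) father's younger brother/uncle/Taiwan pr. shu2 shu5/"
)
print(entry["pinyin"], "|", taiwan_pronunciation(entry))
```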
 +
  
 ===== General principles ===== ===== General principles =====
  
 Various trivial style things: Various trivial style things:
-  * Don't use parts of speech. Instead try to give an indication of grammatical usage within the English definition. CEDICT is a human readable descriptive dictionary, not a resource intended for machine processing.+  * Don't use parts of speech. Instead try to give an indication of grammatical usage within the English definition. CC-CEDICT is a human readable descriptive dictionary, not a resource intended for machine processing.
   * Abbreviations etc cf e.g. i.e. do not need any further punctuation.   * Abbreviations etc cf e.g. i.e. do not need any further punctuation.
   * Extended meanings indicated by lit. .. fig. combination when appropriate or when a common expression refers back to a classical incident or chengyu, one can refer to it with cf (incident in Records of the Historian).   * Extended meanings indicated by lit. .. fig. combination when appropriate or when a common expression refers back to a classical incident or chengyu, one can refer to it with cf (incident in Records of the Historian).
Line 70: Line 81:
 ===== Choice of entries and translations ===== ===== Choice of entries and translations =====
  
-The current CEDICT database contains a considerable number of infelicities, inaccuracies, omissions, and actual errors. As an ideal, new entries should be checked against 2 or 3 different sources (e.g. the online and paper dictionaries). Care is needed, since the dictionaries copy from one another -- an entirely bogus entry in CEDICT is copied uncritically onto thousands of websites within a few months.+The current CC-CEDICT database contains a considerable number of infelicities, inaccuracies, omissions, and actual errors. As an ideal, new entries should be checked against 2 or 3 different sources (e.g. the online and paper dictionaries). Care is needed, since the dictionaries copy from one another -- an entirely bogus entry in CC-CEDICT is copied uncritically onto thousands of websites within a few months.
  
-A Chinese word for which a Google query with the following syntax results in many thousands of hits should probably be added to CEDICT, with translations corresponding to the main usages.+A Chinese word for which a Google query with the following syntax results in many thousands of hits should probably be added to CC-CEDICT, with translations corresponding to the main usages.
 <code> <code>
 +"combination of characters" +"combination of characters"
 </code> </code>
 //(the +"" combination forces Google to match both a whole word and to ignore variants)// //(the +"" combination forces Google to match both a whole word and to ignore variants)//
 +
  
 ===== General principles of translation ===== ===== General principles of translation =====
Line 84: Line 96:
 On the other hand, a translation always loses something, and the translator can compensate by substituting an English equivalent (e.g. a biblical or Shakespearian allusion in place of a Confucian idiom). On the other hand, a translation always loses something, and the translator can compensate by substituting an English equivalent (e.g. a biblical or Shakespearian allusion in place of a Confucian idiom).
  
-Name of person should say dates if possible, what interest the person has (writer, general, pop star etc), brief indications of CV (e.g. took part in a revolution, was murdered, wrote famous book etc).+Names of persons should say dates if possible (birth, death, years in which the person was active in a certain role, etc), what interest the person has (writer, general, pop star etc), brief indications of CV (e.g. took part in a revolution, was murdered, wrote famous book etc). For example:\\ 胡錦濤 胡锦涛 [Hu2 Jin3 tao1] /Hu Jintao (1942-), president of PRC from 2003/
  
 Names of plants, animals, musical instruments should give common name and scientific name when appropriate; there is a particular problem of how specific the word is -- a plant may mean a minor variety within a species, or may refer to an entire taxonomic family. Different writers will use it to mean the common family, or the particular item of salad on their plate at present. Names of plants, animals, musical instruments should give common name and scientific name when appropriate; there is a particular problem of how specific the word is -- a plant may mean a minor variety within a species, or may refer to an entire taxonomic family. Different writers will use it to mean the common family, or the particular item of salad on their plate at present.
Line 91: Line 103:
  
 There are 20,000 Chinese characters in the more advanced dictionaries, of which many are obscure, never used, and will not have correct definitions in online or paper dictionaries. This is the boundary of knowledge. (Exactly the same applies to big English dictionaries.) These obscure characters appear on modern websites, and one sometimes needs to give a definition. It is reasonable to admit (precise meaning unknown), and give an indication of what one can deduce. There are 20,000 Chinese characters in the more advanced dictionaries, of which many are obscure, never used, and will not have correct definitions in online or paper dictionaries. This is the boundary of knowledge. (Exactly the same applies to big English dictionaries.) These obscure characters appear on modern websites, and one sometimes needs to give a definition. It is reasonable to admit (precise meaning unknown), and give an indication of what one can deduce.
 +
 +===== Ambiguity due to homonyms =====
 +
 +Sometimes words used in the English definitions can have multiple meanings. If the Chinese word does not have these additional meanings, additional information should be provided to prevent ambiguity:\\ 
 +首都 首都 [shou3 du1] /capital (city)/
 +
 +The text between the parentheses is "meta-information"; it is not part of the translation itself and serves only to prevent ambiguity. 
 +
 +===== References =====
 +
 +The English definitions can contain references to other Chinese words. These should be noted as follows:\\ 
 +漢字|汉字[Han4 zi4]
 +
 +For example:\\ 
 +股指 股指 [gu3 zhi3] /stock market index/share price index/abbr. for 股票指數|股票指数[gu3 piao4 zhi3 shu4]/
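A minimal sketch of how such references could be pulled out of a definition is shown below. The function name ''find_references'' and the regular expression are assumptions made for illustration. The pattern also accepts a single-form reference such as 座[zuo4], as seen in the classifier example in the next section.

```python
import re

# Traditional|Simplified[pinyin], or a single form followed by [pinyin].
REFERENCE_RE = re.compile(
    r"([^\s/|\[\],:]+)(?:\|([^\s/|\[\],:]+))?\[([^\]]+)\]"
)


def find_references(definition):
    """Return (traditional, simplified, pinyin) tuples found in the text."""
    references = []
    for match in REFERENCE_RE.finditer(definition):
        traditional = match.group(1)
        # With only one form given, it serves as both scripts.
        simplified = match.group(2) or traditional
        references.append((traditional, simplified, match.group(3)))
    return references


print(find_references("abbr. for 股票指數|股票指数[gu3 piao4 zhi3 shu4]"))
```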
 +
 +===== Classifiers =====
 +
 +Classifiers (also called "Measure words") can be listed using the following syntax:\\ 
 +避風港 避风港 [bi4 feng1 gang3] /haven/refuge/harbor/CL:座[zuo4],個|个[ge4]/
 +
 +Classifiers follow the 'reference' syntax, are prefixed by 'CL:' and separated by a comma (no additional spacing).
 +
 +Classifier words themselves can be described using:\\ 
 +/classifier for small round things (peas, bullets, peanuts, pills, grains etc)/
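The 'CL:' syntax can be read back with a few lines of code. The sketch below is illustrative only: ''parse_classifiers'' is a hypothetical name, and it assumes the definition follows the rules above exactly (the 'CL:' prefix, comma-separated items with no extra spacing, each item in the reference syntax).

```python
import re

# One classifier item: Traditional|Simplified[pinyin] or Form[pinyin].
ITEM_RE = re.compile(r"^([^|\[\]]+)(?:\|([^|\[\]]+))?\[([^\]]+)\]$")


def parse_classifiers(definition):
    """Return (traditional, simplified, pinyin) for each classifier.

    Returns an empty list if the definition is not a 'CL:' field.
    """
    if not definition.startswith("CL:"):
        return []
    classifiers = []
    for item in definition[len("CL:"):].split(","):
        match = ITEM_RE.match(item)
        if match is None:
            raise ValueError(f"malformed classifier: {item!r}")
        traditional = match.group(1)
        simplified = match.group(2) or traditional
        classifiers.append((traditional, simplified, match.group(3)))
    return classifiers


print(parse_classifiers("CL:座[zuo4],個|个[ge4]"))
```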
  
 ===== Variants ===== ===== Variants =====
  
-Many characters have variants, sometimes more than one, sometimes with identical meaning or quite different meanings. Some choice of variants found in texts on websites will arise because of the different input methods, and the user may have had no intention of using the variant. It often happens that Google tells you that +"Xx" occurs 200 times more frequently than +"XX", in which case Xx should be in CEDICT as a regular entry, and XX only as "variant of Xx".+Many characters have variants, sometimes more than one, sometimes with identical meaning or quite different meanings. Some choice of variants found in texts on websites will arise because of the different input methods, and the user may have had no intention of using the variant. 
 + 
 +You can get rough usage frequency information by searching the alternative word forms in Google. Please use this syntax to make sure that Google doesn't perform any automatic variant translations: 
 +<code>+"word"</code> 
 + 
 +Additionally you can use Google's advanced search to specify the language to either 'Chinese (Traditional)' or 'Chinese (Simplified)' to prevent Japanese web pages from influencing the results. For example:\\  
 +789 Chinese (Traditional) pages for +"撐竿跳高"\\  
 +17,700 Chinese (Simplified) pages for +"撑竿跳高"\\  
 +1,750 Chinese (Traditional) pages for +"撐杆跳高"\\  
 +66,900 Chinese (Simplified) pages for +"撑杆跳高" 
 + 
 +It often happens that Google tells you that +"Xx" occurs 200 times more frequently than +"XX", in which case Xx should be in CC-CEDICT as a regular entry, and XX only as "XX XX [pin1 yin1] /variant of Xx/definition/"
 + 
 +When there are alternative forms of the same expression, and the less common form is at most 5 times less common, the less common entry should have /also written ../ referring to the more common form, e.g. 撐竿跳高 撑竿跳高 [cheng1 gan1 tiao4 gao1] /pole-vaulting/also written 撐杆跳高|撑杆跳高/.
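The two rules of thumb above can be written down as a small helper. This is a sketch under assumptions: the name ''suggest_cross_reference'' is hypothetical, the figures 200 and 5 are read as cut-offs although the text gives them only as guidance, and ratios in between are left undecided because the text does not cover them.

```python
def suggest_cross_reference(common_hits, rare_hits):
    """Suggest how the less common form should refer to the common one.

    common_hits and rare_hits are search hit counts for the more and
    the less common form respectively.
    """
    if rare_hits <= 0 or common_hits < rare_hits:
        raise ValueError("expected common_hits >= rare_hits > 0")
    ratio = common_hits / rare_hits
    if ratio <= 5:
        return "also written"   # both forms are in real use
    if ratio >= 200:
        return "variant of"     # rare form only points to the common one
    return "undecided"          # not covered by the guidelines


# Simplified-page counts quoted above: 66,900 vs 17,700 (ratio about 3.8).
print(suggest_cross_reference(66900, 17700))
```

With the counts quoted above, the ratio stays below 5, which is consistent with 撐竿跳高 carrying an "also written" note rather than being reduced to a variant entry.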
  
 ===== Romanization of foreign languages ===== ===== Romanization of foreign languages =====
format/syntax.1199202848.txt.gz · Last modified: 2008/06/10 18:00 (external edit)