format:syntax

Differences

This shows you the differences between two versions of the page.

format:syntax [2009/03/01 13:40] dennis
format:syntax [2023/10/23 09:20] (current) – Clarify Taiwan neutral tone paragraph. skypher437
Line 7: Line 7:
 The basic format of a CC-CEDICT entry is: The basic format of a CC-CEDICT entry is:
 <code> <code>
-Traditional Simplified [pin1 yin1] /English equivalent 1/equivalent 2/
+Traditional Simplified [pin1 yin1] /gloss; gloss; .../gloss; gloss; .../
 </code> </code>
  
 For example: For example:
 <code> <code>
-中國 中国 [Zhong1 guo2] /China/Middle Kingdom/
+皮實 皮实 [pi2 shi5] /(of things) durable/(of people) sturdy; tough/
 </code> </code>
  
-Additionally:
+==== Semicolons ====
 + 
 +Note that senses are separated by a slash, while glosses for the same sense are separated by a semicolon.  
 + 
 +The semicolon was used for this purpose in a small number of entries prior to 2022, but in most entries the slash has been used to separate both senses and glosses. However, as of April 2022, the intention is to reformat definitions, especially definitions with many glosses, using semicolons where appropriate. 
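
The slash/semicolon distinction described above can be sketched in Python as follows. This is a minimal illustration, not an official CC-CEDICT parser: the names `ENTRY_RE` and `parse_entry` are hypothetical, and the regular expression assumes the basic one-line layout shown earlier on this page.

```python
import re

# Assumed layout of one entry line (hypothetical pattern, based on the
# basic format above): Traditional Simplified [pin1 yin1] /sense/sense/
ENTRY_RE = re.compile(r"^(\S+) (\S+) \[([^\]]*)\] /(.*)/$")


def parse_entry(line):
    """Split one entry line into headwords, pinyin syllables and senses."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not in the expected entry format: {line!r}")
    traditional, simplified, pinyin, body = match.groups()
    # Slashes separate senses; semicolons separate glosses within a sense.
    senses = [
        [gloss.strip() for gloss in sense.split(";")]
        for sense in body.split("/")
    ]
    return {
        "traditional": traditional,
        "simplified": simplified,
        "pinyin": pinyin.split(),
        "senses": senses,
    }


entry = parse_entry("皮實 皮实 [pi2 shi5] /(of things) durable/(of people) sturdy; tough/")
print(entry["senses"])
# [['(of things) durable'], ['(of people) sturdy', 'tough']]
```

Note that this naive split treats every semicolon as a gloss separator, so it would wrongly break up a gloss that contains a semicolon inside parentheses. For older entries that use the slash for both purposes, each gloss simply ends up as its own sense.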
 + 
 +==== In addition ====
  
   * The Chinese word should consist of one or more Chinese characters, without any spaces in it   * The Chinese word should consist of one or more Chinese characters, without any spaces in it
Line 63: Line 69:
  
 //Please note: words ending with 'r5' (such as 'hua1 r5') are presented as a -r joined with the previous syllable (eg. 'huar1') in some dictionaries using CC-CEDICT, such as the [[http://www.mdbg.net/chindict/chindict.php|MDBG Chinese-English dictionary]].// //Please note: words ending with 'r5' (such as 'hua1 r5') are presented as a -r joined with the previous syllable (eg. 'huar1') in some dictionaries using CC-CEDICT, such as the [[http://www.mdbg.net/chindict/chindict.php|MDBG Chinese-English dictionary]].//
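
As a rough sketch of the presentation described in the note above, the following Python function joins an 'r5' syllable onto the preceding syllable. The function name `join_erhua` is hypothetical, and this is only an assumption about how a consuming dictionary might do it, not a description of MDBG's actual implementation.

```python
def join_erhua(pinyin):
    """Join each 'r5' syllable onto the preceding syllable.

    The 'r' is placed before the tone digit of the previous syllable,
    so 'hua1 r5' becomes 'huar1'.
    """
    joined = []
    for syllable in pinyin.split():
        if syllable == "r5" and joined:
            previous = joined.pop()
            if previous[-1].isdigit():
                # Keep the tone digit at the end: 'hua1' -> 'huar1'.
                joined.append(previous[:-1] + "r" + previous[-1])
            else:
                joined.append(previous + "r")
        else:
            joined.append(syllable)
    return " ".join(joined)


print(join_erhua("hua1 r5"))  # huar1
```

The sketch joins 'r5' wherever it occurs, not only at the end of a word, and leaves any other syllable unchanged.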
 +
 +===== Taiwanese pronunciation =====
 +
 +CC-CEDICT follows "standard Mandarin" as used in P.R. China. Mandarin as used in Taiwan sometimes differs slightly in pronunciation; such variations can be listed as follows:\\ 
 +叔叔 叔叔 [shu1 shu5] /(informal) father's younger brother/uncle/Taiwan pr. shu2 shu5/
 +
 +Taiwanese GuoYu sometimes prefers not to use the neutral tone, so we do not list a Taiwan pronunciation when the only difference is that the neutral tone is not used. When a character has a "Taiwan pr." note, its compounds need not repeat it.  
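
A consumer of the data could pick out the "Taiwan pr." note with a small helper like the one below. This is a sketch under assumptions: the name `taiwan_pronunciation` is hypothetical, the input is the list of slash-separated parts of a definition, and the optional square brackets around the pinyin are tolerated only as a precaution, since the example above writes the pinyin without them.

```python
import re

# Matches a definition part such as "Taiwan pr. shu2 shu5".
# Optional brackets around the pinyin are an assumption, not page syntax.
TAIWAN_PR_RE = re.compile(r"^Taiwan pr\. \[?([^\]]+)\]?$")


def taiwan_pronunciation(parts):
    """Return the Taiwan pinyin from a list of definition parts, or None."""
    for part in parts:
        match = TAIWAN_PR_RE.match(part.strip())
        if match:
            return match.group(1)
    return None


parts = ["(informal) father's younger brother", "uncle", "Taiwan pr. shu2 shu5"]
print(taiwan_pronunciation(parts))  # shu2 shu5
```

Because compounds need not repeat the note, a lookup tool that wants Taiwan readings for compounds would have to consult the single-character entries as well; that step is not shown here.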
 +
  
 ===== General principles ===== ===== General principles =====
Line 96: Line 110:
 There are 20,000 Chinese characters in the more advanced dictionaries, of which many are obscure, never used, and will not have correct definitions in online or paper dictionaries. This is the boundary of knowledge. (Exactly the same applies to big English dictionaries.) These obscure characters appear on modern websites, and one sometimes needs to give a definition. It is reasonable to admit (precise meaning unknown), and give an indication of what one can deduce. There are 20,000 Chinese characters in the more advanced dictionaries, of which many are obscure, never used, and will not have correct definitions in online or paper dictionaries. This is the boundary of knowledge. (Exactly the same applies to big English dictionaries.) These obscure characters appear on modern websites, and one sometimes needs to give a definition. It is reasonable to admit (precise meaning unknown), and give an indication of what one can deduce.
  
 +===== Ambiguity due to homonyms =====
 +
 +Sometimes words used in the English definitions can have multiple meanings. If the Chinese word does not have these additional meanings, additional information should be provided to prevent ambiguity:\\ 
 +首都 首都 [shou3 du1] /capital (city)/
 +
 +The text between the parentheses is "meta-information"; it is not part of the translation itself and serves only to prevent ambiguity. 
 +
 +===== References =====
 +
 +The English definitions can contain references to other Chinese words. These should be noted as follows:\\ 
 +漢字|汉字[Han4 zi4]
 +
 +For example:\\ 
 +股指 股指 [gu3 zhi3] /stock market index/share price index/abbr. for 股票指數|股票指数[gu3 piao4 zhi3 shu4]/
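
The reference notation above can be extracted with a regular expression. The sketch below is illustrative only: `REFERENCE_RE` and `find_references` are hypothetical names, and the pattern assumes that headwords contain no whitespace and none of the characters `|`, `[`, `]`, `/`, `,` or `:`. The part after the `|` is treated as optional, on the assumption that a word whose two forms are identical is written once.

```python
import re

# Traditional|Simplified[pin1 yin1]; the "|Simplified" part is optional.
REFERENCE_RE = re.compile(
    r"([^\s|\[\]/,:]+)(?:\|([^\s|\[\]/,:]+))?\[([^\]]+)\]"
)


def find_references(definition):
    """Return (traditional, simplified, pinyin) for each reference found."""
    references = []
    for match in REFERENCE_RE.finditer(definition):
        traditional, simplified, pinyin = match.groups()
        # With no "|", both forms are taken to be the same word.
        references.append((traditional, simplified or traditional, pinyin))
    return references


print(find_references("abbr. for 股票指數|股票指数[gu3 piao4 zhi3 shu4]"))
# [('股票指數', '股票指数', 'gu3 piao4 zhi3 shu4')]
```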
 +
 +===== Classifiers =====
 +
 +Classifiers (also called "Measure words") can be listed using the following syntax:\\ 
 +避風港 避风港 [bi4 feng1 gang3] /haven/refuge/harbor/CL:座[zuo4],個|个[ge4]/
  
 +Classifiers follow the 'reference' syntax, are prefixed by 'CL:' and separated by a comma (no additional spacing).
  
 +A classifier word itself can be defined using:\\ 
 +/classifier for small round things (peas, bullets, peanuts, pills, grains etc)/
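
The 'CL:' syntax described above (reference syntax, comma-separated, no extra spacing) can be parsed along these lines. This is a minimal sketch: `parse_classifiers` and `CLASSIFIER_RE` are hypothetical names, and the function expects a single slash-separated part of a definition as input.

```python
import re

# One classifier in reference syntax: 個|个[ge4] or 座[zuo4].
CLASSIFIER_RE = re.compile(r"^([^|\[\]]+)(?:\|([^|\[\]]+))?\[([^\]]+)\]$")


def parse_classifiers(part):
    """Parse a part such as 'CL:座[zuo4],個|个[ge4]' into tuples.

    Returns a list of (traditional, simplified, pinyin); the list is
    empty when the part does not start with 'CL:'.
    """
    if not part.startswith("CL:"):
        return []
    classifiers = []
    for item in part[len("CL:"):].split(","):
        match = CLASSIFIER_RE.match(item)
        if match is None:
            raise ValueError(f"unexpected classifier syntax: {item!r}")
        traditional, simplified, pinyin = match.groups()
        # A classifier written once has identical forms in both scripts.
        classifiers.append((traditional, simplified or traditional, pinyin))
    return classifiers


print(parse_classifiers("CL:座[zuo4],個|个[ge4]"))
# [('座', '座', 'zuo4'), ('個', '个', 'ge4')]
```

Raising an error on malformed items, rather than skipping them, is a design choice that makes the sketch usable as a simple syntax check for new entries.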
  
 ===== Variants ===== ===== Variants =====
Line 112: Line 148:
 66,900 Chinese (Simplified) pages for +"撑杆跳高" 66,900 Chinese (Simplified) pages for +"撑杆跳高"
  
-It often happens that Google tells you that +"Xx" occurs 200 times more frequently than +"XX", in which case Xx should be in CC-CEDICT as a regular entry, and XX only as "/variant of Xx/".
+It often happens that Google tells you that +"Xx" occurs 200 times more frequently than +"XX", in which case Xx should be in CC-CEDICT as a regular entry, and XX only as "XX XX [pin1 yin1] /variant of Xx/definition/".
  
 When there are alternative forms of the same expression, and the less common form is at most 5 times less common, the less common entry should have /also written ../ referring to the more common form, e.g. 撐竿跳高 撑竿跳高 [cheng1 gan1 tiao4 gao1] /pole-vaulting/also written 撐杆跳高|撑杆跳高/. When there are alternative forms of the same expression, and the less common form is at most 5 times less common, the less common entry should have /also written ../ referring to the more common form, e.g. 撐竿跳高 撑竿跳高 [cheng1 gan1 tiao4 gao1] /pole-vaulting/also written 撐杆跳高|撑杆跳高/.
format/syntax.1235914842.txt.gz · Last modified: 2009/03/01 14:40 (external edit)
