syntax_v2

Differences

This shows you the differences between two versions of the page.

syntax_v2 [2025/05/31 05:59] – created mdbg
syntax_v2 [2025/10/17 06:32] (current) mdbg
Line 1: Line 1:
 ====== CC-CEDICT V2 Syntax ======
  
-//**TODO:** work in progress!//
+CC-CEDICT began adopting the v2 format in December 2023. For v1 syntax, see: [[syntax]].
+
+// Below are guidelines on what CC-CEDICT entries **should** look like. CC-CEDICT still has many old entries that do not comply with these rules yet. //
  
 Version 2 (v2) introduces a new syntax for the pinyin of an entry, allowing for the specification of pinyin that follows standard pinyin orthography. In particular, it enables the combination of syllables to form words. For example, in v2, 二次方程 (quadratic equation) can now be written as two words, "er4ci4 fang1cheng2" (i.e., èrcì fāngchéng), rather than as four separate syllables, "er4 ci4 fang1 cheng2", as was required in v1.
  
-Below are guidelines on what CC-CEDICT entries **should** look like. CC-CEDICT still has many old entries that do not comply with these rules yet.
-
-In particular:
-  - Prior to April 2022, glosses and senses were separated using a /. As of April 2022, senses are to be separated with a / while glosses are to be separated with a ;. (This was a change in v1 format of definitions, but its progressive introduction largely coincides with the conversion of pinyin to v2 format.)
-  - In December 2023, CC-CEDICT began adopting the v2 pinyin format.
-
-An entry is considered to be in v2 format if it uses double square brackets for the pinyin.
+An entry is considered to be in v2 format if it uses double square brackets for the pinyin. v1 entries use a single square bracket.
 <code>
-[[pin1yin1]] rather than [pin1 yin1]
+v2: [[pin1yin1]]
+v1: [pin1 yin1]
 </code>
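
The bracket rule above can be checked mechanically. The following is a minimal Python sketch, not part of CC-CEDICT tooling: the function name ''pinyin_format'' and the regular expression are illustrative assumptions, and the line layout (traditional, simplified, bracketed pinyin, slash-delimited definition) is inferred from the examples on this page.

```python
import re

# Assumed layout, inferred from the examples on this page:
#   TRADITIONAL SIMPLIFIED [pinyin] /definition/      (v1)
#   TRADITIONAL SIMPLIFIED [[pinyin]] /definition/    (v2)
# The double-bracket alternative is listed first so it wins when both could apply.
ENTRY_RE = re.compile(
    r"^(?P<trad>\S+) (?P<simp>\S+) "
    r"(?:\[\[(?P<v2>[^\]]*)\]\]|\[(?P<v1>[^\]]*)\]) "
    r"/(?P<defs>.*)/$"
)


def pinyin_format(line):
    """Return "v2", "v1", or None if the line does not look like an entry."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        return None
    return "v2" if match.group("v2") is not None else "v1"


if __name__ == "__main__":
    print(pinyin_format("鱇 𩾌 [[kang1]] /used in 鮟鱇|𩽾𩾌[an1kang1]/"))
    print(pinyin_format("鮟鱇 𩽾𩾌 [an1 kang1] /anglerfish/"))
```

Only the headword pinyin is inspected; bracketed pinyin inside the definition (cross-references, "Taiwan pr.") is ignored by this sketch.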
  
 However, when updating the pinyin of an entry, the rest of the entry should also be reviewed. If this is done, it means that v2 pinyin format signifies not only that the pinyin format has been updated, but also that the definition has been checked for correctness and proper format: it's a way of keeping track of which entries have old definitions that need to be reviewed.
  
-As of May 2025, roughly 15% of entries had been converted to v2 by editors.
+In particular, prior to April 2022, glosses and senses were separated using a /. As of April 2022, senses are to be separated with a / while glosses are to be separated with a ;. (This was a change in v1 format of definitions, but its progressive introduction largely coincides with the conversion of pinyin to v2 format.)
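
The post-April-2022 separator convention (senses split on /, glosses within a sense split on ;) can be illustrated with a short Python sketch. The helper name ''split_definition'' is hypothetical, and the splitting is deliberately naive: it assumes that neither separator occurs inside a gloss, which may not hold for every entry.

```python
def split_definition(defs):
    """Split a definition field such as "/a; b/c/" into senses and glosses.

    Returns a list of senses, each a list of gloss strings.
    Naive by design: a ";" or "/" inside a gloss would be split as well.
    """
    senses = [sense for sense in defs.strip("/").split("/") if sense]
    return [[gloss.strip() for gloss in sense.split(";")] for sense in senses]


if __name__ == "__main__":
    field = "/(bound form) narrow/(bound form) a defile; a narrow pass/"
    for number, glosses in enumerate(split_definition(field), start=1):
        print(number, glosses)
```

Older entries that still separate glosses with / would come out of this function as one sense per gloss, which is one way to spot definitions that have not yet been reviewed.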
  
 Three editions of CC-CEDICT are published regularly:
Line 163: Line 160:
 ==== References ====
  
-The English definitions can contain references to other Chinese words. These should be noted as follows:\\
-漢字|汉字[Han4 zi4]
-
-For example:\\
-股指 股指 [gu3 zhi3] /stock market index/share price index/abbr. for 股票指數|股票指数[gu3 piao4 zhi3 shu4]/
+See [[references]]
  
 ==== Classifiers ====
Line 181: Line 174:
 ==== Bound forms ====
  
-A bound form is a morpheme that only appears as part of a larger expression. In English, bound forms tend to be prefixes or suffixes such as “-ly”, “-est”, “pre-”, “post-” etc and generally are not words by themselves. In Chinese however, characters can either be bound or free, and it can be difficult to determine which. Some characters can have multiple bound and multiple free senses.
+See [[bound_forms]]
 
-There are two types of bound forms in Chinese, those with meanings and those without.
+===== Punctuation =====
 
-=== Meaningful bound forms ===
 
-These are bound forms where a meaning can be assigned to the character. Oftentimes they appear in multiple words with the same meaning, but never by themselves. We indicate these by prefixing the sense with “(bound form)”.
+==== Middle dot ====
 
-For instance:
+Middle dots are often used for separating western names:
 
 <code>
-隘 隘 [[ai4]] /(bound form) narrow/(bound form) a defile; a narrow pass/
+大衛·艾登堡 大卫·艾登堡 [[Da4wei4 Ai4deng1bao3]] /David Attenborough (1926–), British naturalist and broadcaster/
 </code>
  
-is a bound form as you would not see 隘 alone when reading Chinese. It would always be accompanied by other characters such as 隘口, 隘路, 关隘, 狭隘 etc.
-
-=== Meaningless bound forms ===
-
-These are bound forms where a meaning cannot be assigned to the character, usually because the character appears in a small number of words (usually just 1). Oftentimes these are the names of plants or animals, or terms used in literature. For these characters, the entry is simply “used in …”.
-
-For example:
-
-<code>
-鮟 𩽾 [an1] /used in 鮟鱇|𩽾𩾌[an1 kang1]/Taiwan pr. [an4]/
-鱇 𩾌 [[kang1]] /used in 鮟鱇|𩽾𩾌[an1kang1]/
-鮟鱇 𩽾𩾌 [an1 kang1] /anglerfish/
-</code>
-
-𩽾 and 𩾌 by themselves have no meaning, as they are always used with each other. 𩽾𩾌 is the anglerfish.
-
-A small number of meaningless bound forms are used in multiple words, in this case, all should be listed. When the words have the same or similar meaning, they should be combined into one sense, when the words have different meanings, they should be separated into different senses.
-
-Different senses
-
-<code>
-螞 蚂 [[ma3]] /used in 螞蟥|蚂蟥[ma3huang2]/used in 螞蟻|蚂蚁[ma3yi3]/
-蝲 蝲 [la4] /used in 蝲蛄[la4 gu3]/used in 蝲蝲蛄[la4 la4 gu3]/
-蛞 蛞 [[kuo4]] /used in 蛞螻|蛞蝼[kuo4lou2]/used in 蛞蝓[kuo4yu2]/
-猻 狲 [[sun1]] /used in 猢猻|猢狲[hu2sun1]/used in 兔猻|兔狲[tu4sun1]/
-</code>
-
-Same sense
-<code>
-箢 箢 [yuan1] /used in 箢箕[yuan1 ji1] and 箢篼[yuan1 dou1]/Taiwan pr. [wan3]/
-癔 癔 [[yi4]] /used in 癔病[yi4bing4] and 癔症[yi4zheng4]/
-咐 咐 [[fu4]] /used in 吩咐[fen1fu5] and 囑咐|嘱咐[zhu3fu5]/
-</code>
-
-An example of both
-
-<code>
-螂 螂 [[lang2]] /used in 螞螂|蚂螂[ma1lang2]/used in 蜣螂[qiang1lang2] and 虼螂[ge4lang2]/used in 螳螂[tang2lang2]/used in 蟑螂[zhang1lang2]/
-</code>
-
-
-
-===== Punctuation =====
-
-
-==== Middle dot ====
-
-Middle dots are often used for separating western names:\\
-珍・奧斯汀 珍・奥斯汀 [Zhen1 · Ao4 si1 ting1] /Jane Austen (1775-1817), English novelist/
-
-A double width middle dot is used in the Chinese, a single width middle dot padded with spaces on both sides is used in the pinyin.
+Note: A middle dot was present within the pinyin in v1, but no longer used in v2. The v2 pinyin format allows us to clearly group the characters of the first name and last name, so the middle dot is no longer necessary.
  
 ==== Comma ====
 
 Commas are sometimes used in Chinese proverbs:
+
 <code>
-人為財死，鳥為食亡 人为财死，鸟为食亡 [[ren2 wei4 cai2 si3, niao3 wei4 shi2 wang2]] /Human beings die in pursuit of wealth, and birds die in pursuit of food/.../
+分久必合，合久必分 分久必合，合久必分 [[fen1jiu3-bi4he2, he2jiu3-bi4fen1]] /lit. that which is long divided must unify, and that which is long unified must divide (proverb, from Romance of the Three Kingdoms 三國演義|三国演义[San1guo2 Yan3yi4])/fig. things are constantly changing/
 </code>
  
-A **double width comma** is used in the Chinese. In the pinyin, **a single width comma followed by a space** is used.
+The comma within the Chinese characters should be the "fullwidth comma" (，). The comma within the pinyin should be the regular comma followed by a space.
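
The comma rule lends itself to a simple consistency check. The sketch below is an assumption-laden illustration rather than an official validator: ''commas_consistent'' is a hypothetical helper that takes the already separated traditional headword, simplified headword, and pinyin, and it treats U+FF0C as the fullwidth comma.

```python
FULLWIDTH_COMMA = "\uFF0C"  # the fullwidth comma used in the Chinese headwords


def commas_consistent(trad, simp, pinyin):
    """Check the comma convention for one entry.

    The headwords must use fullwidth commas, and the pinyin must contain the
    same number of regular commas, each followed by a space.
    """
    expected = trad.count(FULLWIDTH_COMMA)
    if simp.count(FULLWIDTH_COMMA) != expected:
        return False
    if "," in trad or "," in simp or FULLWIDTH_COMMA in pinyin:
        return False
    # Every regular comma in the pinyin must be followed by a space.
    return pinyin.count(",") == expected and pinyin.count(", ") == expected


if __name__ == "__main__":
    print(commas_consistent(
        "分久必合，合久必分",
        "分久必合，合久必分",
        "fen1jiu3-bi4he2, he2jiu3-bi4fen1",
    ))
```

Entries without any comma pass trivially, so the check can be run over a whole file without special-casing.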
  
  
syntax_v2.1748671162.txt.gz · Last modified: by mdbg
