format:syntax_v2

Differences

This shows you the differences between two versions of the page.

format:syntax_v2 [2024/08/30 21:26] – [Variants] richwarm
format:syntax_v2 [2024/09/12 23:02] (current) – [CC-CEDICT V2 Syntax] richwarm
Line 3:
 //**TODO:** work in progress!//
 
-Version 2 (v2) introduces a new syntax for the pinyin of an entry, allowing for the specification of pinyin that follows standard pinyin orthography. In particular, it enables the combination of syllables to form words. For example, in v2, 二次方程 (quadratic equation) can now be written as two words, "er4ci4 fang1cheng2" (i.e., èrcì fāngchéng), rather than as four distinct syllables, "er4 ci4 fang1 cheng2", as was required in v1.
+Version 2 (v2) introduces a new syntax for the pinyin of an entry, allowing for the specification of pinyin that follows standard pinyin orthography. In particular, it enables the combination of syllables to form words. For example, in v2, 二次方程 (quadratic equation) can now be written as two words, "er4ci4 fang1cheng2" (i.e., èrcì fāngchéng), rather than as four separate syllables, "er4 ci4 fang1 cheng2", as was required in v1.
 
 Below are guidelines on what CC-CEDICT entries **should** look like. CC-CEDICT still has many old entries that do not comply with these rules yet.
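
The relationship between the v2 word-joined form, the v1 syllable-separated form, and the tone-marked rendering of 二次方程 can be sketched in Python. This is only an illustration, not the Editor's code: the names `SYLLABLE`, `to_v1`, `add_tone_mark` and `to_marked` are hypothetical, the input is assumed to be lowercase numbered pinyin with "u:" standing for ü, and tone marks are placed by the usual convention (on a or e if present, on the o of "ou", otherwise on the last vowel).

```python
import re

# Tone-marked forms of each vowel for tones 1-4 (tone 5 is neutral: no mark).
TONE_MARKS = {
    "a": "āáǎà", "e": "ēéěè", "i": "īíǐì",
    "o": "ōóǒò", "u": "ūúǔù", "ü": "ǖǘǚǜ",
}

# Assumption: a numbered syllable is a run of lowercase letters
# (with "u:" spelling ü) followed by one tone digit 1-5.
SYLLABLE = re.compile(r"[a-z:]+[1-5]")


def to_v1(pinyin_v2: str) -> str:
    """Separate every numbered syllable with a space, as v1 required."""
    return " ".join(SYLLABLE.findall(pinyin_v2))


def add_tone_mark(syllable: str) -> str:
    """Turn one numbered syllable such as 'cheng2' into 'chéng'."""
    letters, tone = syllable[:-1].replace("u:", "ü"), int(syllable[-1])
    if tone == 5:
        return letters
    if "a" in letters:
        pos = letters.index("a")
    elif "e" in letters:
        pos = letters.index("e")
    elif "ou" in letters:
        pos = letters.index("o")
    else:
        vowels = [i for i, ch in enumerate(letters) if ch in TONE_MARKS]
        if not vowels:
            return letters  # nothing to mark; leave the text unchanged
        pos = vowels[-1]
    return letters[:pos] + TONE_MARKS[letters[pos]][tone - 1] + letters[pos + 1:]


def to_marked(pinyin_v2: str) -> str:
    """Replace each numbered syllable in place, keeping spaces and hyphens."""
    return SYLLABLE.sub(lambda m: add_tone_mark(m.group(0)), pinyin_v2)


print(to_v1("er4ci4 fang1cheng2"))      # er4 ci4 fang1 cheng2
print(to_marked("er4ci4 fang1cheng2"))  # èrcì fāngchéng
```

Because the word boundaries live only in the v2 string, going from v2 to v1 is a simple split, while the reverse direction would need information that v1 never recorded.
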
Line 62:
 K人 K人 [K ren2] /(slang) to hit sb; to beat sb/
 </code>
 +
 +**Below are some notes on how these entries are handled in v2.**
 +
 +Let's take "e人" (extroverted person) as an example.
 +
 +There are several ways one might render "e人" in pinyin, such as:
 +  - e-rén
 +  - erén
 +  - yìrén
 +
 +The Editor website attempts to match the parts of the headword with the parts of the pinyin, and will, if necessary, treat some parts as "unparsed".
 +
 +For example, in the following entry, "e" is an unparsed element in both the headword and the pinyin, while 人 is matched with "ren2".
 +<code> e人 e人 [[e-ren2]] /(slang) extroverted person/ </code>
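
To make the layout of such an entry line concrete, here is a minimal Python sketch that splits a line into its fields. It is not the Editor website's parser: `ENTRY` and `parse_entry` are hypothetical names, and the rule that v2 pinyin sits in double square brackets while v1 pinyin sits in single ones is inferred only from the examples on this page. The sketch checks the overall layout and does not attempt to match headword elements against pinyin elements.

```python
import re

# Layout assumed from the examples on this page:
#   traditional simplified [[v2 pinyin]] /definition/definition/
#   traditional simplified [v1 pinyin] /definition/definition/
ENTRY = re.compile(
    r"^(?P<traditional>\S+) (?P<simplified>\S+) "
    r"(?:\[\[(?P<pinyin_v2>[^\]]*)\]\]|\[(?P<pinyin_v1>[^\]]*)\]) "
    r"/(?P<definitions>.+)/$"
)


def parse_entry(line: str):
    """Return the fields of an entry line, or None if the layout does not fit."""
    m = ENTRY.match(line.strip())
    if m is None:
        return None
    is_v2 = m["pinyin_v2"] is not None
    return {
        "traditional": m["traditional"],
        "simplified": m["simplified"],
        "pinyin": m["pinyin_v2"] if is_v2 else m["pinyin_v1"],
        "is_v2": is_v2,
        "definitions": m["definitions"].split("/"),
    }


entry = parse_entry("e人 e人 [[e-ren2]] /(slang) extroverted person/")
print(entry["pinyin"], entry["is_v2"])  # e-ren2 True
```

A line that passes this layout check can still be rejected by the Editor website, since the element-by-element matching of headword and pinyin is a separate step.
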
 +
 +If the Editor website (https://cc-cedict.org/editor/) cannot unambiguously match up the elements of the headword and the pinyin, the entry will not be processed. That is what happens in the following case, where the proposed pinyin is "eren2" rather than "e-ren2".
 +
 +
 +<code> e人 e人 [[eren2]] /(slang) extroverted person/ (Invalid format!)</code>
 +
 +To specify "erén" (as opposed to, say, "e-rén"), it is necessary to use braces to guide the Editor website in parsing. The following would work:
 +<code>e人 e人 [[{e}ren2]] /(slang) extroverted person/</code>
 +
 +... as would several other forms, including
 +<code>{e}人 {e}人 [[{e}ren2]] /(slang) extroverted person/</code>
 +
 +Here is a link to a webpage where a proposed entry can be tested to see if it can be parsed correctly.
 +
 +"Parse entry" webpage:
 +https://cc-cedict.org/editor/editor.php?handler=ParseEntry
 +
 +To specify "yìrén" as the pinyin for e人, no braces are necessary. The following entry can be parsed, as one can verify at the "Parse entry" webpage. "e" will be matched with "yi4", and 人 will be matched with "ren2".
 +
 +<code>e人 e人 [[yi4ren2]] /(slang) extroverted person/</code>
 +
 +Generally, it is regarded as preferable not to indicate the pronunciation of non-Chinese parts of a headword (such as "e" in "e人"). Instead, they can appear as unparsed elements of the pinyin. For example, "e-ren2" is preferred over "yi4ren2".
 +
  
 ==== Pinyin ====
format/syntax_v2.1725053212.txt.gz · Last modified: 2024/08/30 21:26 by richwarm
