format:syntax_v2
Differences
This shows you the differences between two versions of the page.
format:syntax_v2 [2024/08/17 22:26] – [Basic format] richwarm | format:syntax_v2 [2024/09/12 23:02] (current) – [CC-CEDICT V2 Syntax] richwarm | ||
---|---|---|---|
Line 3: | Line 3: | ||
//**TODO:** work in progress!// | //**TODO:** work in progress!// | ||
- | Version 2 (v2) introduces a new syntax for the pinyin of an entry, allowing for the specification of pinyin that follows standard pinyin orthography. In particular, it enables the combination of syllables to form words. For example, in v2, 二次方程 (quadratic equation) can now be written as two words, " | + | Version 2 (v2) introduces a new syntax for the pinyin of an entry, allowing for the specification of pinyin that follows standard pinyin orthography. In particular, it enables the combination of syllables to form words. For example, in v2, 二次方程 (quadratic equation) can now be written as two words, " |
Below are guidelines on what CC-CEDICT entries **should** look like. CC-CEDICT still has many old entries that do not comply with these rules yet. | Below are guidelines on what CC-CEDICT entries **should** look like. CC-CEDICT still has many old entries that do not comply with these rules yet. | ||
Line 62: | Line 62: | ||
K人 K人 [K ren2] /(slang) to hit sb; to beat sb/ | K人 K人 [K ren2] /(slang) to hit sb; to beat sb/ | ||
</ | </ | ||
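The entry above follows the basic CC-CEDICT line shape: traditional headword, simplified headword, pinyin in square brackets, then one or more definitions delimited by slashes. A minimal sketch of splitting such a line into its fields is given here. The function name `parse_entry`, the regular expression and the returned dictionary keys are illustrative assumptions, not the Editor website's actual parser, and the sketch assumes headwords contain no spaces and does not attempt to handle every v2 feature.

```python
import re

# Assumed line shape: TRADITIONAL SIMPLIFIED [PINYIN] /def 1/def 2/.../
# (hypothetical pattern for illustration; headwords are assumed to contain no spaces)
ENTRY_RE = re.compile(r"^(\S+) (\S+) \[([^\]]*)\] /(.+)/$")


def parse_entry(line):
    """Split one CC-CEDICT line into headwords, pinyin and a list of definitions."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a CC-CEDICT entry: {line!r}")
    traditional, simplified, pinyin, body = match.groups()
    return {
        "traditional": traditional,
        "simplified": simplified,
        "pinyin": pinyin,
        # Slashes separate definitions; the outer pair is consumed by the pattern.
        "definitions": body.split("/"),
    }


entry = parse_entry("K人 K人 [K ren2] /(slang) to hit sb; to beat sb/")
print(entry["pinyin"])       # the bracketed pinyin field
print(entry["definitions"])  # list with one item per slash-delimited definition
```

Note that the non-Chinese part of the headword ("K") appears unchanged in the pinyin field, which is why a parser cannot assume one pinyin syllable per headword character.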
+ | |||
+ | **Below are some notes on how these entries are handled in v2.** | ||
+ | |||
+ | Let's take " | ||
+ | |||
+ | There are several ways one might like to render " | ||
+ | - e-rén | ||
+ | - erén | ||
+ | - yìrén | ||
+ | |||
+ | The Editor website attempts to match the parts of the headword with the parts of the pinyin, and will, if necessary, treat some parts as " | ||
+ | |||
+ | For example, in the following entry, " | ||
+ | < | ||
+ | |||
+ | If the Editor website https:// | ||
+ | |||
+ | |||
+ | < | ||
+ | |||
+ | To specify " | ||
+ | < | ||
+ | |||
+ | ... as would several other forms, including | ||
+ | < | ||
+ | |||
+ | Here is a link to a webpage where a proposed entry can be tested to see if it can be parsed correctly. | ||
+ | |||
+ | "Parse entry" webpage: | ||
+ | https:// | ||
+ | |||
+ | To specify " | ||
+ | |||
+ | < | ||
+ | |||
+ | Generally, it is regarded as preferable not to indicate the pronunciation of non-Chinese parts of a headword (such as " | ||
+ | |||
==== Pinyin ==== | ==== Pinyin ==== | ||
Line 277: | Line 314: | ||
When there are alternative forms of the same expression, and the less common form is at most 5 times less common, the less common entry should have /also written ../ referring to the more common form, e.g. 撐竿跳高 撑竿跳高 [cheng1 gan1 tiao4 gao1] / | When there are alternative forms of the same expression, and the less common form is at most 5 times less common, the less common entry should have /also written ../ referring to the more common form, e.g. 撐竿跳高 撑竿跳高 [cheng1 gan1 tiao4 gao1] / | ||
+ | |||
+ | |||
+ | **PROPOSED CHANGES** | ||
+ | |||
+ | (Summary: (1) Get rid of "also written", | ||
+ | |||
+ | (THE VARIANT RULES ABOVE CAN BE DELETED IF AND WHEN THESE CHANGES ARE ACCEPTED.) | ||
+ | |||
+ | (Also, the following notes can be tidied up and edited to remove references to " | ||
+ | |||
+ | Regarding "also written..." | ||
+ | |||
+ | According to our wiki, there are two kinds of variants. | ||
+ | https:// | ||
+ | |||
+ | 1) Where the less common form is relatively common (> 20% of the frequency of the more common form). | ||
+ | |||
+ | 2) Where the less common form is much less common (< 20% of the frequency of the more common form).
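The 20% test that separates the two kinds of variants can be sketched as a small check. This is only an illustration: the function name `variant_kind` and the idea of passing raw occurrence counts are assumptions, since the wiki does not say how frequency is measured. A form at exactly 20% is treated as the first kind, following the "at most 5 times less common" wording of the existing rule.

```python
def variant_kind(less_common_count, more_common_count):
    """Classify the less common form of a variant pair by relative frequency.

    Both arguments are hypothetical occurrence counts from some corpus.
    """
    if more_common_count <= 0:
        raise ValueError("more_common_count must be positive")
    # "At most 5 times less common" is equivalent to a ratio of at least 1/5.
    # Integer arithmetic avoids floating-point issues at the boundary.
    if less_common_count * 5 >= more_common_count:
        return "relatively common"
    return "much less common"


print(variant_kind(30, 100))
print(variant_kind(5, 100))
```

Having to gather two counts for every variant pair before choosing a definition format is the kind of overhead the proposed changes aim to remove.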
+ | |||
+ | For the first type, the definition of the less common form should look like this (according to the wiki):
+ | < | ||
+ | |||
+ | And for the second type, the definition of the less common form should be:
+ | < | ||
+ | |||
+ | In practice, what has been happening in recent years is this: | ||
+ | |||
+ | 1. We have been ignoring the "also written ..." syntax, except maybe when we edit existing entries.
+ | |||
+ | 2. With variants, | ||
+ | |||
+ | a) if it's a full variant (i.e. exactly the same definition), | ||
+ | |||
+ | b) if it's a partial variant (i.e. only some of the senses of one form apply to the other form) we use | ||
+ | < | ||
+ | |||
+ | Part of the rationale for these changes is this: It's a hassle to check whether entries satisfy the "20% criterion", |
+ | |||
+ | Using the Editor website' | ||
+ | |||
+ | One idea that I've had in mind for a while is to clean all of these up by
+ | |||
+ | a) rewriting "also written" | ||
+ | |||
+ | b) regularizing the format of the " | ||
+ | |||
+ | |||
===== Romanization of foreign languages ===== | ===== Romanization of foreign languages ===== |
format/syntax_v2.1723933574.txt.gz · Last modified: 2024/08/17 22:26 by richwarm