syntax_v2
Differences
This shows you the differences between two versions of the page.
| syntax_v2 [2026/05/03 12:12] – Archive "Variants" and "Non-Chinese entries" kbaiko | syntax_v2 [2026/05/16 09:39] (current) – Handling numbers with multiple digits kbaiko | ||
|---|---|---|---|
| Line 235: | Line 235: | ||
| If an alternative romanization method is more popular for a certain word, that version can be added as an additional translation. | If an alternative romanization method is more popular for a certain word, that version can be added as an additional translation. | ||
| + | ===== Non-Chinese characters ===== | ||
| + | |||
| + | Chinese occasionally uses English letters or numerals to write a word. For example: | ||
| + | |||
| + | < | ||
| + | # English letters | ||
| + | ky ky [[ky]] /(slang) socially tone-deaf; unable to read the room (from Japanese KY, acronym of 空気が読めない "kuuki ga yomenai" | ||
| + | coser coser [[coser]] /cosplayer/ | ||
| + | |||
| + | # Mix of English and Chinese | ||
| + | e人 e人 [[e-ren2]] /(slang) extroverted person/ | ||
| + | 勿cue 勿cue [[wu4-cue]] /(Internet slang) don't call on me; don't drag me in/ | ||
| + | |||
| + | # Numbers | ||
| + | 3D打印 3D打印 [[san1-D da3yin4]] /to 3D print; 3D printing/ | ||
| + | 95後 95后 [[jiu3wu3hou4]] /people born between 1995-01-01 and 1999-12-31/ | ||
| + | 996 996 [[jiu3jiu3liu4]] /9am–9pm, six days a week (work schedule)/ | ||
| + | </ | ||
| + | |||
| + | As a general rule of thumb: | ||
| + | - When writing the Hanzi fields, keep non-Chinese characters unchanged. | ||
| + | - When writing the pinyin, copy English letters as they are (ky -> ky), but write each number as the pinyin of the corresponding Chinese character (9 -> jiu3). | ||
| + | |||
| + | |||
| + | ==== Technical details, and the use of {} ==== | ||
| + | |||
| + | When parsing the traditional and simplified fields, Hanzi and numbers are treated as individual sections, while consecutive English letters are grouped together into a single section. For example a hypothetical headword " | ||
| + | |||
| + | The pinyin is first split by spaces and punctuation, | ||
| + | |||
| + | The number of parsed sections in the Hanzi must match the number of parsed sections in the pinyin. For the vast majority of entries, this does not pose a problem: almost all Chinese characters are one syllable long, and numbers and English letters parse correctly as long as the pinyin is segmented correctly. Problems arise in rare situations such as | ||
| + | |||
| + | < | ||
| + | 兡 兡 [[bai3ke4]] /.../ | ||
| + | </ | ||
| + | |||
| + | where a single character corresponds to two syllables. In these cases, braces {} may be used to group a section manually, so we can write | ||
| + | |||
| + | < | ||
| + | 兡 兡 [[{bai3ke4}]] /.../ | ||
| + | </ | ||
| + | |||
| + | which indicates " | ||
| + | |||
| + | Another problem arises for entries that contain a multi-digit number. Consider a hypothetical entry such as | ||
| + | |||
| + | < | ||
| + | 11 11 [[shi2yi1]] /eleven/ | ||
| + | </ | ||
| + | |||
| + | which implies that the first 1 is pronounced " | ||
| + | |||
| + | < | ||
| + | 21 21 [[er4shi2yi1]] /twenty one/ | ||
| + | </ | ||
| + | |||
| + | which poses a different problem - we have two Hanzi sections but three pinyin sections due to the extra " | ||
| + | |||
| + | < | ||
| + | {21}三體綜合症 {21}三体综合症 [[{21} san1ti3 zong1he2zheng4]] /trisomy; Down's syndrome/ | ||
| + | </ | ||
| + | |||
| + | Note that this differs from the 996 example above: 996 is read as three separate digits ("nine nine six") and parses without {}, whereas the number "nine hundred ninety-six" would need {}. | ||
| + | |||
| + | To check whether an entry will be parsed correctly, you can use this tool: | ||
| + | https:// | ||
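
The section-count rule described above can be illustrated with a minimal Python sketch. This is not the project's actual parser: the function names are hypothetical, and the pinyin rule in particular is an assumption (one section per tone-numbered syllable, per bare run of letters, or per {} group), since only the headword side is spelled out in full here. It also ignores punctuation inside headwords.

```python
import re

# Hypothetical helpers for illustration only, not the real parser.

# Headword: a {...} group, a run of English letters, or any other single
# non-space character (a Hanzi or a digit) counts as one section.
HEADWORD_SECTION = re.compile(r"\{[^{}]*\}|[A-Za-z]+|\S")

# Pinyin (ASSUMED rule): a {...} group, a syllable ending in a tone digit
# 1-5, or a bare run of letters counts as one section. Spaces and hyphens
# only separate sections.
PINYIN_SECTION = re.compile(r"\{[^{}]*\}|[A-Za-z:]+[1-5]|[A-Za-z]+")


def headword_sections(headword: str) -> list[str]:
    """Split a traditional or simplified field into sections."""
    return HEADWORD_SECTION.findall(headword)


def pinyin_sections(pinyin: str) -> list[str]:
    """Split the pinyin field (without the [[ ]] wrapper) into sections."""
    return PINYIN_SECTION.findall(pinyin)


def sections_match(headword: str, pinyin: str) -> bool:
    """Return True when both fields parse into the same number of sections."""
    return len(headword_sections(headword)) == len(pinyin_sections(pinyin))


if __name__ == "__main__":
    examples = [
        ("3D打印", "san1-D da3yin4"),
        ("996", "jiu3jiu3liu4"),
        ("兡", "bai3ke4"),
        ("兡", "{bai3ke4}"),
        ("21", "er4shi2yi1"),
        ("{21}三體綜合症", "{21} san1ti3 zong1he2zheng4"),
    ]
    for headword, pinyin in examples:
        print(headword, pinyin, sections_match(headword, pinyin))
```

Under these assumptions, the unbraced 兡 and 21 entries are reported as mismatches, while their braced forms and the digit-by-digit 996 entry pass. The linked tool remains the authoritative check.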
syntax_v2.1777810336.txt.gz · Last modified: by kbaiko
