syntax_v2
Differences
This shows you the differences between two versions of the page.
| syntax_v2 [2026/04/25 13:42] – 兒|儿 kbaiko | syntax_v2 [2026/05/16 09:39] (current) – Handling numbers with multiple digits kbaiko | ||
|---|---|---|---|
| Line 226: | Line 226: | ||
| //(the +"" | //(the +"" | ||
| - | ===== Variants ===== | ||
| - | Many characters have variants, sometimes more than one, sometimes with identical meaning or quite different meanings. Some choice | + | ===== Romanization |
| - | You can get rough usage frequency information by searching the alternative word forms in Google. Please | + | When transcribing foreign words in definitions, |
| - | < | + | * Japanese: [[http:// |
| + | * Korean: [[http:// | ||
| - | Additionally you can use Google' | + | If an alternative romanization method is more popular for a certain word, that version |
| - | 789 Chinese (Traditional) pages for +" | + | |
| - | 17,700 Chinese (Simplified) pages for +" | + | |
| - | 1,750 Chinese (Traditional) pages for +" | + | |
| - | 66,900 Chinese (Simplified) pages for +" | + | |
| - | It often happens that Google tells you that +" | + | ===== Non-Chinese characters ===== |
| - | When there are alternative forms of the same expression, and the less common form is at most 5 times less common, the less common entry should | + | On occasion |
| + | < | ||
| + | # English letters | ||
| + | ky ky [[ky]] /(slang) socially tone-deaf; unable to read the room (from Japanese KY, acronym of 空気が読めない "kuuki ga yomenai" | ||
| + | coser coser [[coser]] /cosplayer/ | ||
| - | **PROPOSED CHANGES** | + | # Mix of English and Chinese |
| + | e人 e人 [[e-ren2]] /(slang) extroverted person/ | ||
| + | 勿cue 勿cue [[wu4-cue]] /(Internet slang) don't call on me; don't drag me in/ | ||
| - | (Summary: (1) Get rid of "also written", | + | # Numbers |
| - | + | 3D打印 3D打印 [[san1-D da3yin4]] /to 3D print; 3D printing/ | |
| - | (THE VARIANT RULES ABOVE CAN BE DELETED IF AND WHEN THESE CHANGES ARE ACCEPTED.) | + | 95後 95后 [[jiu3wu3hou4]] /people born between 1995-01-01 |
| + | 996 996 [[jiu3jiu3liu4]] /9am–9pm, six days a week (work schedule)/ | ||
| + | </ | ||
| - | (Also, the following notes can be tidied up and edited to remove references to " | + | As a general rule of thumb: |
| + | - When writing the Hanzi fields, non-Chinese characters should stay the same. | ||
| + | - When writing the pinyin, keep English letters unchanged (ky -> ky), but write out numbers as the pinyin of the corresponding Chinese character (9 -> jiu3). | ||
| - | Regarding "also written..." | ||
| - | According to our wiki, there are two kinds of variants. | + | ==== Technical details, and the use of {} ==== |
| - | https:// | + | |
| - | 1) Where the less common form is relatively common (> 20% of the frequency of the more common form). | + | When parsing the traditional and simplified fields, Hanzi and numbers are treated as individual sections, while consecutive English letters are grouped together into a single section. For example, a hypothetical headword " |
| - | 2) Where the less common form is much less common (< 20% of the frequency | + | The pinyin |
| - | For the first type, the def of the less common form should look like this (according | + | It is a requirement that the number of parsed sections in the Hanzi matches |
| - | < | + | |
| - | And for the second type, the def of the less common form should be | + | < |
| - | < | + | 兡 兡 [[bai3ke4]] |
| + | </ | ||
| - | In practice, what has been happening in recent years is this: | + | where a single character corresponds to two syllables. |
| - | 1. We have been ignoring the "also written | + | < |
| + | 兡 兡 [[{bai3ke4}]] /.../ | ||
| + | </ | ||
| - | 2. With variants, | + | which indicates " |
| - | a) if it' | + | Another problem arises for entries with a number with multiple digits. Consider |
| - | b) if it's a partial variant (i.e. only some of the senses of one form apply to the other form) we use | + | < |
| - | < | + | 11 11 [[shi2yi1]] |
| + | </ | ||
| - | Part of the rationale for these changes | + | which implies that the first 1 is pronounced |
| - | Using the Editor website' | + | < |
| + | 21 21 [[er4shi2yi1]] /twenty one/ | ||
| + | </ | ||
| - | One idea that I've had in mind for a while is to clean up all these by | + | which poses a different problem - we have two Hanzi sections but three pinyin sections due to the extra "shi2" |
| - | + | ||
| - | a) rewriting "also written" | + | |
| - | + | ||
| - | b) regularizing the format of the "variant of" | + | |
| - | + | ||
| - | + | ||
| - | + | ||
| - | ===== Romanization of foreign languages ===== | + | |
| - | + | ||
| - | When transcribing foreign words in definitions, please use the following romanization methods: | + | |
| - | * Japanese: [[http:// | + | |
| - | * Korean: [[http:// | + | |
| - | + | ||
| - | If an alternative romanization method is more popular for a certain word, that version can be added as an additional translation. | + | |
| - | + | ||
| - | ===== Non-Chinese entries ===== | + | |
| - | + | ||
| - | There are a very small number | + | |
| < | < | ||
| - | % % [pa1] /percent (Tw)/ | + | {21}三體綜合症 {21}三体综合症 |
| - | 3C 3C [san1 C] /computers, communications, | + | |
| - | 421 421 [si4 er4 yi1] /four grandparents, | + | |
| - | K人 K人 [K ren2] /(slang) to hit sb; to beat sb/ | + | |
| </ | </ | ||
| - | **Below are some notes on how these entries are handled in v2.** | + | Note that this differs from the 996 example above: 996 is read as three separate digits ("nine nine six") and parses without {}. Read as the number "nine hundred ninety-six", it would need {}. |
| - | Let's take " | + | To check whether |
| - | + | ||
| - | There are several ways one might like to render " | + | |
| - | - e-rén | + | |
| - | - erén | + | |
| - | - yìrén | + | |
| - | + | ||
| - | The Editor website attempts to match the parts of the headword with the parts of the pinyin, and will, if necessary, treat some parts as " | + | |
| - | + | ||
| - | For example, in the following entry, " | + | |
| - | < | + | |
| - | + | ||
| - | If the Editor website https:// | + | |
| - | + | ||
| - | + | ||
| - | < | + | |
| - | + | ||
| - | To specify " | + | |
| - | < | + | |
| - | + | ||
| - | ... as would several other forms, including | + | |
| - | < | + | |
| - | + | ||
| - | Here is a link to a webpage where a proposed entry can be tested to see if it can be parsed correctly. | + | |
| - | + | ||
| - | "Parse entry" webpage: | + | |
| https:// | https:// | ||
| - | |||
| - | To specify " | ||
| - | |||
| - | < | ||
| - | |||
| - | Generally, it is regarded as preferable not to indicate the pronunciation of non-Chinese parts of a headword (such as " | ||
| - | |||
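
The sectioning rule for the Hanzi fields described in the new revision (each Hanzi and each digit is its own section, consecutive English letters form one section, and {} groups several characters into one section) can be illustrated with a short Python sketch. This is not the Editor website's actual parser. The function name `split_headword` is hypothetical, and the handling of {} in a headword is inferred only from the `{21}` example above. Pinyin-side parsing is not covered.

```python
def split_headword(field):
    """Split a traditional/simplified field into sections (illustrative only)."""
    sections = []
    i = 0
    while i < len(field):
        ch = field[i]
        if ch == "{":
            # Everything inside {} counts as one section.
            # str.index raises ValueError if the closing brace is missing.
            end = field.index("}", i)
            sections.append(field[i + 1:end])
            i = end + 1
        elif ch.isascii() and ch.isalpha():
            # Consecutive English letters are grouped into a single section.
            j = i
            while j < len(field) and field[j].isascii() and field[j].isalpha():
                j += 1
            sections.append(field[i:j])
            i = j
        else:
            # Hanzi and digits are each treated as an individual section.
            sections.append(ch)
            i += 1
    return sections


print(split_headword("勿cue"))
print(split_headword("996"))
print(split_headword("{21}三體綜合症"))
```

Under these assumptions, "996" yields three sections (matching the three pinyin syllables jiu3, jiu3, liu4), while "{21}" yields a single section.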
syntax_v2.1777124558.txt.gz · Last modified: by kbaiko
