Change log entry 81335
Processed by: richwarm (2023-12-12 20:59:07 GMT)
Comment: << review queue entry 75244 - submitted by 'cws' >>
v2

Can v2 multi-word vocabulary and idiom entries now have spaces and dashes in them?

Where can we see the guidelines on version 2 format?
---------------------------------------

Editor: We need to document v2 format on the wiki.
https://cc-cedict.org/wiki/format:syntax

But currently, the only v2 feature that is documented there is a note on the use of semicolons in definitions:
https://cc-cedict.org/wiki/format:syntax#semicolons

We haven't yet written anything in the wiki about v2 pinyin format.

* * *

Yes, in v2, you can have spaces and hyphens.

We designed v2 to be backwards compatible. To get the v1 pinyin from a v2 entry, you just remove all spaces and hyphens, then insert a space after each numeral (unless it's followed by "]]").
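
As a rough illustration of that rule, here is a minimal Python sketch. The function name is made up for this example, and the final step (turning "[[...]]" into "[...]") is an assumption inferred from how v1 and v2 entries are written, not part of the stated rule. It does not attempt the more complicated brace cases such as "X光".

```python
import re

def v2_pinyin_to_v1(field: str) -> str:
    """Convert a v2 pinyin field, e.g. '[[dian4nao3 bing4du2]]', to v1 form.

    Hypothetical helper for illustration only; not the project's own script.
    """
    # Step 1: remove all spaces and hyphens.
    compact = field.replace(" ", "").replace("-", "")
    # Step 2: insert a space after each numeral, unless it is followed
    # by "]]" (or is the last character of the string).
    spaced = re.sub(r"(\d)(?!\]\]|$)", r"\1 ", compact)
    # Assumption: v1 uses single brackets, so drop one bracket on each side.
    if spaced.startswith("[[") and spaced.endswith("]]"):
        spaced = spaced[1:-1]
    return spaced

print(v2_pinyin_to_v1("[[dian4nao3 bing4du2]]"))
```

For the entry in the diff at the end of this page, this turns the v2 pinyin back into [dian4 nao3 bing4 du2].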

(It's a bit more complicated in the case of headwords like "X光", but I'll keep it simple for the moment.)

We will continue to publish CC-CEDICT in v1 format for apps that haven't been modified to handle v2 format.

* * *

A relatively small number of headwords contain non-hanzi characters. In v2, we handle these with braces { }. Here are some examples:
{BP}機 {BP}机 [[{BP}-ji1]] /(loanword) beeper; pager/
打{call} 打{call} [da3 {call}] /(slang) to cheer sb on/

In v1, each character in the headword was supposed to have a corresponding pinyin syllable. When it came to a term like "打call", we needed five pinyin syllables, even though the headword is pronounced as two syllables, so we made it [da3 c a l l]. In v2, strings surrounded by braces are treated as a single pseudo-character (in the headword) or a single pinyin pseudo-syllable (in the pinyin). In the above examples, both headwords have two (pseudo-)characters and two (pseudo-)syllables. This format enables us to indicate that the pinyin for BP机 should be written as BP-jī (not "B P jī", as v1 format would suggest) and 打call as "dǎ call" (not "dǎ c a l l").
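
To make the pseudo-character idea concrete, here is a small Python sketch that splits a v2 headword and its pinyin into units and counts them. The function names and regular expressions are assumptions for illustration; in particular, it assumes every ordinary pinyin syllable ends in a tone numeral 1-5, which may not hold for every entry.

```python
import re

def headword_units(headword: str) -> list[str]:
    """Split a v2 headword into (pseudo-)characters.

    A brace group like {BP} counts as one unit; anything else is one
    unit per character. Hypothetical helper, not project code.
    """
    return re.findall(r"\{[^{}]*\}|.", headword)

def pinyin_units(pinyin: str) -> list[str]:
    """Split v2 pinyin (without the surrounding brackets) into (pseudo-)syllables.

    Assumption: each ordinary syllable is letters (or ':') followed by
    a tone numeral 1-5; spaces and hyphens are just separators.
    """
    return re.findall(r"\{[^{}]*\}|[A-Za-z:]+[1-5]", pinyin)

for hw, py in [("{BP}機", "{BP}-ji1"), ("打{call}", "da3 {call}")]:
    print(headword_units(hw), pinyin_units(py))
```

Under these assumptions, both example headwords split into two units, and so does their pinyin.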

* * *

On our download page, we now have three versions of CC-CEDICT available:
- standard (version 1, where any v2 entries are converted back to v1)
- mixed (some unconverted entries still in v1 + others that have been manually converted to v2 by an editor)
- version 2 (where any v1 entries are converted simplistically by a script* to v2)
https://cc-cedict.org/editor/editor.php?handler=Download

* The script removes all spaces in the pinyin, then reinserts a space before each capital letter other than the first letter of the pinyin.
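
The simplistic conversion described in that footnote could look roughly like the Python sketch below. This is not the actual script; the function name is invented, and wrapping the result in "[[...]]" is an assumption based on how v2 entries appear on this page.

```python
import re

def v1_pinyin_to_v2_naive(field: str) -> str:
    """Naively convert a v1 pinyin field, e.g. '[dian4 nao3 bing4 du2]', to v2.

    Hypothetical re-creation of the rule in the footnote, for illustration.
    """
    inner = field.strip("[]")
    # Remove all spaces in the pinyin.
    compact = inner.replace(" ", "")
    # Reinsert a space before each capital letter other than the first letter.
    spaced = compact[:1] + re.sub(r"([A-Z])", r" \1", compact[1:])
    # Assumption: v2 pinyin is written inside double brackets.
    return "[[" + spaced + "]]"

print(v1_pinyin_to_v2_naive("[dian4 nao3 bing4 du2]"))
print(v1_pinyin_to_v2_naive("[Bei3 jing1 Da4 xue2]"))
```

Note that for an all-lowercase entry this produces one unbroken string, which is why a manual conversion (such as adding the space in "dian4nao3 bing4du2") still adds information the script cannot.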

We have only just started manually converting entries to v2, so the mixed and v2 versions of CC-CEDICT don't offer much added value yet.

Existing apps can continue to use v1 without recoding. Or they can upgrade to use v2 in the future (either when all entries have been converted to v2 by editors, or when a substantial proportion of them have been converted).
Diff:
- 電腦病毒 电脑病毒 [dian4 nao3 bing4 du2] /computer virus/
# + 電腦病毒 电脑病毒 [[[dian4nao3 bing4du2]]] /computer virus/
+ 電腦病毒 电脑病毒 [[dian4nao3 bing4du2]] /computer virus/
By MDBG 2024