Change log entry 89219 | |
---|---|
Processed by: | richwarm (2025-04-02 23:18:12 UTC) |
Comment: |
<< review queue entry 81760 - submitted by 'agedits' >> Separated the glosses with semicolons instead of slashes. Given the large number of entries in which glosses are separated by slashes instead of semicolons, it might make sense to process them with a pre-trained AI model. For example, with Word2Vec (using spaCy & Gensim), best for accuracy: 1. Convert words into vector representations (embeddings). 2. Cluster words that are "close" in meaning together (use ;). 3. If words are "far apart" in meaning, separate them into different senses (/). Or, with a simpler approach, thesaurus and WordNet matching (e.g. NLTK WordNet or OpenThesaurus). --------------------------- Editor: When we review an old entry (pre-2022 and especially pre-2010), we are not just replacing slashes with semicolons to group similar senses. We are checking the entire entry, including - pinyin segmentation, and - the validity of the definition. For example, we had an entry 搜證 搜证 [sou1 zheng4] /search warrant/to look for evidence/ It has now been checked and replaced with 蒐證 搜证 [[sou1zheng4]] /to gather evidence (in a criminal case)/ The double square brackets [[...]] indicate that the entry has (a) been converted to v2 format, and at the same time, (b) checked in its entirety, including the definition. In the case of this entry, - the traditional character form was changed - the pinyin was updated to v2 - one of the senses was rewritten to be more precise, and - the other sense was deleted as incorrect. Just grouping the existing glosses with slashes and semicolons is not enough, because many of our older definitions need to be rewritten. Both of the following definitions would have been wrong: /search warrant/to look for evidence/ /search warrant; to look for evidence/ |
Diff: |
- 一路來 一路来 [yi1 lu4 lai2] /all the way/all along/since the start/ + 一路來 一路来 [[yi1lu4lai2]] /all the way; all along; since the start/ |
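
The entry layout discussed above (slashes between senses, semicolons between glosses within a sense, and `[[...]]` marking a checked v2 entry) can be illustrated with a minimal parsing sketch. The regular expression, the `parse_entry` function, and the returned dictionary keys are assumptions made for illustration; this is not the project's own tooling, and it does not handle glosses that themselves contain a semicolon or slash.

```python
import re

# Hypothetical pattern: traditional, simplified, [pinyin] (v1) or [[pinyin]] (v2), /senses/
ENTRY_RE = re.compile(r"^(\S+) (\S+) (\[\[.+?\]\]|\[.+?\]) /(.+)/$")


def parse_entry(line):
    """Split one dictionary line into headwords, reading, and senses."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        raise ValueError("unrecognised entry: " + line)
    traditional, simplified, pinyin, body = match.groups()
    return {
        "traditional": traditional,
        "simplified": simplified,
        # Double brackets mark an entry converted to v2 and checked in full.
        "is_v2": pinyin.startswith("[["),
        "reading": pinyin.strip("[]"),
        # Slashes separate senses; semicolons separate glosses within a sense.
        "senses": [
            [gloss.strip() for gloss in sense.split(";")]
            for sense in body.split("/")
        ],
    }


old = parse_entry("一路來 一路来 [yi1 lu4 lai2] /all the way/all along/since the start/")
new = parse_entry("一路來 一路来 [[yi1lu4lai2]] /all the way; all along; since the start/")
print(old["is_v2"], len(old["senses"]))  # three separate senses in the old entry
print(new["is_v2"], len(new["senses"]))  # one sense with three glosses in the new entry
```

A parser like this only exposes the structure of an entry. As the editor's comment explains, it cannot decide whether a definition such as "search warrant" is valid, which is why each old entry is reviewed by hand.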