As of December 2006, the CHDICT project is a work in progress. Its objectives are:
XML. With no legacy data to maintain, CHDICT will be stored in XML from the very beginning. Besides preventing codepage issues, this is also expected to:
Annotation and structure. A few key features of CHDICT's representation of entries:
Editing. The only way to edit entries will be through a web-based form that mirrors CHDICT's entry structure. Besides validating the data and enforcing its well-formedness, this form will also offer convenience functions such as suggesting measure words and guessing the traditional, simplified or pinyin form from partially specified data.
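To make the validation step more concrete, here is a minimal sketch of the kind of check such a form could run before accepting an entry. It is not CHDICT's actual code: the field names (`simplified`, `traditional`, `pinyin`), the tone-number pinyin convention and the one-syllable-per-character rule are all assumptions made for illustration, and the real entry structure is the one defined in the format draft.

```python
import re

# Hypothetical field names; the real CHDICT entry structure may differ.
REQUIRED_FIELDS = ("simplified", "traditional", "pinyin")

# Assumed convention: one pinyin syllable with a tone number 1-5, e.g. "zhong1".
SYLLABLE = re.compile(r"^[a-zü]+[1-5]$")


def validate_entry(entry):
    """Return a list of problems; an empty list means the entry passed."""
    problems = []
    for name in REQUIRED_FIELDS:
        if not (entry.get(name) or "").strip():
            problems.append(f"missing field: {name}")
    if problems:
        return problems  # the remaining checks need all three fields

    simplified = entry["simplified"].strip()
    traditional = entry["traditional"].strip()
    syllables = entry["pinyin"].split()

    if len(simplified) != len(traditional):
        problems.append("simplified and traditional forms differ in length")
    # Assumption: exactly one syllable per character (ignores erhua and similar cases).
    if len(syllables) != len(simplified):
        problems.append("pinyin syllable count does not match character count")
    for syllable in syllables:
        if not SYLLABLE.match(syllable.lower()):
            problems.append(f"malformed pinyin syllable: {syllable}")
    return problems
```

Returning a list of messages rather than raising on the first error lets a form show every problem to the contributor at once.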
Version control. Two user roles, contributor and editor, will be distinguished, and every entry will be flagged as pending until an editor approves it. The complete version history of all entries will be stored in a database, and as approved entries accumulate, the master resource will be published periodically on the website as a single XML file.
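The workflow described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the class names, the XML element names (`dictionary`, `entry`, `headword`, `gloss`) and the rule that the published file carries the latest approved revision of each entry are hypothetical, and an in-memory list stands in for the database.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Revision:
    author: str
    headword: str
    gloss: str
    approved_by: Optional[str] = None  # stays None until an editor signs off


@dataclass
class Entry:
    entry_id: int
    history: List[Revision] = field(default_factory=list)

    def submit(self, author, headword, gloss):
        """Append a new revision; earlier revisions are never overwritten."""
        self.history.append(Revision(author, headword, gloss))

    def approve_latest(self, user, editors):
        """Mark the newest revision as approved; only editors may do this."""
        if user not in editors:
            raise PermissionError(f"{user} is not an editor")
        self.history[-1].approved_by = user

    def latest_approved(self):
        """Return the most recent approved revision, or None if there is none."""
        for revision in reversed(self.history):
            if revision.approved_by is not None:
                return revision
        return None


def publish(entries):
    """Serialize the latest approved revision of each entry into one XML string."""
    root = ET.Element("dictionary")  # hypothetical element names throughout
    for entry in entries:
        revision = entry.latest_approved()
        if revision is None:
            continue  # still pending: left out of the published file
        node = ET.SubElement(root, "entry", id=str(entry.entry_id))
        ET.SubElement(node, "headword").text = revision.headword
        ET.SubElement(node, "gloss").text = revision.gloss
    return ET.tostring(root, encoding="unicode")
```

In this sketch a pending edit to an already approved entry does not remove the entry from the published file; the previously approved revision stays in place until an editor approves the new one.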
December 29, 2006 – 5800 entries have been generated, excluding proper nouns. Work on the website, version control and dictionary engine is in progress.
I expect the website to go live in the first half of 2007.
You can download the current working draft of CHDICT's data format from the following link: 001-CHDICTFormat-v1-EN.pdf. I hope it will also contribute to the discussion about CEDICT's future format: all comments are welcome.
Many of my decisions have been based on HanDeDict (e.g., fields of application). I believe a common convention for parts of speech, fields, styles and regions could benefit all of our projects.
I would also like to suggest creating a shared resource of Chinese example sentences and their translations in English, German, French and Hungarian.