
Word frequency list based on a 15 billion character corpus: BCC …
June 15, 2018 · The corpus is much larger than the CCL (470 million characters), the CNC (100 million characters), the SUBTLEX-CH (47 million characters) and the LCMC (less than 2 million characters). It seems as if the frequency lists derived from this corpus might be the most reliable frequency lists currently available.
Word frequency list based on a 15 billion character corpus: BCC …
June 15, 2018 · I would read in the BCC corpus frequency list as a dictionary. Then, having concatenated all the news/magazine articles as plain text, I would build a dictionary of all the words in the news/magazine articles up to 8 characters long, counting their number of occurrences with the help of the BCC frequency list (which tells us which combinations ...
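A minimal Python sketch of the approach described in the snippet above, under stated assumptions: the frequency list is taken to be tab-separated `word<TAB>count` lines (the real BCC file layout may differ), every substring of up to 8 characters that appears in the list is counted as a word occurrence (overlaps included), and the names `load_frequency_list` and `count_known_words` are hypothetical. The sample data is a toy stand-in, not real BCC figures.

```python
from collections import Counter


def load_frequency_list(lines):
    """Parse 'word<TAB>count' lines into a dict (the file format is an assumption)."""
    freq = {}
    for line in lines:
        parts = line.strip().split("\t")
        # Skip headers or malformed rows rather than guessing at their meaning.
        if len(parts) != 2 or not parts[1].isdigit():
            continue
        freq[parts[0]] = int(parts[1])
    return freq


def count_known_words(text, known_words, max_len=8):
    """Count every substring of 1..max_len characters that is a known word.

    Overlapping matches are all counted; this is not a segmenter.
    """
    counts = Counter()
    for start in range(len(text)):
        for length in range(1, max_len + 1):
            candidate = text[start:start + length]
            if len(candidate) < length:
                break  # ran past the end of the text
            if candidate in known_words:
                counts[candidate] += 1
    return counts


# Toy stand-in for the frequency list; the counts are invented for illustration.
sample_list = ["中国\t1000", "人民\t800", "中国人民\t50", "国人\t20"]
bcc = load_frequency_list(sample_list)

# Stand-in for the concatenated article text.
counts = count_known_words("中国人民和中国", bcc)
```

Because overlapping candidates are all counted, a string such as 中国人民 also contributes to 中国, 国人 and 人民. Whether that is desirable depends on the goal; a proper word segmenter would be needed to pick a single reading.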
Integrating BCC Corpus Data into Dictionary - Pleco Software …
January 3, 2019 · I'm honestly a little wary of adding built-in frequency listings because I don't think they're a very good way to learn Chinese; even a really excellent corpus will probably be several years out of date for slang vocabulary, so a term that comes up as uncommon may actually be quite common now (or vice versa) - people are constantly repurposing old words - plus I don't …
Bigrams sorted by frequency with pinyin & English?
June 21, 2023 · The Beijing Language and Culture University created a balanced corpus of 15 billion characters. It’s based on news (人民日报 1946-2018, 人民日报海外版 2000-2018), literature (books by 472 authors, including a significant portion of non-Chinese writers), non-fiction books, blog and Weibo entries as well as...
Flashcards for TOCFL (2023), CCCC, TBCL | Pleco Software Forums
November 7, 2023 · I've parsed out vocabulary from these Taiwanese tests and converted it to flashcards in Pleco's format. Useful e.g. for seeing term levels, intended part of speech and sometimes definitions/examples. TOCFL vocab was updated a couple of years ago and I haven't yet seen a processed version of the...
Integrating BCC Corpus Data into Dictionary
January 3, 2019 · Thank you very much for your detailed explanation! Yes, that makes sense. Also, by importing the card as a user dictionary you gain additional benefits without losing anything! So if my understanding is correct, it seems there are no significant downsides :) You're welcome! Yeah, it's true, for...
audio recording corpus | Pleco Software Forums
February 5, 2010 · Hey Mike, I'm a big user of vocab lists and I'm about 1.5 months away from finishing the HSK4 list. Recently I've been studying some colloquial stuff and...
Common Idioms; A Collection by Grade [HSK / old HSK / 中考 / 高 …
December 27, 2019 · The corpus is much larger than the CCL (470 million characters), the CNC (100 million characters), the SUBTLEX-CH (47 million characters) and the LCMC (less than 2 million characters). It seems as if the frequency lists derived from this corpus might be the most reliable frequency lists currently available.
Media-related vocabulary gathering project - Pleco Software Forums
January 15, 2020 · With a small corpus of 650 articles from People's Daily, downloaded using a Python script, I hope to start providing a more modern frequency list of media-related vocabulary. The frequency list has the following features: It uses all sections of the 人民日报 / People's Daily newspaper, including the sports section.
Most commonly used words / characters into flashcards...
March 19, 2021 · The Beijing Language and Culture University created a balanced corpus of 15 billion characters. It’s based on news (人民日报 1946-2018, 人民日报海外版 2000-2018), literature (books by 472 authors, including a significant portion of non-Chinese writers), non-fiction books, blog and Weibo entries as well as...