Hamshahri Corpus
The Hamshahri Corpus (Persian: پیکره همشهری) is a sizable Persian corpus based on the Iranian newspaper Hamshahri, one of the first online Persian-language newspapers in Iran. It was initially collected and compiled by Ehsan Darrudi at the DBRG group of the University of Tehran. Later, a team headed by Abolfazl AleAhmad built on this corpus to create the first Persian text collection suitable for information retrieval evaluation tasks.
The corpus was created by crawling online news articles from Hamshahri's website and processing the HTML pages into a standard text corpus for modern information retrieval experiments.
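The HTML-processing step can be illustrated with a minimal sketch. This is not the pipeline the corpus authors used, which the text does not describe. It only assumes that each crawled page is available as an HTML string and that the goal is plain text with markup, scripts, and styles removed. The names `TextExtractor` and `html_to_text` are hypothetical.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text from a page, skipping <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []        # text fragments in document order
        self._skip_depth = 0   # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Return the visible text of an HTML string with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Join fragments with spaces, then collapse any run of whitespace.
    return " ".join(" ".join(parser.parts).split())
```

A real crawl would also need to fetch the pages, detect their character encoding, and strip navigation and advertising blocks, none of which is covered here.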