G. Kremer, K. Erk, S. Padó and S. Thater. What Substitutes Tell Us - Analysis of an "All-Words" Lexical Substitution Corpus. Proceedings of EACL 2014, Gothenburg.

Note: Data available here.

We present the first large-scale English "all-words" lexical substitution corpus. Its size makes it a rich resource for investigating word meaning. We investigate the nature of lexical substitute sets by comparing them to WordNet synsets, and find them to be consistent with, but more fine-grained than, synsets. We also identify significant differences from the results for paraphrase ranking in context reported for the SemEval lexical substitution data, which highlights the influence of corpus construction approaches on evaluation results.
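The kind of comparison between substitute sets and synsets described above can be pictured with a small sketch. It is not the authors' analysis procedure. It assumes that each annotated instance is a multiset of substitutes with annotator counts, and it uses a hypothetical toy inventory (`TOY_SYNSETS`) in place of real WordNet data; the function name `synset_coverage` is likewise illustrative. For one target instance it reports the share of substitute tokens found in some synset of the target, and which synsets those substitutes fall into.

```python
from collections import Counter

# Hypothetical toy inventory: synset id -> lemmas. NOT real WordNet data.
TOY_SYNSETS = {
    "bright.a.01": {"bright", "brilliant", "vivid"},
    "bright.a.02": {"bright", "smart", "intelligent", "clever"},
}


def synset_coverage(target, substitutes, synsets):
    """Compare one instance's substitutes with the synsets of its target word.

    substitutes: Counter mapping substitute -> number of annotators who gave it.
    Returns (fraction of substitute tokens that appear in some synset of the
    target, set of synset ids containing at least one substitute).
    """
    # Restrict the inventory to synsets that contain the target word itself.
    target_synsets = {sid: lemmas for sid, lemmas in synsets.items()
                      if target in lemmas}
    total = sum(substitutes.values())
    covered = 0
    hit_ids = set()
    for sub, count in substitutes.items():
        ids = {sid for sid, lemmas in target_synsets.items() if sub in lemmas}
        if ids:
            covered += count  # weight by how many annotators gave this substitute
            hit_ids |= ids
    return (covered / total if total else 0.0), hit_ids


if __name__ == "__main__":
    # Hypothetical annotations for "bright" in a context like "a bright student".
    subs = Counter({"smart": 3, "clever": 2, "gifted": 1})
    print(synset_coverage("bright", subs, TOY_SYNSETS))
```

In this toy run, five of the six substitute tokens fall into a single synset (`bright.a.02`), while "gifted" lies outside every synset of the target. Counts like these are one simple way to ask whether a substitute set stays within one sense and whether it contains words a synset inventory does not list.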

  author    = {Kremer, Gerhard  and  Erk, Katrin and  
               Pad\'{o}, Sebastian  and  Thater, Stefan},
  title     = {What Substitutes Tell Us - Analysis of an 
               ``All-Words'' Lexical Substitution Corpus},
  booktitle = {Proceedings of the 14th Conference of the European 
               Chapter of the Association for Computational Linguistics},
  year      = {2014},
  address   = {Gothenburg, Sweden},
  pages     = {540--549},