Large language models (LLMs) are becoming mainstream and easily accessible, ushering in an explosion of machine-generated content across various channels, such as news, social media, question-answering forums, education, and even academia. Recent LLMs, such as ChatGPT and GPT-4, generate remarkably fluent responses to a wide variety of user queries. The articulate nature of such generated text makes LLMs attractive for replacing human labor in many scenarios. However, this has also raised concerns about their potential misuse, such as spreading misinformation and causing disruptions in the education system. Since humans perform only slightly better than chance when classifying machine-generated vs. human-written text, there is a need for automatic systems that identify machine-generated text in order to mitigate its potential misuse.
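As a rough illustration of what such an automatic detector can look like, the sketch below trains a simple binary classifier (TF-IDF character n-grams plus logistic regression) to separate machine-generated from human-written text. The toy texts and labels are invented for the example, are not drawn from M4, and this is not one of the official M4 baselines.

```python
# Minimal sketch of a binary machine-generated-text detector.
# Illustrative only: the toy texts and labels below are made up and are
# NOT part of the M4 dataset or its official baselines.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data: label 1 = machine-generated, 0 = human-written.
texts = [
    "The results demonstrate a significant improvement over the baseline.",
    "honestly i just winged the essay and hoped for the best lol",
    "In conclusion, the proposed method achieves state-of-the-art performance.",
    "we argued about the findings for an hour before lunch",
]
labels = [1, 0, 1, 0]

# Character n-grams are a common, language-agnostic feature choice.
detector = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(max_iter=1000),
)
detector.fit(texts, labels)

# Predict on an unseen sentence (output is a 0/1 label).
print(detector.predict(["Overall, the model generalizes well to unseen domains."]))
```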
We present M4, a multi-generator, multi-domain, and multi-lingual benchmark for black-box machine-generated text detection.
Here are the current statistics of the M4 dataset. It will be further extended in the SemEval-2024 Shared Task 8 with surprise generators, domains, and languages.
The M4 dataset is described in the following arXiv paper:
@article{wang2023m4,
  title = {{M4}: Multi-generator, Multi-domain, and Multi-lingual Black-Box Machine-Generated Text Detection},
  author = {Yuxia Wang and Jonibek Mansurov and Petar Ivanov and Jinyan Su and Artem Shelmanov and Akim Tsvigun and Chenxi Whitehouse and Osama Mohammed Afzal and Tarek Mahmoud and Alham Fikri Aji and Preslav Nakov},
  journal = {arXiv preprint arXiv:2305.14902},
  year = {2023},
  primaryClass = {cs.CL}
}