The Fifth Arabic Natural Language Processing Workshop

(WANLP 2020)

Co-located Online with COLING'2020, 12 Dec. 2020

Public Panel Zoom Link (Open to all)

Topic: "The journey of Women in Arabic NLP: difficulties, opportunities and future prospects."

Date: Dec 12, 2020

Time: 20:00-21:00 CET

Zoom Link: https://us02web.zoom.us/j/87601833993?pwd=Vk5Ua0FJQXNZWnJ6MmtsRytvUE1JZz09

Meeting ID: 876 0183 3993

Passcode: 171569

Important Dates

  • Sept 10, 2020 (23:59 AOE time zone): Workshop Paper Due Date (Deadline Extended)

  • Oct 1, 2020: Notification of Acceptance

  • Oct 20, 2020: Camera-ready Papers Due

  • Dec 12, 2020: Workshop Date

Workshop Program (Updated)

WANLP 2020_ Program-new.pdf

Workshop Description

Arabic is a challenging language for the field of computational linguistics. This is due to many factors, including its complex and rich morphology, its high degree of ambiguity, and the presence of a number of widely varying dialects. Arabic is also a language with important geopolitical connections. It is spoken by over 400 million people in countries with varying degrees of prosperity and stability. It is the primary language of the latest world refugee problem affecting the Middle East and Europe. The opportunities made possible by working on this language and its dialects should not be underestimated in their consequences for the Arab World, the Mediterranean Region, and the rest of the World.

There has been a lot of progress in the last 20 years in the area of Arabic Natural Language Processing (NLP). Many Arabic NLP (or Arabic NLP-related) workshops and conferences have taken place, both in the Arab World and in association with international conferences. Examples include the following:

    • The First, Second, Third, and Fourth Arabic Natural Language Processing Workshops at EMNLP 2014, ACL 2015, EACL 2017, and ACL 2019, respectively.

    • The First, Second, and Third Workshops on Arabic Corpora and Processing Tools at LREC 2014, LREC 2016, and LREC 2018, respectively.

    • The conference on Arabic Language Resources and Tools (MEDAR-2009, NEMLAR-2004).

    • The workshop on Computational Approaches to Semitic Languages (LREC 2010, EACL 2009, ACL 2007, ACL 2005, ACL 2002, ACL 1998).

    • The workshop on Computational Approaches to Arabic Script-based Languages (MTSummit XII 2009, LSA 2007, COLING 2004).

    • The International Symposium on Computer and Arabic Language (ISCAL 2009, ISCAL 2007).

This workshop follows in the footsteps of these efforts to provide a forum for researchers to share and discuss their ongoing work. This workshop is timely given the continued rise in research projects focusing on Arabic NLP.

We invite submissions on topics that include, but are not limited to, the following:

    • Basic core technologies: morphological analysis, disambiguation, tokenization, POS tagging, named entity detection, chunking, parsing, semantic role labeling, sentiment analysis, Arabic dialect modeling, etc.

    • Applications: machine translation, speech recognition, speech synthesis, optical character recognition, pedagogy, assistive technologies, social media, etc.

    • Resources: dictionaries, annotated data, corpora, etc.

Submissions may include work in progress as well as finished work. Submissions must have a clear focus on specific issues pertaining to the Arabic language whether it is standard Arabic, dialectal, or mixed. Papers on other languages sharing problems faced by Arabic NLP researchers such as Semitic languages or languages using Arabic script are welcome. Additionally, papers on efforts using Arabic resources but targeting other languages are also welcome. Descriptions of commercial systems are welcome, but authors should be willing to discuss the details of their work.

Shared Task

A shared task on Arabic dialect identification will be associated with the workshop. This shared task targets province-level dialects, and as such will be the first to focus on naturally occurring fine-grained dialects at the sub-country level.

Shared Task Webpage: https://sites.google.com/view/nadi-shared-task


Invited Speaker

Dr. Preslav Nakov from the Arabic Language Technologies Group at the Qatar Computing Research Institute (Doha, Qatar) has agreed to be the keynote speaker at the workshop. He will talk about "The Tanbih Mega-Project at QCRI: Fighting the Fake News, Promoting Media Literacy, and Flattening the Curve of the COVID-19 Infodemic".

Workshop Organizers

General Chair:

      • Imed Zitouni, Google, USA. Email: imed.zitouni AT gmail.com

Program Chairs:

      • Muhammad Abdul-Mageed, UBC, Canada. Email: muhammad.mageed AT ubc.ca

      • Houda Bouamor, Carnegie Mellon University in Qatar. Email: hbouamor AT qatar.cmu.edu

      • Fethi Bougares, University of Le Mans, France. Email: fethi.bougares AT univ-lemans.fr

      • Mahmoud El-Haj, Lancaster University, England. Email: m.el-haj AT lancaster.ac.uk

Publication Chair:

      • Nadi Tomeh, LIPN, Université Paris 13, Sorbonne Paris Cité. Email: tomeh AT lipn.fr

Publicity Chair:

      • Wajdi Zaghouani, Hamad Bin Khalifa University, Qatar. Email: wzaghouani AT hbku.edu.qa

Ex-General Chair / Advisor:

      • Wassim El-Hajj, American University of Beirut, Lebanon. Email: we07 AT aub.edu.lb

Advisory Committee:

      • Muhammad Abdul-Mageed, UBC, Canada. Email: muhammad.mageed AT ubc.ca

      • Ahmed Ali, Qatar Computing Research Institute, Qatar. Email: amali AT qf.org.qa

      • Hend Alkhalifa, King Saud University, Saudi Arabia. Email: hend.alkhalifa AT gmail.com

      • Houda Bouamor, Carnegie Mellon University in Qatar. Email: hbouamor AT qatar.cmu.edu

      • Fethi Bougares, Le Mans University, France. Email: Fethi.bougares AT gmail.com

      • Khalid Choukri, ELDA, European Language Resource Association, France. Email: choukri AT elda.org

      • Kareem Darwish, Qatar Computing Research Institute, Qatar. Email: kdarwish AT hbku.edu.qa

      • Mona Diab, George Washington University, USA. Email: mtdiab AT gmail.com

      • Mahmoud El-Haj, Lancaster University, UK. Email: m.el-haj AT lancaster.ac.uk

      • Samhaa El-Beltagy, Nile University, Egypt. Email: samhaaelbeltagy AT gmail.com

      • Wassim El-Hajj, American University of Beirut, Lebanon. Email: we07 AT aub.edu.lb

      • Nizar Habash, New York University Abu Dhabi, UAE. Email: nizar.habash AT nyu.edu

      • Lamia Hadrich Belguith, University of Sfax, Tunisia. Email: lamia.belguith AT gmail.com

      • Hazem Hajj, American University of Beirut, Lebanon. Email: hh63 AT aub.edu.lb

      • Walid Magdy, University of Edinburgh, Scotland. Email: wmagdy AT inf.ed.ac.uk

      • Khaled Shaalan, The British University in Dubai, UAE. Email: khaled.shaalan AT buid.ac.ae

      • Kamel Smaili, University of Lorraine, France. Email: kamel.smaili AT loria.fr

      • Nadi Tomeh, University Paris 13, France. Email: tomeh AT lipn.fr

      • Wajdi Zaghouani, Hamad Bin Khalifa University, Qatar. Email: wajdiz AT gmail.com

      • Imed Zitouni, Google, USA. Email: imed.zitouni AT gmail.com

Paper Submission Instructions

Paper Length: Submissions are expected to be up to 9 pages long plus any number of pages for references.

Submission Format: Download the MS Word and LaTeX templates here: https://coling2020.org/coling2020.zip

Submission Website: https://www.softconf.com/coling2020/WANLP2020

Blind Reviewing Policy: The workshop follows a blind reviewing policy. The authors should omit their names and affiliations from the paper and avoid self-references that reveal their identity. Papers that do not conform to these requirements will be rejected without review.

Multiple Submission Policy: Papers that have been or will be submitted to other meetings or publications must indicate this at submission time. Authors must inform the organizers immediately once a paper is to be withdrawn from the workshop for any reason. Attempting to publish the same paper, or a paper with major overlap (50%), may lead to rejection of the paper even after an acceptance notification has gone out.

Anonymity and Supplementary Material: As the reviewing will be blind, papers must not include authors' names and affiliations. Furthermore, self-references that reveal the author's identity, e.g., "We previously showed (Smith, 1991) ..." must be avoided. Instead, use citations such as "Smith previously showed (Smith, 1991) ..." Papers that do not conform to these requirements will be rejected without review.

Papers should not refer, for further detail, to documents that are not available to the reviewers. For example, do not omit or redact important citation information to preserve anonymity. Instead, use third person or named reference to this work, as described above (“Smith showed” rather than “we showed”).

Papers may be accompanied by a resource (software and/or data) described in the paper. Papers that are submitted with accompanying software/data may receive additional credit toward the overall evaluation, and the potential impact of the software and data will be taken into account when making the acceptance/rejection decisions.

WANLP 2020 also encourages the submission of supplementary material to report preprocessing decisions, model parameters, and other details necessary for the replication of the experiments reported in the paper. Seemingly small preprocessing decisions can sometimes make a large difference in performance, so it is crucial to record such decisions to precisely characterize state-of-the-art methods.

Nonetheless, supplementary material should be supplementary (rather than central) to the paper. It may include explanations or details of proofs or derivations that do not fit into the paper, lists of features or feature templates, sample inputs and outputs for a system, pseudo-code or source code, and data. The paper should not rely on the supplementary material: while the paper may refer to and cite the supplementary material and the supplementary material will be available to reviewers, they will not be asked to review or even download the supplementary material. Authors should refer to the contents of the supplementary material in the paper submission, so that reviewers interested in these supplementary details will know where to look.

Note: The supplementary material does not count towards the page limit and should not be included in the paper; it should be submitted separately using the appropriate field on the submission website.

Program Committee Members

The following is the list of PC members, all of whom participated in the review process of WANLP 2019. A large percentage of them have confirmed their willingness to review papers for WANLP 2020.

  • Mourad Abbas, CRSTDLA, Algeria

  • Ahmed Abdelali, Qatar Computing Research Institute, HBKU, Qatar

  • Muhammad Abdul-Mageed, The University of British Columbia, Canada

  • Bayan Abu Shawar, Al Ain University, UAE

  • Wafia Adouane, University of Gothenburg, Sweden

  • Haithem Afli, Cork Institute of Technology, Ireland

  • Hussein Al-Natsheh, Mawdoo3 Limited, Jordan

  • Almoataz Al-Said, Cairo University, Egypt

  • Bashar Alhafni, New York University Abu Dhabi, UAE

  • Ahmed Ali, Qatar Computing Research Institute, HBKU, Qatar

  • Hend Alkhalifa, King Saud University, KSA

  • Chafik Aloulou, Université de Sfax, Tunisia

  • Areeb Alowisheq, Imam University, KSA

  • Mohammed Attia, George Washington University, USA

  • Gilbert Badaro, American University of Beirut, Lebanon

  • Riadh Belkebir, New York University Abu Dhabi, UAE

  • Houda Bouamor, Carnegie Mellon University in Qatar

  • Karim Bouzoubaa, Mohammad V University, Morocco

  • Shammur Chowdhury, Qatar Computing Research Institute, HBKU, Qatar

  • Kareem Darwish, Qatar Computing Research Institute, HBKU, Qatar

  • Mahmoud El Haj, Lancaster University, UK

  • Wassim El-Hajj, American University of Beirut, Lebanon

  • Shady Elbassuoni, American University of Beirut, Lebanon

  • Mariem Ellouze, University of Sfax, Tunisia

  • Tamer Elsayed, Qatar University, Qatar

  • Sahar Ghannay, LIUM Laboratory, France

  • Nada Ghneim, Higher Institute for Applied Sciences and Technology, Syria

  • Nizar Habash, New York University Abu Dhabi, UAE

  • Bassam Haddad, University of Petra, Jordan

  • Lamia Hadrich Belguith, University of Sfax, Tunisia

  • Hazem Hajj, American University of Beirut, Lebanon

  • Salima Harrat, École Normale Supérieure (Bouzaréah), Algeria

  • Maram Hasanain, Qatar University, Qatar

  • Go Inoue, New York University Abu Dhabi, UAE

  • Mustafa Jarrar, Bir Zeit University, Palestine

  • Ganesh Jawahar, The University of British Columbia, Canada

  • Salam Khalifa, New York University Abu Dhabi, UAE

  • Walid Magdy, University of Edinburgh, Scotland

  • Azzeddine Mazroui, University Mohamed I, Morocco

  • Seif Mechti, University of Sfax, Tunisia

  • Salima Medhaffar, Le Mans University, France

  • Karima Meftouh, Badji Mokhtar University, Algeria

  • Hamdy Mubarak, Qatar Computing Research Institute, HBKU, Qatar

  • El Moatez Billah Nagoudi, The University of British Columbia, Canada

  • Preslav Nakov, Qatar Computing Research Institute, HBKU, Qatar

  • Alexis Nasr, University of Marseille, France

  • Joshi Praveen, Cork Institute of Technology, Ireland

  • Younes Samih, Heinrich Heine Universität Düsseldorf, Germany

  • Khaled Shaalan, The British University in Dubai, UAE

  • Khaled Shaban, Qatar University, Qatar

  • Peter Sullivan, The University of British Columbia, Canada

  • Reem Suwaileh, Qatar University, Qatar

  • Nadi Tomeh, University Paris 13, France

  • Omar Trigui, University of Sousse, Tunisia

  • Wajdi Zaghouani, Hamad Bin Khalifa University, Qatar

  • Nasser Zalmout, Amazon Inc., USA

  • Taha Zerrouki, University of Bouira, Algeria

  • Chiyu Zhang, The University of British Columbia, Canada

Shared Task: Nuanced Arabic Dialect Identification (NADI)

Introduction: Arabic has a widely varying collection of dialects. Many of these dialects remain under-studied due to the scarcity of resources. The goal of the shared task is to alleviate this bottleneck in the context of fine-grained Arabic dialect identification. Dialect identification is the task of automatically detecting the source variety of a given text or speech segment. Previous work on Arabic dialect identification has focused on coarse-grained regional varieties such as Gulf or Levantine (e.g., Zaidan and Callison-Burch, 2013; Elfardy and Diab, 2013) or on country-level varieties, as in the MADAR shared task at WANLP 2019 (Bouamor, Hassan, and Habash, 2019). The MADAR shared task also involved city-level classification on human-translated data. This shared task targets province-level dialects, and as such will be the first to focus on naturally occurring fine-grained dialects at the sub-country level. The data cover a total of 100 provinces from all 22 Arab countries and come from the Twitter domain. The evaluation and task setup follow the MADAR 2019 shared task. The subtasks are:

    • Subtask 1: Country-level dialect identification. A total of 22,000 tweets, covering all 22 Arab countries. This is a new dataset created for this shared task.

    • Subtask 2: Province-level dialect identification. A total of 22,000 tweets, covering 100 provinces from all 22 Arab countries. This is the same dataset as in Subtask 1, but with province labels.

Unlabeled data: Participants will also be provided with an additional 10M unlabeled tweets that can be used in developing their systems for either or both of the tasks.

Metrics: The evaluation metrics will include precision, recall, F-score, and accuracy. The macro-averaged F-score will be the official metric.
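For teams who want to check their development-set numbers locally, the macro-averaged F-score can be sketched as follows. This is a minimal illustration of the conventional definition (the unweighted mean of per-label F1 scores), not the official scoring script; the function name macro_f1, the choice to average over every label seen in either the gold or the predicted list, and the sample country labels are assumptions made for the example.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 scores (conventional macro-F1).

    Assumption: labels are averaged over the union of gold and predicted
    labels; an official scorer may fix the label set differently.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1      # correct prediction for label g
        else:
            fp[p] += 1      # p was predicted but is wrong
            fn[g] += 1      # g was the gold label but was missed

    labels = set(gold) | set(pred)
    if not labels:
        return 0.0

    total = 0.0
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of P and R
        denom = 2 * tp[label] + fp[label] + fn[label]
        total += 2 * tp[label] / denom if denom else 0.0
    return total / len(labels)


# Hypothetical toy data: three country labels, five tweets.
gold = ["Egypt", "Egypt", "Egypt", "Iraq", "Oman"]
pred = ["Egypt", "Egypt", "Iraq", "Iraq", "Egypt"]
score = macro_f1(gold, pred)
accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
print(f"macro-F1 = {score:.4f}, accuracy = {accuracy:.4f}")
```

In this toy example the accuracy is 0.6 while the macro-averaged F-score is about 0.44, because the single Oman tweet is missed and each label counts equally regardless of its frequency. That property matters for a task with many provinces and uneven amounts of data per label.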

Participating teams will be provided with a common training data set and a common development set. No external manually labelled data sets are allowed. A blind test data set will be used to evaluate the output of the participating teams. All teams are required to report results on both the development and test sets in their writeups.

The shared task will be hosted through CODALAB. Teams will be provided with a CODALAB link for each shared task.

Organizers: Muhammad Abdul-Mageed, Chiyu Zhang (The University of British Columbia, Canada), Nizar Habash (New York University Abu Dhabi), and Houda Bouamor (Carnegie Mellon University, Qatar).