Researching the opportunities and limitations of using textual web data for linguistic analysis, language modelling, and knowledge representation.
SIGWAC is the Special Interest Group of the Association for Computational Linguistics on Web as Corpus. We research the opportunities and limitations of using textual web data across linguistic and computational domains.
Given the ever-growing data needs of Large Language Models, web corpora have taken a central place in Natural Language Processing, Computational Linguistics, and Machine Learning. To reflect this, SIGWAC has organised its topics of interest across technical, legal, and societal dimensions.
We promote interest in the web as both a source of linguistic data and an object of study in its own right, providing members of the ACL with a means to exchange research developments and news.
Constitution of ACL SIGWAC
Objectives
Sign up to the mailing list to receive news, announcements, and calls for participation from the SIGWAC community.
Subscribe to the mailing list
SIGWAC's research spans three interconnected dimensions as web corpora become central to modern NLP and ML.
| Year | Workshop | Venue |
| --- | --- | --- |
| 2020 | WAC-XII (cancelled) | LREC 2020, Marseille, 16 May. Proceedings published. |
| 2017 | WAC-XI | Corpus Linguistics 2017, Birmingham, 24–27 July |
| 2016 | WAC-X | ACL 2016, Berlin, 12 August |
| 2015 | WAC@eLex | eLex, Herstmonceux Castle, UK, 10 August |
| 2014 | WAC9 | EACL 2014, Gothenburg, 26–27 April |
| 2013 | WAC8 | Corpus Linguistics 2013, Lancaster, 22 July |
| 2012 | WAC7 | WWW12, Lyon, 17 April |
| 2011 | BUCC | ACL 2011, Portland, Oregon, 24 June |
| 2010 | WAC6 | NAACL-HLT, Los Angeles, 5 June |
| 2009 | WAC5 | SEPLN, San Sebastián, Basque Country, 7 September |
| 2008 | WAC4 | LREC, Marrakech, 1 June |
| 2007 | WAC3 | Louvain-la-Neuve, Belgium, 15–16 September |
| 2006 | WAC2 | EACL, Trento, Italy, April |
| 2005 | WAC1 | Corpus Linguistics, Birmingham, July |