Automatic Assessment of Wikipedia Articles and its Information Sources in Different Languages

September 7, 2023
3:00pm to 4:15pm ET
JCC 270
Host: Liping Liu

Abstract

Wikipedia currently has over 60 million articles in more than 300 languages. Anyone can create and edit articles in this encyclopedia. On average, more than 500,000 page edits are made every day by different users (including anonymous ones) from all over the world. At the same time, each language version of Wikipedia is edited independently, so information on the same topic may differ between languages. There is also the problem of users intentionally posting false information or damaging existing content.

Each language version of Wikipedia has its own quality standards, which might relate to timeliness, objectivity, completeness, style, readability, and other quality dimensions. These standards may also change over time within each language version. Therefore, assessing the quality of all its information can be a challenging task. Moreover, “information quality” is a subjective concept and may depend on many factors, including the knowledge and experience of the evaluator.

The presence of information sources is one of the most important elements of Wikipedia articles and has a key impact on the quality of information. Wikipedia is based on the idea that any information added to an article should be backed up by reliable sources. Readers of this encyclopedia should be able to verify the information provided in the articles. However, “reliability” is a subjective concept, and its assessment depends on many factors (e.g. language version or topic), which can make it difficult for Wikipedia editors to select appropriate sources of information.

The presentation will discuss the possibilities of using artificial intelligence and big data sets to automatically assess the quality of Wikipedia articles and the importance of their information sources. In addition, tools for assessing the quality of Wikipedia articles and the importance of information sources across various languages and topics will be presented.

Speaker's Bio:

Dr. Włodzimierz Lewoniewski is an Assistant Professor at the Department of Information Systems at Poznań University of Economics and Business. His PhD thesis, "The method of comparing and enriching information in multilingual wikis based on the analysis of their quality", was defended in 2019 and awarded a distinction. His research areas include information quality in open knowledge bases (such as Wikipedia, DBpedia, and Wikidata), fake news detection, natural language processing, and artificial intelligence. He has authored over 40 publications in English, Polish, and Russian. More details about his research can be found at https://kie.ue.poznan.pl/en/wlodzimierz-lewoniewski/.