User Identification in the Process of Web Usage Data Preprocessing

Jozef Kapusta; Michal Munk; Dominik Halvoník; Martin Drlík

doi:10.3991/ijet.v14i09.9854

Authors

Jozef Kapusta, Constantine the Philosopher University in Nitra, Tr. A. Hlinku 1, 949 74, Nitra
Michal Munk, Constantine the Philosopher University in Nitra, Tr. A. Hlinku 1, 949 74, Nitra
Dominik Halvoník, Constantine the Philosopher University in Nitra, Tr. A. Hlinku 1, 949 74, Nitra
Martin Drlík, Constantine the Philosopher University in Nitra, Tr. A. Hlinku 1, 949 74, Nitra

DOI:

https://doi.org/10.3991/ijet.v14i09.9854

Keywords:

web usage mining, cookies, session time thresholds, sequence rules

Abstract

When analyzing user behavior, we first have to understand where the valuable information comes from. One of the main sources is the web server. The necessary data can be extracted from several places, most commonly the access log, the error log, and custom log files of the web server, the proxy server log file, the web browser log, browser cookies, etc. In its default form, a web server log is known as a Common Log File (W3C, 1995) and records the IP address, the date and time of the visit, and the accessed and referenced resources. Standardized methodologies describe several steps for extracting new knowledge from the provided data. Usually, the first step in each of them is to identify users, user sessions, page views, and clickstreams. This process is called pre-processing. The main goal of this stage is to take an unprocessed web server log file as input and produce meaningful representations that can be used in the next phase. In this paper, we describe in detail user session identification, which can be considered the most important part of data pre-processing. Our paper aims to compare user/session identification using the session time threshold (STT) with user/session identification using cookies. The comparison concerns the quality of the generated sequential rules, i.e., it considers the generation of useful, trivial, and inexplicable rules.
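
To illustrate the kind of STT-based session identification the abstract refers to, the following is a minimal sketch, not the authors' implementation. It assumes log lines in the Common Log Format, identifies a user by IP address alone, and starts a new session whenever the gap between two consecutive requests from the same IP exceeds a threshold. The function names `parse_line` and `sessionize` and the 10-minute default threshold are illustrative assumptions; the abstract does not state which threshold values or which STT variant the paper uses.

```python
import re
from datetime import datetime, timedelta

# Common Log Format: host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) \S+'
)


def parse_line(line):
    """Return (ip, timestamp, url) for one log line, or None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    timestamp = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    return match.group("ip"), timestamp, match.group("url")


def sessionize(records, threshold=timedelta(minutes=10)):
    """Label each record with a per-IP session number.

    Assumption: a new session starts when the gap between consecutive
    requests from the same IP exceeds `threshold` (hypothetical default).
    """
    last_seen = {}   # ip -> timestamp of the previous request
    session_no = {}  # ip -> current session number
    labelled = []
    for ip, timestamp, url in sorted(records, key=lambda r: (r[0], r[1])):
        if ip not in last_seen or timestamp - last_seen[ip] > threshold:
            session_no[ip] = session_no.get(ip, 0) + 1
        last_seen[ip] = timestamp
        labelled.append((ip, session_no[ip], timestamp, url))
    return labelled


if __name__ == "__main__":
    sample = [
        '10.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
        '10.0.0.1 - - [10/Oct/2000:14:30:00 -0700] "GET /b.html HTTP/1.0" 200 100',
    ]
    parsed = [r for r in map(parse_line, sample) if r is not None]
    for ip, session, timestamp, url in sessionize(parsed):
        print(ip, session, timestamp.isoformat(), url)
```

Identifying users by IP address alone is the weakest part of this sketch: several users behind one proxy share an address, which is one reason a cookie-based identification is a natural point of comparison.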

Published

2019-05-14

How to Cite

Kapusta, J., Munk, M., Halvoník, D., & Drlík, M. (2019). User Identification in the Process of Web Usage Data Preprocessing. International Journal of Emerging Technologies in Learning (iJET), 14(09), pp. 21–33. https://doi.org/10.3991/ijet.v14i09.9854

Issue

Vol. 14 No. 09 (2019)

Section

Papers

License

The submitting author warrants that the submission is original and that she/he is the author of the submission together with the named co-authors; to the extent the submission incorporates text passages, figures, data, or other material from the work of others, the submitting author has obtained any necessary permission.
Articles in this journal are published under the Creative Commons Attribution Licence (CC-BY). This gives readers more legal certainty about what they can do with published articles and thus allows wider dissemination and archiving, which in turn makes publishing with this journal more valuable for you, the authors.
By submitting an article, the author grants this journal the non-exclusive right to publish it. The author retains the copyright and the publishing rights for the article without any restrictions.
