A Privacy-Preserving Method for Longitudinal Participant Linkage in Web Surveys

Authors

DOI:

https://doi.org/10.29015/cerem.1043

Keywords:

longitudinal survey methodology, anonymous respondent linkage, self-generated identification codes (SGIC), data privacy in empirical research

Abstract

Aim: To enable longitudinal linkage in online panel surveys without collecting direct identifiers, while aligning with modern data-protection requirements.

Design / Research methods: The article proposes a client-side protocol in which participants create a reproducible secret from a self-chosen pseudonym and an ordered image sequence. The browser normalizes and cryptographically hashes these inputs to derive a short alphanumeric core code, then appends a modulo-97 checksum for strict local validation. The backend stores only a salted hash of the code, scoped to a specific study (form-family) context.
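The client-side steps described above (normalize, hash, shorten, append a modulo-97 checksum, validate locally) might be sketched as follows. This is an illustration only, not the article's implementation: the use of SHA-256, NFKC normalization with case folding, the 31-symbol alphabet, the 8-character core length, the IBAN-style check digits, and all function names are assumptions made for the example. In a deployment this logic would run in the browser; Python is used here only for readability.

```python
import hashlib
import unicodedata

# Assumed alphabet: upper-case letters and digits without look-alikes (I, L, O, 0, 1).
ALPHABET = "ABCDEFGHJKMNPQRSTUVWXYZ23456789"
CORE_LENGTH = 8  # assumed length of the core code


def normalize(pseudonym: str) -> str:
    """Unicode-normalize, case-fold, and drop whitespace so trivial variants agree."""
    folded = unicodedata.normalize("NFKC", pseudonym).casefold()
    return "".join(folded.split())


def core_code(pseudonym: str, image_ids: list[str]) -> str:
    """Hash the normalized pseudonym and the ordered image sequence into a short code."""
    material = normalize(pseudonym) + "|" + ",".join(image_ids)
    value = int.from_bytes(hashlib.sha256(material.encode("utf-8")).digest(), "big")
    chars = []
    for _ in range(CORE_LENGTH):
        value, index = divmod(value, len(ALPHABET))
        chars.append(ALPHABET[index])
    return "".join(chars)


def _as_number(code: str) -> int:
    """Map digits to themselves and letters to 10..35, then read the result as an integer."""
    return int("".join(str(int(ch, 36)) for ch in code))


def checksum(core: str) -> str:
    """Two check digits chosen so that the full code is congruent to 1 modulo 97."""
    return f"{98 - _as_number(core) * 100 % 97:02d}"


def full_code(pseudonym: str, image_ids: list[str]) -> str:
    """Core code followed by its two check digits."""
    core = core_code(pseudonym, image_ids)
    return core + checksum(core)


def is_valid(code: str) -> bool:
    """Strict local validation: right length, right symbols, and remainder 1 modulo 97."""
    code = code.strip().upper()
    if len(code) != CORE_LENGTH + 2:
        return False
    core, check = code[:-2], code[-2:]
    if not all(ch in ALPHABET for ch in core):
        return False
    if not all(ch in "0123456789" for ch in check):
        return False
    return _as_number(code) % 97 == 1
```

Under these assumptions, `full_code("Blue Fox", ["img07", "img02", "img19"])` and `full_code("  blue fox ", ["img07", "img02", "img19"])` yield the same ten-character code, while a code with a mistyped check digit fails `is_valid` before anything is sent to the server.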

Conclusions / findings: The paper introduces a client-side protocol that generates anonymous yet linkable participant identifiers for web-based surveys. A reproducible code is derived from a user-chosen pseudonym and image sequence entirely in the browser, and the server stores only a form-family-salted hash for longitudinal linkage within a study. The design incorporates a checksum for strict client-side validation, intended to reduce spurious identifiers caused by typographical errors. Empirical validation of matching performance, usability, and security properties is left for future work.
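The server-side half of the design, storing only a form-family-salted hash and linking waves on it, could look roughly like the sketch below. The abstract does not specify the hash construction; HMAC-SHA-256 keyed with a per-family salt is used here as one common way to build a salted hash. The family names, the in-memory store, and the function names are hypothetical.

```python
import hashlib
import hmac
import secrets

# Hypothetical per-family salts; a real backend would keep these in protected storage.
FAMILY_SALTS = {
    "wellbeing-panel": secrets.token_bytes(32),
    "risk-survey": secrets.token_bytes(32),
}

# Hypothetical in-memory store: linkage key -> list of submitted responses.
responses: dict[str, list[dict]] = {}


def linkage_key(form_family: str, code: str) -> str:
    """Salted hash of the participant code, scoped to one form family."""
    salt = FAMILY_SALTS[form_family]
    return hmac.new(salt, code.encode("utf-8"), hashlib.sha256).hexdigest()


def store_response(form_family: str, code: str, wave: int, answers: dict) -> str:
    """Persist the answers under the linkage key; the code itself is never stored."""
    key = linkage_key(form_family, code)
    responses.setdefault(key, []).append({"wave": wave, "answers": answers})
    return key
```

With this layout, two waves submitted with the same code to the same form family land under one key and can be linked, whereas the same code used in a different form family produces an unrelated key, so records cannot be joined across studies by the stored value alone.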

Originality / value of the article: The work refines SGIC-style respondent-generated linkage by combining graphical secrets with browser-based cryptographic processing, checksum-based client-side validation, and form-family salting. The result is a concrete, implementable algorithm for privacy-respecting longitudinal linkage.

JEL: C81, C83.

Published

2025-12-30