The Impact of Entity Resolution on Observed Social Network Structure

Deduplication, also referred to as "entity resolution", is a common and crucial pre-processing step in the construction of social networks. Traditional deduplication methods compare the attributes (such as name and age) of each potential matching pair to estimate the probability that the pair is a match. Recent research has used clustering techniques for entity resolution, where each cluster represents a unique underlying entity. In social network datasets, relational information (e.g., a person’s network ties) can also be used in deduplication. Entity resolution is inherently imperfect and its results reflect existing measurement error, particularly when no manually reviewed "ground-truth" dataset is available for parameter tuning. My work focuses on methods for evaluating entity resolution in a network setting with and without "ground truth", on the sensitivity of entity resolution results to choices of tuning parameters and transitive closure, and on the downstream impacts these choices can have on both local and global network metrics. I apply the evaluation methods to two real-world ego-centric network studies: (i) CARE2HOPE, a respondent-driven sample of rural people who use drugs (PWUD) in Appalachian Kentucky, and (ii) RADAR, a longitudinal network study of young men in Chicago who have sex with men. Finally, I formulate a revised entity resolution process that takes downstream network impacts into account.
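
To make the sensitivity described above concrete, the following is a minimal sketch of threshold-based pairwise matching followed by transitive closure, and of how the threshold changes the observed network. It is an illustration only, not the procedure used in this work: the records, ties, scoring rule, and thresholds are all hypothetical, and the names `match_score`, `resolve`, `collapse`, and `summarize` are invented for this example.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical records (record_id, name, age); not drawn from CARE2HOPE or RADAR.
records = [
    (0, "Jon Smith", 34),
    (1, "John Smith", 34),
    (2, "Jon Smyth", 35),
    (3, "Maria Lopez", 29),
    (4, "Mara Lopez", 29),
]
# Hypothetical ties reported between records before deduplication.
edges = [(0, 3), (1, 4), (2, 3)]


def match_score(a, b):
    """Crude attribute similarity in [0, 1] (an assumed scoring rule):
    name string similarity, halved when ages differ by more than one year."""
    name_sim = SequenceMatcher(None, a[1].lower(), b[1].lower()).ratio()
    age_factor = 1.0 if abs(a[2] - b[2]) <= 1 else 0.5
    return name_sim * age_factor


def resolve(records, threshold):
    """Link every pair scoring at or above the threshold, then take the
    transitive closure of those links with a union-find structure."""
    parent = {r[0]: r[0] for r in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in combinations(records, 2):
        if match_score(a, b) >= threshold:
            parent[find(a[0])] = find(b[0])
    # Map each record id to the representative id of its resolved entity.
    return {r[0]: find(r[0]) for r in records}


def collapse(edges, entity_of):
    """Rewrite ties between records as undirected ties between entities,
    dropping self-loops and duplicate ties."""
    return {
        frozenset((entity_of[u], entity_of[v]))
        for u, v in edges
        if entity_of[u] != entity_of[v]
    }


def summarize(threshold):
    """Return (number of entities, number of ties) at a given threshold."""
    entity_of = resolve(records, threshold)
    return len(set(entity_of.values())), len(collapse(edges, entity_of))


for threshold in (0.99, 0.88, 0.80):
    print(threshold, summarize(threshold))
```

In this toy setting, lowering the threshold merges more records, which reduces both the node count and the number of distinct ties. Transitive closure can also place two records in the same entity even when their direct pairwise score falls below the threshold, because each is linked to a shared third record.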