Finding tribes: Detecting social ties from affiliation data
Affiliation networks are bipartite graphs describing the connections between two classes of entities, A and B. For example, the entities might be people (A) and the events they attend (B) or movie viewers (A) and the titles they rent (B). Often in such networks, it is reasonable to model a dependency structure among the B objects but to treat the A objects as conditionally independent given that structure. For instance, Tomb Raider and Indiana Jones are associated in that they often co-occur on people's rental lists, but people create their Netflix queues independently of each other. I'll present a case study in which, in contrast, we also looked for associations among the people: cases where people's choices of B objects were too similar to have arisen independently. This process revealed glimpses of their underlying social network within a data set collected for another purpose.
In the securities industry, fraud can be perpetrated by "tribes" of employees who collude at multiple jobs. Employees of the industry register their job histories and other information with the industry's regulatory body, FINRA. In collaboration with FINRA, we developed a family of algorithms to detect such tribes: small groups of individuals sharing unusual sequences of affiliations. We validated that these inferred social ties connected people who did appear to coordinate their job moves, and that they linked individuals with similar, and elevated, risk profiles for fraud.
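To make the idea of "sharing unusual sequences of affiliations" concrete, here is a minimal sketch of the simplest possible variant: count how many employers each pair of people has in common and keep the pairs above a threshold. This is an illustrative stand-in, not the algorithms developed with FINRA. The function name `shared_affiliation_pairs`, the `min_shared` parameter, and the toy records are all hypothetical, and the sketch ignores timing, job order, and firm size, any of which could matter when judging how unusual a shared history is.

```python
from collections import defaultdict
from itertools import combinations


def shared_affiliation_pairs(records, min_shared=2):
    """Return {(a, b): shared_firms} for pairs sharing >= min_shared firms.

    `records` is an iterable of (person, firm) affiliations. The threshold
    `min_shared` is a hypothetical stand-in for a real significance test.
    """
    # Invert the bipartite graph: firm -> set of people affiliated with it.
    members = defaultdict(set)
    for person, firm in records:
        members[firm].add(person)

    # For every firm, record it as shared by each pair of its members.
    shared = defaultdict(set)
    for firm, people in members.items():
        for a, b in combinations(sorted(people), 2):
            shared[(a, b)].add(firm)

    # Keep only pairs whose overlap reaches the threshold.
    return {pair: firms for pair, firms in shared.items()
            if len(firms) >= min_shared}


# Toy data (invented for illustration only).
records = [
    ("ann", "FirmX"), ("bob", "FirmX"), ("cat", "FirmX"),
    ("ann", "FirmY"), ("bob", "FirmY"),
    ("ann", "FirmZ"), ("bob", "FirmZ"), ("dan", "FirmZ"),
    ("cat", "FirmW"), ("dan", "FirmW"),
]
pairs = shared_affiliation_pairs(records, min_shared=2)
```

In this toy data only ann and bob share more than one firm, so they are the single candidate pair. A raw count is a weak signal on its own: two people who both worked at very large firms overlap by chance, so a more realistic score would discount common affiliations and favor rare ones.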
bio: Lisa Friedland is a Ph.D. candidate at the University of Massachusetts Amherst, studying relational knowledge discovery with advisor David Jensen. She earned her B.A. and M.S. in Computer Science from Harvard (1998) and UMass (2006), respectively. In addition to fraud detection, she has worked on relational entity resolution for physics citation data and for Hollywood movies, on predicting protein-protein interactions in yeast, and most recently, on building a search engine for jokes.