Speaker linking in large data sets