This is a sample of duplicated LinkedIn (LI) URLs across profiles.

URL, occurrences
linkedin.com/in/bengolub, 16
linkedin.com/in/geoffreysanders, 13
linkedin.com/in/dcancel, 13
linkedin.com/in/evanreiser, 12
linkedin.com/in/stephen-miller-010167b, 10
linkedin.com/in/matt-petersen-3a50001, 9
linkedin.com/in/guyhalfteck, 6
linkedin.com/in/jimfishersf, 6
linkedin.com/in/philpoje, 5
linkedin.com/in/excelandaccess, 5
linkedin.com/in/billhu, 5
linkedin.com/in/deborahaingram, 5
linkedin.com/in/brianshillair, 5
linkedin.com/in/ghonim, 5
linkedin.com/in/wmsloane, 5
linkedin.com/in/craigzampa, 5
linkedin.com/in/kiaracancer, 5
linkedin.com/in/roshanloungani, 4
linkedin.com/in/dennisju, 4
linkedin.com/in/ryanbonnici, 4

In every example I've looked at, the records sharing a URL contain different data. It looks like the same person each time, just with differences in the profile record. For example, for the most repeated URL, linkedin.com/in/bengolub, 8 of the records have a job_title of "executive chairman and chief executive officer" and the other 8 have a job_title of "executive chairman and interim chief executive officer".

I then looked at the set of 8 profiles with the title "executive chairman and chief executive officer". Those are very similar, but they have different job_last_updated timestamps, and the location differed between the two: one had ['san francisco, california, united states'] and the other had ['san francisco, california, united states', 'los altos, california, united states'].

There are 89,731 URLs with duplicate profiles, covering a total of 180,796 profiles (about two profiles per duplicated URL on average), out of 5.5 million profiles in the pull. The customer researched 50 profiles and found that records with a duplicated LI URL were the same person.
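
The counts and the field-by-field comparison described above could be reproduced with something like the following. This is a minimal sketch, not the query that was actually run: it assumes the pull is loaded as a list of dicts, and the `linkedin_url` key, the `location` key, and all helper names are hypothetical. Only job_title and job_last_updated are field names taken from the records themselves.

```python
from collections import defaultdict


def normalize_url(url):
    """Trim whitespace, lowercase, and drop a trailing slash."""
    return url.strip().lower().rstrip("/")


def group_by_url(profiles, url_field="linkedin_url"):
    """Group profile records by normalized URL (url_field is an assumed key)."""
    groups = defaultdict(list)
    for profile in profiles:
        url = profile.get(url_field)
        if url:
            groups[normalize_url(url)].append(profile)
    return groups


def duplicate_summary(groups):
    """Return (number of URLs with more than one profile, profiles under those URLs)."""
    dup = {url: recs for url, recs in groups.items() if len(recs) > 1}
    return len(dup), sum(len(recs) for recs in dup.values())


def differing_fields(records, ignore=()):
    """Map each field whose value is not identical across records to its distinct values."""
    keys = set().union(*(r.keys() for r in records)) - set(ignore)
    diffs = {}
    for key in sorted(keys):
        # repr() makes list-valued fields such as location comparable in a set
        values = {repr(r.get(key)) for r in records}
        if len(values) > 1:
            diffs[key] = sorted(values)
    return diffs


if __name__ == "__main__":
    # Toy records: titles and locations mirror the bengolub example,
    # timestamps are placeholders and not real data.
    profiles = [
        {"linkedin_url": " linkedin.com/in/bengolub ",
         "job_title": "executive chairman and chief executive officer",
         "job_last_updated": "placeholder-1",
         "location": ["san francisco, california, united states"]},
        {"linkedin_url": " linkedin.com/in/bengolub ",
         "job_title": "executive chairman and interim chief executive officer",
         "job_last_updated": "placeholder-2",
         "location": ["san francisco, california, united states",
                      "los altos, california, united states"]},
    ]
    groups = group_by_url(profiles)
    print(duplicate_summary(groups))
    print(differing_fields(groups["linkedin.com/in/bengolub"]))
```

Running `differing_fields` per duplicated URL and tallying which keys show up would indicate whether the duplicates are mostly driven by job_title changes, by timestamp-only refreshes, or by location lists, which may help decide which record to keep.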