Duplication in LinkedIn URLs across profiles
being researched
Ben Yarbro
This is a sample of duplication in LI URLs across profiles.
URL, occurrences
[('linkedin.com/in/bengolub', 16),
('linkedin.com/in/geoffreysanders', 13),
('linkedin.com/in/dcancel', 13),
('linkedin.com/in/evanreiser', 12),
('linkedin.com/in/guyhalfteck', 6),
('linkedin.com/in/jimfishersf', 6),
('linkedin.com/in/philpoje', 5),
('linkedin.com/in/excelandaccess', 5),
('linkedin.com/in/billhu', 5),
('linkedin.com/in/deborahaingram', 5),
('linkedin.com/in/brianshillair', 5),
('linkedin.com/in/ghonim', 5),
('linkedin.com/in/wmsloane', 5),
('linkedin.com/in/craigzampa', 5),
('linkedin.com/in/kiaracancer', 5),
('linkedin.com/in/roshanloungani', 4),
('linkedin.com/in/dennisju', 4),
('linkedin.com/in/ryanbonnici', 4)]
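A (URL, occurrences) list like the one above could be produced with something along these lines. This is only a minimal sketch, not the query actually used: the `linkedin_url` field name, the lowercasing/trimming step, and the sample records are all assumptions for illustration.

```python
from collections import Counter


def duplicate_url_counts(profiles, url_field="linkedin_url"):
    """Return (url, occurrences) pairs for URLs seen more than once,
    most frequent first. `url_field` is a hypothetical field name."""
    counts = Counter(
        p[url_field].strip().lower()  # assumed normalization, may not match the real pull
        for p in profiles
        if p.get(url_field)           # skip profiles with no URL
    )
    return [(url, n) for url, n in counts.most_common() if n > 1]


# Placeholder records, not taken from the real data.
sample = [
    {"linkedin_url": "linkedin.com/in/example-a"},
    {"linkedin_url": "linkedin.com/in/example-b"},
    {"linkedin_url": "linkedin.com/in/example-a"},
    {"linkedin_url": "linkedin.com/in/example-c"},
    {"linkedin_url": "linkedin.com/in/example-b"},
    {"linkedin_url": "linkedin.com/in/example-a"},
]

dup_counts = duplicate_url_counts(sample)
print(dup_counts)
```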
Every example I've looked at has different data for the same URL. Each appears to be the same person, with differences only in the profile. For example, for the most repeated URL, linkedin.com/in/bengolub, 8 of the records have a job_title of executive chairman and chief executive officer, and the other 8 have a job_title of executive chairman and interim chief executive officer. I then looked at the set of 8 profiles with the title executive chairman and chief executive officer. They are very similar, but they have different job_last_updated timestamps and the location differed: one had ['san francisco, california, united states'] and the other had ['san francisco, california, united states', 'los altos, california, united states'].
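The manual field-by-field comparison described above might be automated roughly as follows. This is a sketch under assumptions: `job_title` and `job_last_updated` come from the note, but the `location` key name and the timestamp values are placeholders, and the real records likely carry more fields.

```python
def differing_fields(profiles):
    """Return {field: [distinct values]} for every field whose value
    is not identical across the given profiles."""
    fields = set().union(*(p.keys() for p in profiles))
    diffs = {}
    for field in sorted(fields):
        seen = {}
        for p in profiles:
            value = p.get(field)
            # repr() gives a hashable key even for list values
            seen.setdefault(repr(value), value)
        if len(seen) > 1:
            diffs[field] = list(seen.values())
    return diffs


# Two illustrative records sharing one URL; timestamps are placeholders.
records = [
    {
        "job_title": "executive chairman and chief executive officer",
        "job_last_updated": "<timestamp-1>",
        "location": ["san francisco, california, united states"],
    },
    {
        "job_title": "executive chairman and chief executive officer",
        "job_last_updated": "<timestamp-2>",
        "location": [
            "san francisco, california, united states",
            "los altos, california, united states",
        ],
    },
]

diffs = differing_fields(records)
for field, values in diffs.items():
    print(field, values)
```

Running this over each group of profiles that share a URL would show which fields actually vary between the duplicates.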
There are 89,731 URLs with duplicate profiles, covering 180,796 profiles in total, out of 5.5 million profiles in the pull.
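To put those counts in proportion, here is a quick back-of-the-envelope calculation. It assumes the 5.5 million total is a rounded figure, so the percentages are approximate.

```python
dup_urls = 89_731            # URLs that appear on more than one profile
dup_profiles = 180_796       # profiles carrying one of those URLs
total_profiles = 5_500_000   # rounded total from the pull (assumed approximate)

avg_per_url = dup_profiles / dup_urls            # profiles per duplicated URL
extra_profiles = dup_profiles - dup_urls         # profiles beyond one per URL
share_affected = dup_profiles / total_profiles   # fraction of the pull affected

print(f"average profiles per duplicated URL: {avg_per_url:.2f}")  # 2.01
print(f"profiles beyond one per URL: {extra_profiles:,}")         # 91,065
print(f"share of pull affected: {share_affected:.1%}")            # 3.3%
```

An average just above 2 suggests most duplicated URLs appear on exactly two profiles, with heavily repeated URLs like those in the sample being the exception.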
The customer researched 50 profiles and found that the duplicate LI URLs belonged to the same person.
Christine Biddlecombe
being researched