MNA Science & Technology Desk: Microsoft has discreetly removed from its website a facial recognition database containing 10 million images of some 100,000 people.
The tech giant took down the database after a Financial Times investigation revealed that it had been used by companies and military researchers around the world to train facial recognition systems.
The public dataset, called ‘MS Celeb,’ included images of ‘celebrities’ pulled from the internet, but it also contained photos of ‘arguably private individuals,’ often collected without their knowledge or consent, the FT found.
Microsoft, which referred to MS Celeb as the largest publicly available facial recognition dataset in the world, said the database was meant for use by academic researchers.
The images were harvested from the web under a Creative Commons license, which permits their reuse for academic and educational purposes.
Microsoft didn’t announce publicly that the database had been taken down.
‘The site was intended for academic purposes,’ Microsoft told the FT.
‘It was run by an employee that is no longer with Microsoft and has since been removed.’
Following the FT report, databases run by Duke University and Stanford University were also quietly taken offline.
The MS Celeb database, published in 2016, was first spotted by Berlin-based researcher Adam Harvey, who tracks the use of hundreds of face datasets.
Harvey found that Microsoft itself had used the MS Celeb dataset to train facial recognition systems, the FT reported.
The data has also been cited in AI research conducted by IBM, Panasonic, Alibaba, Nvidia, Hitachi, Sensetime and Megvii.
Sensetime and Megvii supply equipment to officials in Xinjiang, a region in northwestern China, where ethnic minority groups are under surveillance and held in internment camps, according to the FT.
Although Microsoft claims the dataset was populated with photos of celebrities, it also contained photos of Julie Brill, a former Federal Trade Commission (FTC) commissioner, as well as several prominent security journalists.
‘Microsoft has exploited the term “celebrity” to include people who merely work online and have a digital identity,’ Harvey told the FT.
‘Many people in the target list are even vocal critics of the very technology Microsoft is using their name and biometric information to build.’
Some experts have since suggested that Microsoft pulled the database because it realized the dataset could have violated the EU's General Data Protection Regulation (GDPR).