Nov 27 2018
At the RIKEN Center for Advanced Intelligence Project (AIP), researchers have developed a novel machine learning technique that enables an AI to make classifications without so-called “negative data.” The discovery could lead to broader application across many different classification tasks.
In day-to-day life, classifying things is very important. For instance, fake political news and spam mail have to be detected, as do more mundane things like faces or objects. When AI is used, such tasks rely on “classification technology” in machine learning, that is, having the computer learn a boundary that separates negative data from positive data. For instance, “positive” and “negative” data could be photos that include a happy face and a sad face, respectively. After learning a classification boundary, the computer can determine whether a given data point is negative or positive. One problem with this technology is that it requires both positive and negative data for the learning process, but in many cases negative data are not available (for example, it is not easy to find photos labeled “this photo includes a sad face,” because most people smile in front of a camera).
In real-world applications, a retailer attempting to predict who will make a purchase can easily find data on customers who bought from it (positive data); however, it cannot get data on customers who did not buy from it (negative data), because it has no access to its competitors’ data.
Another case in point is a typical task for app developers: they have to predict which users will continue using the app (positive) and which will stop using it (negative). However, when a user unsubscribes, the developers lose that user’s data, because the privacy policy requires them to fully erase it to safeguard personal information.
Previous classification methods could not cope with the situation where negative data were not available, but we have made it possible for computers to learn with only positive data, as long as we have a confidence score for our positive data, constructed from information such as buying intention or the active rate of app users. Using our new method, we can let computers learn a classifier only from positive data equipped with confidence.
Takashi Ishida, Study Lead Author, RIKEN Center for Advanced Intelligence Project.
Along with team leader Masashi Sugiyama and researcher Gang Niu from his group, Ishida proposed letting computers learn well by adding a confidence score, which mathematically corresponds to the probability that a data point belongs to the positive class. The team then developed a technique that lets computers learn a classification boundary from only positive data and information on its confidence (positive reliability), for machine learning classification problems that divide data into positive and negative classes.
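The idea of learning a boundary from positive data and confidence scores alone can be illustrated with a small sketch. The article does not give the team's objective function, so the code below rests on stated assumptions: it treats the confidence r(x) as the probability that x is positive, and it minimizes a weighted loss over positive samples only, loss(g(x)) + ((1 - r(x)) / r(x)) * loss(-g(x)), with a linear model and logistic loss. The function names (train_pconf, demo), the synthetic Gaussian data, and the confidences computed from the known data-generating distribution are all hypothetical choices for illustration; in practice the confidence would come from information such as buying intention or activity rate.

```python
import numpy as np


def sigmoid(z):
    # Numerically stable logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def train_pconf(X_pos, conf, lr=0.1, epochs=2000):
    """Fit g(x) = w.x + b from positive samples and their confidences only.

    Assumed objective (not taken from the article):
        mean_i [ loss(g(x_i)) + (1 - r_i) / r_i * loss(-g(x_i)) ]
    with the logistic loss, loss(z) = log(1 + exp(-z)).
    """
    n, d = X_pos.shape
    w, b = np.zeros(d), 0.0
    # Guard against division by zero for extremely small confidences.
    conf = np.clip(conf, 1e-6, 1.0)
    neg_weight = (1.0 - conf) / conf
    for _ in range(epochs):
        s = sigmoid(X_pos @ w + b)
        # Gradient of the objective with respect to g(x_i):
        #   d/dz loss(z)  = -(1 - sigmoid(z))
        #   d/dz loss(-z) = sigmoid(z)
        grad_z = -(1.0 - s) + neg_weight * s
        w -= lr * (X_pos.T @ grad_z) / n
        b -= lr * grad_z.mean()
    return w, b


def demo(seed=0, n_train=2000, n_test=2000):
    """Train on positives only, then score on held-out data from both classes."""
    rng = np.random.default_rng(seed)
    mu_pos = np.array([0.75, 0.75])
    mu_neg = -mu_pos

    # Training set: positive samples only. No negative sample is ever used.
    X_pos = rng.normal(size=(n_train, 2)) + mu_pos
    # Synthetic confidence: the true posterior p(y=+1 | x) for two unit-variance
    # Gaussians with equal priors and means +/- mu, which is sigmoid(x.(mu_pos - mu_neg)).
    conf = sigmoid(X_pos @ (mu_pos - mu_neg))

    w, b = train_pconf(X_pos, conf)

    # Held-out evaluation on both classes (labels are used for scoring only).
    X_test = np.vstack([
        rng.normal(size=(n_test, 2)) + mu_pos,
        rng.normal(size=(n_test, 2)) + mu_neg,
    ])
    y_test = np.concatenate([np.ones(n_test), -np.ones(n_test)])
    pred = np.where(X_test @ w + b >= 0.0, 1.0, -1.0)
    return w, b, float((pred == y_test).mean())


if __name__ == "__main__":
    w, b, acc = demo()
    print("weights:", w, "bias:", b, "test accuracy:", acc)
```

The weight (1 - r) / r is what lets a positive sample with low confidence stand in for the missing negative data: the less certain the sample is to be positive, the more it is also penalized as if it were negative. Samples with confidence close to zero receive very large weights, so a practical version would likely need some care with noisy confidence scores.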
To check how well the system worked, the researchers applied it to a set of photos labeled with various fashion items. For instance, they selected “T-shirt” as the positive class and another item, for example “sandal,” as the negative class, and then attached a confidence score to the “T-shirt” photos. The researchers observed that, without access to the negative data (for example, the “sandal” photos), their technique was in certain cases as good as a technique that uses both negative and positive data.
This discovery could expand the range of applications where classification technology can be used. Even in fields where machine learning has been actively used, our classification technology could be used in new situations where only positive data can be gathered due to data regulation or business constraints. In the near future, we hope to put our technology to use in various research fields, such as natural language processing, computer vision, robotics, and bioinformatics.
Takashi Ishida, Study Lead Author, RIKEN Center for Advanced Intelligence Project.