As machine learning has grown, one of the major bottlenecks remains labeling data so that machine learning applications understand what they're working with. Datasaur, a member of the Y Combinator Winter 2020 batch, announced a $3.9 million investment today to help solve that problem with a platform designed for machine learning labeling teams.
The $3.9 million combines a $1.1 million pre-seed round from last year and a $2.8 million seed round raised right after the company graduated from Y Combinator in March. Investors include Initialized Capital, Y Combinator and OpenAI CTO Greg Brockman.
Company founder Ivan Lee says that he has been working in various capacities involving AI for seven years. That work began after Yahoo! acquired his mobile gaming startup, Loki Studios, in 2013 and Lee was eventually moved to the AI team; most recently, he worked at Apple. Regardless of the company, he consistently saw a problem around organizing machine learning labeling teams, one that he felt his experience left him uniquely situated to solve.
“I have spent millions of dollars [in budget over the years] and spent countless hours gathering labeled data for my engineers. I came to recognize that this was something that was a problem across all the companies that I’ve been at. And they were just consistently reinventing the wheel and the process. So instead of reinventing that for the third time at Apple, my most recent company, I decided to solve it once and for all for the industry. And that’s why we started Datasaur last year,” Lee told TechCrunch.
He built a platform that speeds up human data labeling with a dose of AI while keeping humans involved. The platform consists of three parts: a labeling interface; an intelligence component that can recognize basic things, so the labeler isn't identifying the same thing over and over; and a team organization component.
He says the area is hot, but so far it has mostly involved labeling consulting services, which farm out labeling to contractors. He points to the sale of Figure Eight in March 2019 and to Scale, which snagged $100 million last year, as examples of startups tackling the problem this way, but he believes his company is doing something different by building a fully software-based solution.
The company currently offers both cloud and on-premises versions, depending on the customer's requirements. It has 10 employees and plans to hire more in the next year, although Lee didn't share an exact number. As he does that, he says he has been working with a partner at investor Initialized on creating a positive and inclusive culture inside the organization, which includes conversations about hiring a diverse workforce as he builds the company.
“I feel like this is just standard CEO speak but that is something that we absolutely value in our top of funnel for the hiring process,” he said.
As Lee builds out his platform, he has also been concerned about built-in bias in AI systems and the detrimental impact it could have on society. He says that he has spoken to clients about the role of labeling in bias and ways of combating it.
“When I speak with our clients, I talk to them about the potential for bias from their labelers and built into our product itself is the ability to assign multiple people to the same project. And I explain to my clients that this can be more costly, but from personal experience I know that it can improve results dramatically to get multiple perspectives on the exact same data,” he said.
Lee believes humans will continue to be involved in the labeling process in some way, even as parts of the process become more automated. “The very nature of our existence [as a company] will always require humans in the loop, […] and moving forward I do think it’s really important that as we get into more and more of the long tail use cases of AI, we will need humans to continue to educate and inform AI, and that’s going to be a critical part of how this technology develops.”