Google’s Smart Cleanup taps AI to streamline data entry

In June, Google unveiled Smart Cleanup, a Google Sheets feature that taps AI to learn patterns and autocomplete data while surfacing formatting suggestions. Following a months-long beta, Smart Cleanup launches today in general availability for all G Suite users.

Smart Cleanup comes as Google looks to inject G Suite with more AI-powered functionality. Recently, the company added a feature that lets users ask natural language questions about data in spreadsheets, like “Which person has the top score?” and “What’s the sum of price by salesperson?” Google Meet earlier this year gained adaptive noise cancellation. And two years ago, Google rolled out Quick Access, a machine learning-powered tool that suggests files relevant to documents users are editing, to Sheets, Docs, and Slides.

As G Suite project manager Ryan Weber explained in an interview with VentureBeat, Smart Cleanup was created in an attempt to unify and improve the discoverability of Sheets’ existing AI-powered auto-formatting features. “What we find is that just because the functionality is there doesn’t always mean that users know it and know how to use it,” he said. Weber gave the example of white-space-trimming and data-deduplication tools that launched over a year ago. “The problem is that no one knows these features exist — they don’t know what to look for in the menus.”

Smart Cleanup is proactive: it surfaces suggestions in Sheets’ side panel rather than waiting for users to find them in menus. It helps identify and fix duplicate rows and number-formatting issues, and it shows column stats that provide a snapshot of the data, including the distribution of values and the most frequent value in a column. At the same time, Smart Cleanup evaluates whether common cleanup actions like removing duplicates are relevant for a given sheet and spotlights the most appropriate suggestions to help users streamline data prior to analysis.
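To make the column stats and duplicate-row checks concrete, here is a minimal Python sketch of the same ideas. It is not Google's implementation; the function names `column_stats` and `find_duplicate_rows` and the sample rows are hypothetical, chosen only for illustration.

```python
from collections import Counter


def column_stats(values):
    """Return the value distribution and the most frequent value of a column."""
    counts = Counter(values)
    # most_common(1) is empty for an empty column, so guard against that.
    most_frequent = counts.most_common(1)[0][0] if counts else None
    return {"distribution": dict(counts), "most_frequent": most_frequent}


def find_duplicate_rows(rows):
    """Return indices of rows that repeat an earlier row once whitespace is trimmed."""
    seen = set()
    duplicates = []
    for index, row in enumerate(rows):
        key = tuple(cell.strip() for cell in row)
        if key in seen:
            duplicates.append(index)
        else:
            seen.add(key)
    return duplicates


# Hypothetical sample data: the third row repeats the first apart from a trailing space.
rows = [
    ["Ana", "USA"],
    ["Ben", "Canada"],
    ["Ana ", "USA"],
]
stats = column_stats([row[1] for row in rows])
dupes = find_duplicate_rows(rows)
```

Trimming whitespace before comparing rows reflects how the cleanup steps interact: two rows that differ only by stray spaces are only recognizable as duplicates after trimming.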

“Let’s say you’re ready to import some data. You want to upload a .txt file or paste in a big table of data. Once you do that, Smart Cleanup will use AI to detect this and do things like trim whitespace and apply number, currency, and date formatting,” Weber said.

One of Smart Cleanup’s more powerful features is semantic duplicate detection. If there’s a column in a document labeled “Country” and within that column entities like “USA” and “United States of America,” Smart Cleanup will recognize that those entities refer to the same thing: United States. Reflecting this, it will suggest replacing differently named entities with a standard nomenclature (say, “United States”) to eliminate duplicates.
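The idea behind semantic duplicate detection can be sketched in a few lines of Python. This is an illustration only: the small `ALIASES` table below is a hypothetical stand-in for whatever entity data the real feature consults, and `suggest_replacements` is an invented name, not part of any Google API.

```python
# Hypothetical alias table mapping lowercase spellings to one canonical name.
ALIASES = {
    "usa": "United States",
    "united states of america": "United States",
    "united states": "United States",
    "uk": "United Kingdom",
}


def suggest_replacements(column):
    """Map each non-canonical value in the column to its suggested canonical name."""
    suggestions = {}
    for value in column:
        canonical = ALIASES.get(value.strip().lower())
        # Skip unknown values and values that already use the canonical name.
        if canonical is not None and canonical != value:
            suggestions[value] = canonical
    return suggestions


country = ["USA", "United States of America", "United States", "Canada"]
suggestions = suggest_replacements(country)
```

Here `suggestions` maps "USA" and "United States of America" to "United States", while "Canada" is left alone because the table has no entry for it. A lookup table like this only covers spellings someone has listed in advance, which is presumably why a production system would lean on a much larger knowledge base.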

Weber says that the AI models underpinning Smart Cleanup were trained on large data sets from Sheets containing anonymized and aggregated information, and that they continue to improve over time as people interact with Smart Cleanup and either accept or reject changes. These models, which were developed using Google’s TensorFlow machine learning framework and trained on in-house tensor processing units (TPUs), only trigger suggestions when they reach a certain confidence threshold. The threshold is meant to keep unwelcome or erroneous recommendations from popping up in users’ side panels.
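Gating suggestions on a confidence threshold might look like the following Python sketch. Google has not published the actual threshold or scoring scheme, so the `Suggestion` class, the `suggestions_to_show` function, the 0.9 cutoff, and the sample scores are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical cutoff; the real threshold is not public.
CONFIDENCE_THRESHOLD = 0.9


@dataclass
class Suggestion:
    description: str
    confidence: float  # model score between 0.0 and 1.0


def suggestions_to_show(candidates, threshold=CONFIDENCE_THRESHOLD):
    """Keep only suggestions at or above the threshold, most confident first."""
    kept = [s for s in candidates if s.confidence >= threshold]
    return sorted(kept, key=lambda s: s.confidence, reverse=True)


# Hypothetical candidate suggestions with made-up scores.
candidates = [
    Suggestion("Remove 3 duplicate rows", 0.97),
    Suggestion("Format column B as currency", 0.62),
    Suggestion("Trim whitespace in column A", 0.93),
]
shown = suggestions_to_show(candidates)
```

With these made-up scores, the currency-formatting suggestion is held back and the other two are shown. Raising the threshold trades fewer suggestions for fewer mistakes.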

“We try to err on the side of accuracy,” Weber said. “We look at things like the rate of acceptance to make sure that the acceptance rate of these features is high. If that drops below a baseline value, that means people aren’t finding value — that these things aren’t correct. And so we try to make sure that we’re giving high-quality suggestions … Much of our time spent is optimizing for when to show things and, just as importantly, when not to show things because we don’t want to slow users down more to make them frustrated.”

Smart Cleanup’s models also draw on the Google Knowledge Graph, the knowledge base Google uses to enhance its services with information gathered from a range of web sources. Its data is retrieved from the CIA World Factbook, Wikidata, and Wikipedia, among other sources, and it spans over 500 billion facts on more than 5 billion entities.

Another key source of context for the models is what Weber calls the “enterprise knowledge graph.” It contains organization-level information like contacts from a company’s G Suite people directory, enabling Smart Cleanup to recognize things like emails, names, addresses, and more.

“Smart Cleanup uses the Knowledge Graph and enterprise knowledge graph for semantic duplicates so it can figure out when people are typing, for example, different abbreviations for a state, country, or company. The data sets allow it to figure out that these are often the same thing and suggest replacing them with a consistent piece of text,” Weber said.

Weber was coy when asked what the future might hold for Smart Cleanup and Google Sheets broadly, but he asserted that spreadsheets are becoming more capable than they used to be, thanks in part to AI. “Today, many people use spreadsheets, but they only use a very small percentage of the true power behind the spreadsheets … So I think there’s a huge opportunity for us to think about how we expose that power to beginner users and how we democratize data analysis so we don’t have users feeling like they have to read a book on how to become a spreadsheet expert … There’s a whole host of things we’re thinking about investing in to make sure that anyone regardless of skill set can get a ton of value out of Sheets,” Weber said.
