US Patent:
20210081411, Mar 18, 2021
Inventors:
- Kirkland WA, US
Steven DeRose - Silver Spring MD, US
Taqi Jaffri - Kirkland WA, US
Luis Marti Orosa - Las Condes, CL
Michael Palmer - Edmonds WA, US
Jean Paoli - Kirkland WA, US
Christina Pavlopoulou - Emeryville CA, US
Elena Pricoiu - Issaquah WA, US
Swagatika Sarangi - Bellevue WA, US
Marcin Sawicki - Kirkland WA, US
Manar Shehadeh - Kirkland WA, US
Michael Taron - Seattle WA, US
Bhaven Toprani - Cupertino CA, US
Zubin Rustom Wadia - Chappaqua NY, US
David Watson - Seattle WA, US
Eric White - San Luis Obispo CA, US
Joshua Yongshin Fan - Bellevue WA, US
Kush Gupta - Seattle WA, US
Andrew Minh Hoang - Olympia WA, US
Zhanlin Liu - Seattle WA, US
Jerome George Paliakkara - Seattle WA, US
Zhaofeng Wu - Seattle WA, US
Yue Zhang - St Paul MN, US
Xiaoquan Zhou - Bellevue WA, US
International Classification:
G06F 16/2457
G06F 16/93
G06F 16/248
G06N 20/00
G06F 40/186
G06F 40/30
Abstract:
Machine learning, artificial intelligence, and other computer-implemented methods are used to identify semantically important chunks in documents, automatically label them with appropriate datatypes and semantic roles, and use this enhanced information to assist authors and to support downstream processes. Chunk locations, datatypes, and semantic roles can often be determined automatically from what is here called “context”: the combination of their formatting, structure, and content; those of adjacent or nearby content; overall patterns of occurrence in a document; and similarities of all these things across documents (mainly but not exclusively among documents in the same document set). Similarity is not limited to exact or fuzzy string or property comparisons, but may include similarity of natural-language grammatical structure, machine learning (ML) techniques such as measuring similarity of word, chunk, and other embeddings, and the datatypes and semantic roles of previously identified chunks.
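The idea of inferring a chunk's semantic role from its context can be illustrated with a minimal sketch. This is not the method claimed in the patent, only a toy instance of the general technique under stated assumptions: the chunk dictionaries, the field names (`text`, `context`, `format`, `role`), the blend weights, and the threshold are all hypothetical, and a bag-of-words count vector stands in for the learned embeddings the abstract mentions. The sketch blends fuzzy string similarity, similarity of surrounding text, and a formatting match, then proposes the role of the most similar previously labeled chunk.

```python
import math
import re
from collections import Counter
from difflib import SequenceMatcher


def bag_of_words(text):
    """Toy stand-in for a learned embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def chunk_similarity(chunk, labeled):
    """Blend fuzzy text, context, and formatting similarity.

    The weights are arbitrary illustrative values, not taken from the source.
    """
    text_sim = SequenceMatcher(
        None, chunk["text"].lower(), labeled["text"].lower()
    ).ratio()
    context_sim = cosine(
        bag_of_words(chunk["context"]), bag_of_words(labeled["context"])
    )
    format_sim = 1.0 if chunk["format"] == labeled["format"] else 0.0
    return 0.3 * text_sim + 0.5 * context_sim + 0.2 * format_sim


def suggest_role(chunk, labeled_chunks, threshold=0.5):
    """Return (role, score) for the most similar labeled chunk.

    The role is None when the best score falls below the threshold.
    """
    best = max(labeled_chunks, key=lambda c: chunk_similarity(chunk, c))
    score = chunk_similarity(chunk, best)
    return (best["role"] if score >= threshold else None), score


# Hypothetical previously labeled chunks from other documents in the same set.
labeled = [
    {"text": "January 5, 2020", "context": "This agreement is effective as of",
     "format": "bold", "role": "effective_date"},
    {"text": "Acme Corp", "context": "entered into by and between",
     "format": "plain", "role": "party_name"},
]

# A new, unlabeled chunk whose surrounding text and formatting resemble the first.
new_chunk = {"text": "March 1, 2021", "context": "This lease is effective as of",
             "format": "bold"}

role, score = suggest_role(new_chunk, labeled)
print(role, round(score, 3))
```

In this sketch the context term carries the most weight, reflecting the abstract's point that surrounding content and cross-document patterns often identify a chunk better than its own text does. A real system would presumably replace the count vectors with trained embeddings and learn the weights rather than fix them by hand.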