PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks

Introduction

Motivation

Most OCR-based Key Information Extraction (KIE) methods use only textual features and position features. To obtain a richer semantic representation, however, it can help to use visual features and the global layout as well.

KIE Approaches

Existing KIE approaches and the method proposed in this paper can be summarized in a single figure.

[Figure 2: Comparison of KIE approaches, panels (a) to (d)]

Figure 2-(a) : The traditional approach, which uses hand-crafted features (e.g., regex and template matching) ⇒ Because it depends on task-specific knowledge and human-designed rules, it is hard to extend to other domains.
Figure 2-(b) : Recent approaches that apply Named Entity Recognition or Sequence Labeling with NLP-based models ⇒ Sequence labeling does not adequately reflect the document's global layout and makes poor use of visual features. LayoutLM, proposed recently, does use visual features, but it does not consider the latent relationship between two text segments.
Figure 2-(c) : An approach that defines a graph in advance and combines textual and visual information through graph convolution ⇒ The task-specific edge types and adjacency matrix must be defined beforehand, which requires considerable domain knowledge and is hard to apply when the document structure is complex.
Figure 2-(d) : The method of this paper, which performs the KIE task using visual features, textual features, and global layout features obtained through a graph learning module that is strong at relation extraction.
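
The difference between (c) and (d) can be illustrated in code. The following is a minimal sketch and not the paper's implementation: it assumes the learned graph is a soft adjacency matrix obtained as a row-wise softmax over LeakyReLU scores of absolute feature differences. The function names, the weight vector `w`, and the toy node features are all hypothetical.

```python
import math

def leaky_relu(x, slope=0.01):
    return x if x > 0 else slope * x

def learn_soft_adjacency(nodes, w):
    """Return a soft adjacency matrix; each row is a softmax over pairwise scores.

    Assumed score: LeakyReLU(w . |v_i - v_j|). In a real model `w` would be
    trained; here it is fixed by hand for illustration.
    """
    adj = []
    for vi in nodes:
        scores = [
            leaky_relu(sum(wk * abs(a - b) for wk, a, b in zip(w, vi, vj)))
            for vj in nodes
        ]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        adj.append([e / total for e in exps])
    return adj

# Approach (c): a hand-defined, hard adjacency matrix (needs domain knowledge).
predefined_adj = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
]

# Approach (d): the adjacency is computed from the node features themselves.
nodes = [[0.0, 1.0], [0.1, 0.9], [5.0, -3.0]]  # toy text-segment embeddings
w = [-1.0, -1.0]  # negative weights: similar nodes receive larger scores
learned_adj = learn_soft_adjacency(nodes, w)
```

With these toy values, the first segment is connected more strongly to the second (similar) segment than to the third (dissimilar) one, without anyone having specified the edges by hand.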

Contributions

Proposes a new method for performing the Key Information Extraction task on documents with complex layouts.
Proposes a way to refine the graph structure (= graph representation) of complex documents by introducing a graph learning module into the model, without defining the graph structure in advance.

Proposed Method

The proposed method, PICK (Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks), can be divided into three modules, as shown in the figure below: Encoder, Graph Module, and Decoder.

[Figure: Overall architecture of PICK (Encoder, Graph Module, Decoder)]
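
To make the data flow between the three modules concrete, here is a toy sketch of the Encoder → Graph Module → Decoder pipeline. Every function body is a hypothetical stand-in chosen only so the example runs (hand-made features, a global mean as "graph context", a threshold as "decoder"); none of it reflects the actual networks used in the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Segment:
    text: str
    box: Tuple[int, int, int, int]  # hypothetical (x, y, w, h) from OCR

def encode(segments: List[Segment]) -> List[List[float]]:
    """Toy Encoder: one feature vector [length, digit ratio] per segment."""
    feats = []
    for s in segments:
        digits = sum(ch.isdigit() for ch in s.text)
        feats.append([float(len(s.text)), digits / max(len(s.text), 1)])
    return feats

def graph_module(feats: List[List[float]]) -> List[List[float]]:
    """Toy Graph Module: append the document-wide mean to every node."""
    if not feats:
        return []
    dim = len(feats[0])
    mean = [sum(f[k] for f in feats) / len(feats) for k in range(dim)]
    return [f + mean for f in feats]

def decode(enriched: List[List[float]]) -> List[str]:
    """Toy Decoder: tag a node VALUE if its digit ratio beats the document mean."""
    return ["VALUE" if node[1] > node[3] else "OTHER" for node in enriched]

def pick_pipeline(segments: List[Segment]) -> List[str]:
    return decode(graph_module(encode(segments)))

segments = [
    Segment("TOTAL", (10, 200, 60, 12)),
    Segment("12.50", (90, 200, 40, 12)),
    Segment("Thank you", (10, 230, 80, 12)),
]
labels = pick_pipeline(segments)
```

The point of the sketch is only the composition: per-segment features come from the encoder, the graph module enriches each node with document-level context, and the decoder assigns one tag per segment.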