forked from ZZR8066/SEM

The code of paper: "Split, Embed and Merge: An Accurate Table Structure Recognizer", Zhenrong Zhang, Jianshu Zhang, Jun Du, Fengren Wang. Pattern Recognition, 2022.

SPRATeam-USTC/SEM

 
 

Split, Embed and Merge: An accurate table structure recognizer

This repository contains the source code of our Pattern Recognition 2022 paper: Split, Embed and Merge: An accurate table structure recognizer.

Introduction

Figure: the SEM pipeline.

Split, Embed and Merge (SEM) is a framework for parsing tabular data into a structured format. It consists of three main parts: a splitter, an embedder and a merger. We won first place on complex tables and third place on all tables in Task-B of the ICDAR 2021 Competition on Scientific Literature Parsing.
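To make the split-then-merge idea concrete, here is a toy sketch in plain Python. It is not the code in this repository: the function names `split_into_grid` and `merge_cells`, the separator inputs and the merge groups are all assumptions made for illustration, and the embedder stage (feature extraction per grid cell) is left out entirely. In the real model the separators and merge decisions are predicted by neural networks; here they are given by hand.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)
Pos = Tuple[int, int]                    # (row index, column index)


def split_into_grid(row_seps: List[float], col_seps: List[float]) -> Dict[Pos, Box]:
    """Build the basic grid cells from sorted separator coordinates.

    Both lists are assumed to include the outer table borders.
    """
    cells: Dict[Pos, Box] = {}
    for r in range(len(row_seps) - 1):
        for c in range(len(col_seps) - 1):
            cells[(r, c)] = (col_seps[c], row_seps[r], col_seps[c + 1], row_seps[r + 1])
    return cells


def merge_cells(cells: Dict[Pos, Box], groups: List[List[Pos]]) -> List[Box]:
    """Merge each group of grid positions into one spanning box.

    Grid cells that belong to no group are kept unchanged.
    """
    merged: List[Box] = []
    used = set()
    for group in groups:
        boxes = [cells[pos] for pos in group]
        merged.append((
            min(b[0] for b in boxes),
            min(b[1] for b in boxes),
            max(b[2] for b in boxes),
            max(b[3] for b in boxes),
        ))
        used.update(group)
    for pos, box in cells.items():
        if pos not in used:
            merged.append(box)
    return merged


# Hypothetical 2x2 grid whose top row is a single spanning header cell.
grid = split_into_grid(row_seps=[0, 10, 20], col_seps=[0, 50, 100])
table_cells = merge_cells(grid, groups=[[(0, 0), (0, 1)]])
print(table_cells)
```

The point of the sketch is only the data flow: a fine grid is produced first, and spanning cells are recovered afterwards by grouping grid cells.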

Dataset

We provide scripts for processing the SciTSR dataset, which contains 15,000 tables in PDF format as well as their corresponding structure labels.

Note that the text information must be aligned with the table cells in order to generate the labels for the splitter.
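As a rough illustration of what aligning text with cells can look like, the sketch below assigns each text box to the cell that contains its center point. This is an assumption made for illustration, not the procedure used by the processing scripts: the `Box` layout, the function name `assign_text_to_cells` and the center-point rule are all hypothetical.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def center(box: Box) -> Tuple[float, float]:
    """Return the center point of a box."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)


def assign_text_to_cells(text_boxes: List[Box], cell_boxes: List[Box]) -> List[Optional[int]]:
    """For each text box, return the index of the first cell containing its center.

    Text boxes whose center falls in no cell are mapped to None.
    """
    assignments: List[Optional[int]] = []
    for text_box in text_boxes:
        cx, cy = center(text_box)
        match: Optional[int] = None
        for idx, (x0, y0, x1, y1) in enumerate(cell_boxes):
            if x0 <= cx <= x1 and y0 <= cy <= y1:
                match = idx
                break
        assignments.append(match)
    return assignments


# Hypothetical coordinates: two adjacent cells and three text boxes,
# the last of which lies outside the table.
cells = [(0, 0, 50, 20), (50, 0, 100, 20)]
texts = [(5, 5, 30, 15), (60, 4, 90, 16), (200, 200, 210, 210)]
print(assign_text_to_cells(texts, cells))  # [0, 1, None]
```

A center-point test is simple but brittle when text overflows its cell; an overlap-area criterion would be a reasonable alternative.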

Requirements

  • torch==1.7.1

Training and Testing

python runner/train.py --cfg default

Citation

If you find SEM useful in your research, please consider citing:

@article{zhang2022split,
  title={Split, embed and merge: An accurate table structure recognizer},
  author={Zhang, Zhenrong and Zhang, Jianshu and Du, Jun and Wang, Fengren},
  journal={Pattern Recognition},
  volume={126},
  pages={108565},
  year={2022},
  publisher={Elsevier}
}
