- GPT-3, Language Models are Few-Shot Learners. NeurIPS 20. [Paper]
- T5, Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. JMLR 20. [Paper]
- FLAN, Finetuned Language Models Are Zero-Shot Learners. ICLR 22. [Paper] [Code]
- DPO, Direct Preference Optimization: Your Language Model is Secretly a Reward Model. NeurIPS 23. [Paper]
- Prompt Tuning, The Power of Scale for Parameter-Efficient Prompt Tuning. EMNLP 21. [Paper]
- LoRA, LoRA: Low-Rank Adaptation of Large Language Models. ICLR 22. [Paper]
- Chain-of-Thought Prompting, Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. NeurIPS 22. [Paper]
- Least-to-Most Prompting, Least-to-Most Prompting Enables Complex Reasoning in Large Language Models. ICLR 23. [Paper]
- Self-Consistency Prompting, Self-Consistency Improves Chain of Thought Reasoning in Language Models. ICLR 23. [Paper]
- ReAct, ReAct: Synergizing Reasoning and Acting in Language Models. ICLR 23. [Paper] [Code]
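
The prompting papers above (chain-of-thought, least-to-most, self-consistency, ReAct) are general-purpose techniques that apply directly to table questions. Below is a minimal sketch of chain-of-thought prompting with self-consistency voting, assuming a hypothetical `call_llm` wrapper around whatever completion API is used:

```python
from collections import Counter

def call_llm(prompt: str, temperature: float = 0.7) -> str:
    """Hypothetical stand-in for an LLM API call; replace with a real client."""
    raise NotImplementedError

def self_consistent_answer(table_text: str, question: str, n_samples: int = 5) -> str:
    """Chain-of-thought prompting with self-consistency: sample several
    reasoning paths at non-zero temperature, then majority-vote the answers."""
    prompt = (
        f"Table:\n{table_text}\n\n"
        f"Question: {question}\n"
        "Let's think step by step, then give the final answer after 'Answer:'."
    )
    answers = [
        call_llm(prompt, temperature=0.7).split("Answer:")[-1].strip()
        for _ in range(n_samples)
    ]
    return Counter(answers).most_common(1)[0][0]
```

Sampling at non-zero temperature is what produces the diverse reasoning paths that the majority vote relies on.
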
### Pre-LLM Era Table Training
- TaBERT, TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data. ACL 20 Main. [Paper] [Code]
- TAPEX, TAPEX: Table Pre-training via Learning a Neural SQL Executor. ICLR 22. [Paper] [Code] [Models]
- TABBIE, TABBIE: Pretrained Representations of Tabular Data. NAACL 21 Main. [Paper] [Code]
- TURL, TURL: Table Understanding through Representation Learning. VLDB 21. [Paper] [Code]
- RESDSQL, RESDSQL: Decoupling Schema Linking and Skeleton Parsing for Text-to-SQL. AAAI 23. [Paper] [Code]
- UnifiedSKG, UnifiedSKG: Unifying and Multi-Tasking Structured Knowledge Grounding with Text-to-Text Language Models. EMNLP 22 Main. [Paper] [Code]
- SpreadsheetCoder, SpreadsheetCoder: Formula Prediction from Semi-structured Context. ICML 21. [Paper] [Code]
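
The table pre-training models above consume tables as flat token sequences rather than 2-D grids. Serialization formats differ per paper; below is a minimal sketch of one common flattening scheme (roughly the `col: ... row 1: ...` style used by TAPEX-like models, with the exact separators assumed here):

```python
def linearize_table(headers, rows, caption=""):
    """Flatten a relational table into one text sequence for a text-to-text model.
    Separators and row/column markers vary across papers; this is one common choice."""
    parts = []
    if caption:
        parts.append(f"caption: {caption}")
    parts.append("col: " + " | ".join(headers))
    for i, row in enumerate(rows, start=1):
        parts.append(f"row {i}: " + " | ".join(str(cell) for cell in row))
    return " ".join(parts)

# Example:
# linearize_table(["Player", "Team", "Points"], [["Alice", "Red", 30], ["Bob", "Blue", 25]])
# -> "col: Player | Team | Points row 1: Alice | Red | 30 row 2: Bob | Blue | 25"
```
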
### Parameter-Efficient Fine-Tuning
### Direct Preference Optimization
- SENSE, Synthesizing Text-to-SQL Data from Weak and Strong LLMs. ACL 24. [Paper]
### Small Language Model + Large Language Model
- ZeroNL2SQL, Combining Small Language Models and Large Language Models for Zero-Shot NL2SQL. VLDB 24. [Paper]
### Multimodal Table Understanding & Extraction
- LayoutLM, LayoutLM: Pre-training of Text and Layout for Document Image Understanding. KDD 20. [Paper]
- PubTabNet, Image-Based Table Recognition: Data, Model, and Evaluation. ECCV 20. [Paper] [Code & Data]
- Table-LLaVA, Multimodal Table Understanding. ACL 24. [Paper] [Code] [Model]
- TableVLM, TableVLM: Multi-modal Pre-training for Table Structure Recognition. ACL 23. [Paper]
- PixT3, PixT3: Pixel-based Table-To-Text Generation. ACL 24. [Paper]
- Tabular Representation, Noisy Operators, and Impacts on Table Structure Understanding Tasks in LLMs. NeurIPS 23 Table Representation Learning Workshop. [Paper]
- SpreadsheetLLM, SpreadsheetLLM: Encoding Spreadsheets for Large Language Models. arXiv 24. [Paper]
- Enhancing Text-to-SQL Capabilities of Large Language Models: A Study on Prompt Design Strategies. EMNLP 23. [Paper] [Code]
- Tables as Texts or Images: Evaluating the Table Reasoning Ability of LLMs and MLLMs. arXiv 24. [Paper]
- The Dawn of Natural Language to SQL: Are We Fully Ready? VLDB 24. [Paper] [Code]
- MCS-SQL, MCS-SQL: Leveraging Multiple Prompts and Multiple-Choice Selection For Text-to-SQL Generation. arXiv 24. [Paper]
- DIN-SQL, DIN-SQL: Decomposed In-Context Learning of Text-to-SQL with Self-Correction. NeurIPS 23. [Paper] [Code]
- DAIL-SQL, Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation. VLDB 24. [Paper] [Code]
- C3, C3: Zero-shot Text-to-SQL with ChatGPT. arXiv 23. [Paper] [Code]
- Dater, Large Language Models are Versatile Decomposers: Decompose Evidence and Questions for Table-based Reasoning. SIGIR 23. [Paper] [Code]
- Binder, Binding language models in symbolic languages. ICLR 23. [Paper] [Code]
- ReAcTable, ReAcTable: Enhancing ReAct for Table Question Answering. VLDB 24. [Paper] [Code]
- E5, E5: Zero-shot Hierarchical Table Analysis using Augmented LLMs via Explain, Extract, Execute, Exhibit and Extrapolate. NAACL 24. [Paper] [Code]
- Chain-of-Table, Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding. ICLR 24. [Paper]
- ITR, An Inner Table Retriever for Robust Table Question Answering. ACL 23. [Paper]
- LI-RAGE, LI-RAGE: Late Interaction Retrieval Augmented Generation with Explicit Signals for Open-Domain Table Question Answering. ACL 23. [Paper]
- SheetCopilot, SheetCopilot: Bringing Software Productivity to the Next Level through Large Language Models Agent. NeurIPS 23. [Paper] [Code]
- SheetAgent, SheetAgent: A Generalist Agent for Spreadsheet Reasoning and Manipulation via Large Language Models. arXiv 24. [Paper]
- Vision Language Models for Spreadsheet Understanding: Challenges and Opportunities. arXiv 24. [Paper]
- StructGPT, StructGPT: A General Framework for Large Language Model to Reason over Structured Data. EMNLP 23 Main. [Paper] [Code]
- TAP4LLM, TAP4LLM: Table Provider on Sampling, Augmenting, and Packing Semi-structured Data for Large Language Model Reasoning. arXiv 23. [Paper]
- UniDM, UniDM: A Unified Framework for Data Manipulation with Large Language Models. MLSys 24. [Paper]
- Data-Copilot, Data-Copilot: Bridging Billions of Data and Humans with Autonomous Workflow. arXiv 23. [Paper] [Code]
- LlamaIndex
- PandasAI
- Vanna
- DB-GPT, DB-GPT: Empowering Database Interactions with Private Large Language Models. [Paper] [Code]
- RetClean, RetClean: Retrieval-Based Data Cleaning Using Foundation Models and Data Lakes. [Paper] [Code]
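
Several of the papers and tools above (e.g., Binder, ReAcTable, Data-Copilot, PandasAI, Vanna) follow a program-then-execute pattern: the model produces an executable query over the table, and the execution result is returned instead of a free-form answer. Below is a minimal sketch of that pattern, using pandas as the execution engine and a hypothetical `call_llm` helper:

```python
import pandas as pd

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM API call; replace with a real client."""
    raise NotImplementedError

def answer_with_program(df: pd.DataFrame, question: str) -> str:
    """Ask the model for a single pandas expression over `df`, execute it,
    and return the execution result."""
    schema = ", ".join(f"{col} ({df[col].dtype})" for col in df.columns)
    prompt = (
        f"Table columns: {schema}\n"
        f"Question: {question}\n"
        "Reply with one pandas expression over a DataFrame named df, nothing else."
    )
    expression = call_llm(prompt)
    result = eval(expression, {"df": df, "pd": pd})  # sandbox generated code in real use
    return str(result)
```

Grounding the answer in an executed program is what distinguishes this line of work from free-form table question answering.
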
- A Survey of Large Language Models. [Paper]
- A Survey on Large Language Model Based Autonomous Agents. [Paper]
- Table Pre-training: A Survey on Model Architectures, Pre-training Objectives, and Downstream Tasks. [Paper]
- Transformers for tabular data representation: A survey of models and applications. [Paper]
- A Survey of Table Reasoning with Large Language Models. [Paper]
- A survey on table question answering: Recent advances. [Paper]
- Large Language Models (LLMs) on Tabular Data – A Survey. [Paper]
- A Survey on Text-to-SQL Parsing: Concepts, Methods, and Future Directions. [Paper]

https://github.com/godaai/llm-table-survey