DQuaG

Automated Data Quality Validation Using Graph Representation Learning

Overview

This repository contains the implementation of DQuaG (Data Quality Graph), an approach to automated data quality validation based on graph representation learning. The method uses a Graph Neural Network (GNN) to detect and infer underlying data quality issues in datasets. It is designed to overcome a key limitation of traditional data quality validation methods, which often fail to capture complex interdependencies within the data.

Figure 2: Framework of the DQuaG approach

Installation

To run the code, please follow these steps to set up your environment:

  1. Clone the repository
  2. Install the required packages: pip install -r requirements.txt

Experimental Setup

Our experiments were conducted using the following environment:

  • Python 3.11
  • PyTorch 1.12.1
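
Before running the notebooks, it can help to confirm that the local environment matches the versions listed above. The following is a minimal sketch using only the Python standard library; the helper names `installed_version` and `check_environment` are hypothetical and not part of this repository, and the check assumes PyTorch is installed under the distribution name `torch`.

```python
import sys
from importlib import metadata
from typing import List, Optional

# Versions taken from the "Experimental Setup" list in this README.
EXPECTED_PYTHON = (3, 11)
EXPECTED_TORCH = "1.12.1"


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_environment() -> List[str]:
    """Return one warning per mismatch with the versions listed in the README."""
    warnings = []

    current_python = sys.version_info[:2]
    if current_python != EXPECTED_PYTHON:
        warnings.append(
            "Python %d.%d found, README lists %d.%d"
            % (current_python + EXPECTED_PYTHON)
        )

    torch_version = installed_version("torch")
    if torch_version is None:
        warnings.append("PyTorch is not installed")
    # Ignore local build suffixes such as "+cu113" when comparing versions.
    elif torch_version.split("+")[0] != EXPECTED_TORCH:
        warnings.append(
            "PyTorch %s found, README lists %s" % (torch_version, EXPECTED_TORCH)
        )

    return warnings


if __name__ == "__main__":
    for message in check_environment() or ["Environment matches the README."]:
        print(message)
```

A mismatch does not necessarily mean the code will fail; the script only reports differences from the setup used in the experiments.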

Datasets

We evaluate our approach on datasets with varied error types and data structures.

Code Structure

The root directory of this repository contains Jupyter notebooks, each serving a specific purpose within the DQuaG framework, along with the following supporting file:

  • requirements.txt: All necessary Python dependencies.

These notebooks are designed to be run sequentially to understand the implementation and to replicate our experimental results.
