Skip to content
This repository has been archived by the owner on Feb 7, 2019. It is now read-only.

Missing delimiters and NULL placeholders #3

Open
wwentland opened this issue Aug 28, 2009 · 2 comments
Open

Missing delimiters and NULL placeholders #3

wwentland opened this issue Aug 28, 2009 · 2 comments

Comments

@wwentland
Copy link

xml2sql output is faulty for missing columns

If I run xml2sql i get the following output for revision:

$ head -n 6 revision.sql
-- xml2sql - MediaWiki XML to SQL converter
-- Table revision for PostgreSQL

COPY "revision" FROM STDIN;
287059  2   287059  roboti Nyongeza: [[tpi:Akiolosi]]   988 LaaknorBot  2009-08-19T03:20:58Z    1   0
32885   10  32885   +jamii  160 Baba Tabita 2007-02-02T10:14:01Z    1   0

If i try to import this i get the following error:

psql -h localhost -d test2 -U babilen -W < revision.sql
Password for user babilen: 
ERROR:  missing data for column "rev_len"
CONTEXT:  COPY revision, line 1: "287059    2   287059  roboti Nyongeza: [[tpi:Akiolosi]]   988 LaaknorBot  20090819032058  1   0"

The issue can be fixed if the following awk oneliner is applied to the txt files (or lines within revision.sql that hold values)

awk -v e=10 'BEGIN { FS = OFS = "\t"; e++ } NF > e { exit(1) } { t = ""; for (a = 0; a < e - NF; a++) t = t "\t\\N"; printf "%s%s\n", $0, t }'

The difference in output is:
287059 2 287059 roboti Nyongeza: [[tpi:Akiolosi]] 988 LaaknorBot 2009-08-19T03:20:58Z 1 0
287059 2 287059 roboti Nyongeza: [[tpi:Akiolosi]] 988 LaaknorBot 2009-08-19T03:20:58Z 1 0 \N \N

The \N are placeholders for NULL values which is OK since the respective columns (rev_len and rev_parent_id) are defined as nullable:

CREATE TABLE revision (
....
   rev_len         INTEGER          NULL,
   rev_parent_id   INTEGER          NULL
);

thanks

babilen

@antot
Copy link

antot commented Feb 23, 2010

another possibility to solve this is to integrate the modification into xml2sql instead of postprocessing its output. The patch can be found here:
http://meta.wikimedia.org/wiki/Xml2sql#patch_for_recent_versions_of_mw_.28.3E.3D1.10.29

@wwentland
Copy link
Author

Correct. I have forked xml2sql and fixed the issues I reported here. You might want to take a look at:

http://github.com/babilen/mediawiki-xml2sql

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants