
Allow - from the command line to process from standard input.
Also augment the documentation with examples of bare stdin reading and
of the advantages of Unix pipes to stream even remote archived content
down to PostgreSQL.
dimitri committed Dec 27, 2014
1 parent f2bf5c4 commit 6d76bc5
Showing 4 changed files with 112 additions and 9 deletions.
58 changes: 58 additions & 0 deletions pgloader.1
@@ -220,6 +220,38 @@ For documentation about the available syntaxes for the \fB\-\-field\fR and \fB\-
.P
Note also that the PostgreSQL URI includes the target \fItablename\fR\.
.
.SS "Reading from STDIN"
File based pgloader sources can be loaded from the standard input, as in the following example:
.
.IP "" 4
.
.nf

pgloader \-\-type csv \e
\-\-field "usps,geoid,aland,awater,aland_sqmi,awater_sqmi,intptlat,intptlong" \e
\-\-with "skip header = 1" \e
\-\-with "fields terminated by \'\et\'" \e
\- \e
postgresql:///pgloader?districts_longlat \e
< test/data/2013_Gaz_113CDs_national\.txt
.
.fi
.
.IP "" 0
.
.P
The dash (\fB\-\fR) character as a source is used to mean \fIstandard input\fR, as usual in Unix command lines\. It\'s possible to stream compressed content to pgloader with this technique, using the Unix pipe:
.
.IP "" 4
.
.nf

gunzip \-c source\.gz | pgloader \-\-type csv \.\.\. \- pgsql:///target?foo
.
.fi
.
.IP "" 0
.
.SS "Loading from CSV available through HTTP"
The same command as just above can also be run if the CSV file happens to be found on a remote HTTP location:
.
@@ -267,6 +299,32 @@ create table districts_longlat
.P
Also notice that the same command will work against an archived version of the same data, e\.g\. http://pgsql\.tapoueh\.org/temp/2013_Gaz_113CDs_national\.txt\.gz\.
.
.P
Finally, it\'s important to note that pgloader first fetches the content from the HTTP URL to a local file, then expands the archive when it\'s recognized to be one, and only then processes the locally expanded file\.
.
.P
In some cases, either because pgloader has no direct support for your archive format or maybe because expanding the archive is not feasible in your environment, you might want to \fIstream\fR the content straight from its remote location into PostgreSQL\. Here\'s how to do that, using the old battle\-tested Unix pipes trick:
.
.IP "" 4
.
.nf

curl http://pgsql\.tapoueh\.org/temp/2013_Gaz_113CDs_national\.txt\.gz \e
| gunzip \-c \e
| pgloader \-\-type csv \e
\-\-field "usps,geoid,aland,awater,aland_sqmi,awater_sqmi,intptlat,intptlong" \e
\-\-with "skip header = 1" \e
\-\-with "fields terminated by \'\et\'" \e
\- \e
postgresql:///pgloader?districts_longlat
.
.fi
.
.IP "" 0
.
.P
Now the OS will take care of the streaming and buffering between the network and the commands, and pgloader will take care of streaming the data down to PostgreSQL\.
.
.SS "Migrating from SQLite"
The following command will open the SQLite database, discover its tables definitions including indexes and foreign keys, migrate those definitions while \fIcasting\fR the data type specifications to their PostgreSQL equivalent and then migrate the data over:
.
42 changes: 42 additions & 0 deletions pgloader.1.md
@@ -184,6 +184,25 @@ For documentation about the available syntaxes for the `--field` and

Note also that the PostgreSQL URI includes the target *tablename*.

### Reading from STDIN

File based pgloader sources can be loaded from the standard input, as in the
following example:

pgloader --type csv \
--field "usps,geoid,aland,awater,aland_sqmi,awater_sqmi,intptlat,intptlong" \
--with "skip header = 1" \
--with "fields terminated by '\t'" \
- \
postgresql:///pgloader?districts_longlat \
< test/data/2013_Gaz_113CDs_national.txt

The dash (`-`) character as a source is used to mean *standard input*, as
usual in Unix command lines. It's possible to stream compressed content to
pgloader with this technique, using the Unix pipe:

gunzip -c source.gz | pgloader --type csv ... - pgsql:///target?foo

### Loading from CSV available through HTTP

The same command as just above can also be run if the CSV file happens to be
@@ -222,6 +241,29 @@ Also notice that the same command will work against an archived version of
the same data, e.g.
http://pgsql.tapoueh.org/temp/2013_Gaz_113CDs_national.txt.gz.

Finally, it's important to note that pgloader first fetches the content
from the HTTP URL to a local file, then expands the archive when it's
recognized to be one, and only then processes the locally expanded file.

In some cases, either because pgloader has no direct support for your
archive format or maybe because expanding the archive is not feasible in
your environment, you might want to *stream* the content straight from its
remote location into PostgreSQL. Here's how to do that, using the old
battle-tested Unix pipes trick:

curl http://pgsql.tapoueh.org/temp/2013_Gaz_113CDs_national.txt.gz \
| gunzip -c \
| pgloader --type csv \
--field "usps,geoid,aland,awater,aland_sqmi,awater_sqmi,intptlat,intptlong" \
--with "skip header = 1" \
--with "fields terminated by '\t'" \
- \
postgresql:///pgloader?districts_longlat

Now the OS will take care of the streaming and buffering between the network
and the commands, and pgloader will take care of streaming the data down to
PostgreSQL.

### Migrating from SQLite

The following command will open the SQLite database, discover its tables
18 changes: 10 additions & 8 deletions src/parsers/command-parser.lisp
@@ -189,13 +189,14 @@
       (uiop:native-namestring filename))
    (declare (ignore abs paths no-path-p))
    (let ((dotted-parts (reverse (sq:split-sequence #\. filename))))
-     (destructuring-bind (extension name-or-ext &rest parts)
-         dotted-parts
-       (declare (ignore parts))
-       (if (string-equal "tar" name-or-ext) :archive
-           (loop :for (type . extensions) :in *data-source-filename-extensions*
-                 :when (member extension extensions :test #'string-equal)
-                 :return type))))))
+     (when (<= 2 (length dotted-parts))
+       (destructuring-bind (extension name-or-ext &rest parts)
+           dotted-parts
+         (declare (ignore parts))
+         (if (string-equal "tar" name-or-ext) :archive
+             (loop :for (type . extensions) :in *data-source-filename-extensions*
+                   :when (member extension extensions :test #'string-equal)
+                   :return type)))))))

(defvar *parse-rule-for-source-types*
'(:csv csv-file-source
@@ -234,7 +235,8 @@
(:filename (parse-filename-for-source-type url))
(:http (parse-filename-for-source-type
(puri:uri-path (puri:parse-uri url)))))))
-        (parse-source-string-for-type type source-string)))))))
+        (when type
+          (parse-source-string-for-type type source-string))))))))

(defun parse-target-string (target-string)
(parse 'pgsql-uri target-string))
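For context on both guards above: a bare `-` source has no dotted parts, so with the old code `sq:split-sequence` (apparently a local nickname for the `split-sequence` library) returns a single-element list and the two-variable `destructuring-bind` signals an error; likewise, a source string with no recognizable extension now yields a NIL type that the new `when` simply declines to dispatch on. A minimal standalone sketch of the guarded extension detection, assuming Quicklisp and the `split-sequence` library; the `detect-extension` helper is hypothetical, not pgloader's API:

    (ql:quickload :split-sequence)

    ;; Hypothetical helper mirroring the new length guard above.
    (defun detect-extension (filename)
      "Last dotted part of FILENAME, or NIL given fewer than two parts."
      (let ((dotted-parts (reverse (split-sequence:split-sequence #\. filename))))
        (when (<= 2 (length dotted-parts))
          (first dotted-parts))))

    (detect-extension "data.csv") ; => "csv"
    (detect-extension "-")        ; => NIL, where the unguarded version errored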
3 changes: 2 additions & 1 deletion src/parsers/command-source.lisp
@@ -12,7 +12,8 @@
(or (member char #.(quote (coerce "/\\:.-_!@#$%^&*()" 'list)))
(alphanumericp char)))

-(defrule stdin (~ "stdin") (:constant (list :stdin nil)))
+(defrule stdin (or "-" (~ "stdin")) (:constant (list :stdin nil)))

(defrule inline (~ "inline")
(:lambda (i)
(declare (ignore i))
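The `(~ ...)` form is esrap's case-insensitive string terminal, so the amended rule accepts `-` as well as `stdin` in any letter case. A minimal sketch of the new rule in isolation, assuming only the `esrap` library loaded through Quicklisp, outside pgloader's own package:

    (ql:quickload :esrap)
    (use-package :esrap)  ; brings in defrule, parse, and the ~ operator

    ;; Same grammar rule as in the diff above, defined standalone.
    (defrule stdin (or "-" (~ "stdin"))
      (:constant (list :stdin nil)))

    (parse 'stdin "-")     ; => (:STDIN NIL)
    (parse 'stdin "STDIN") ; => (:STDIN NIL), via the (~ ...) terminal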
