Fix dead external links as well, and unify installation pages
Mytherin committed Jun 11, 2022
1 parent 916e066 commit 71a9d95
Showing 14 changed files with 291 additions and 2,058 deletions.
1 change: 1 addition & 0 deletions _config.yml
@@ -21,6 +21,7 @@ baseurl: "" # the subpath of your site, e.g. /blog
url: "https://duckdb.org" # the base hostname & protocol for your site, e.g. http://example.com
# Set current Version of duckDB
currentduckdbversion: 0.3.4
+currentjavaversion: 0.3.3
livereload: true
highlighter: rouge
incremental: false
263 changes: 263 additions & 0 deletions _includes/installation.html
@@ -0,0 +1,263 @@

<div class="yourselection">

<div class="version select">
<h3>Version</h3>
<ul class="version">
<li class="selected" data-id=".latest">
<div> {{ site.currentduckdbversion }} <span class="versioninfo">(Latest Release)</span></div>
</li>
<li data-id=".master">
<div>GitHub master <span class="versioninfo">(Bleeding Edge)</span></div>
</li>
</ul>
</div>

<div class="evironment select">
<h3>Environment</h3>
<ul class="environment">
<li class="selected" data-id=".python">Python</li>
<li data-id=".r">R</li>
<li data-id=".java">Java</li>
<li data-id=".js">node.js</li>
<li data-id=".cplusplus">C/C++</li>
<li data-id=".cli">CLI</li>
<li data-id=".odbc">ODBC</li>
</ul>
</div>


<div class="installer select inactive">
<h3>Package</h3>
<ul class="pack">
<li data-id=".source">Source</li>
<li data-id=".binary">Binary</li>
</ul>
</div>

<div class="platform select inactive">
<h3>Platform</h3>
<ul class="platform">
<li data-id=".win">Windows</li>
<li data-id=".macos">macOS</li>
<li data-id=".linux">Linux</li>
</ul>
</div>


<div class="installartion output">
<h3>Installation</h3>
<div class="result" id="resultselection">
{% highlight bash %}pip install duckdb=={{ site.currentduckdbversion }}{% endhighlight %}
</div>


</div>

<div class="example output">
<h3>Usage Example</h3>
<div class="result">
{% highlight python %}
import duckdb
cursor = duckdb.connect()
print(cursor.execute('SELECT 42').fetchall()){% endhighlight %}
</div>
</div>

</div>


<div class="possibleresults">

<!-- Empty Results -->
<div class="latest master cplusplus"></div>
<div class="latest master cli"></div>
<div class="latest master odbc"></div>
<div class="latest master cli binary"></div>
<div class="master odbc"></div>
<!-- End Empty Results -->


<div class="latest python">
{% highlight bash %}pip install duckdb=={{ site.currentduckdbversion }}{% endhighlight %}
</div>

<div class="latest r">
{% highlight bash %}install.packages("duckdb"){% endhighlight %}
</div>

<div class="latest java">
{% highlight xml %}
<dependency>
<groupId>org.duckdb</groupId>
<artifactId>duckdb_jdbc</artifactId>
<version>{{ site.currentjavaversion }}</version>
</dependency>{% endhighlight %}
</div>

<div class="latest js">
{% highlight bash %}npm install duckdb{% endhighlight %}
</div>

<div class="latest cplusplus source">
<a href="https://github.com/duckdb/duckdb/releases/download/v{{ site.currentduckdbversion }}/libduckdb-src.zip" target="_blank">https://github.com/duckdb/duckdb/releases/download/v{{ site.currentduckdbversion }}/libduckdb-src.zip</a>
</div>

<div class="latest cli source">
coming soon
</div>

<div class="latest odbc source">
</div>

<div class="latest cplusplus binary macos">
<a href="https://github.com/duckdb/duckdb/releases/download/v{{ site.currentduckdbversion }}/libduckdb-osx-universal.zip" target="_blank">https://github.com/duckdb/duckdb/releases/download/v{{ site.currentduckdbversion }}/libduckdb-osx-universal.zip</a>
</div>

<div class="latest cplusplus binary linux">
Linux 64-bit: <a href="https://github.com/duckdb/duckdb/releases/download/v{{ site.currentduckdbversion }}/libduckdb-linux-amd64.zip" target="_blank">https://github.com/duckdb/duckdb/releases/download/v{{ site.currentduckdbversion }}/libduckdb-linux-amd64.zip</a><br/><br/>
Linux 32-bit: <a href="https://github.com/duckdb/duckdb/releases/download/v{{ site.currentduckdbversion }}/libduckdb-linux-i386.zip" target="_blank">https://github.com/duckdb/duckdb/releases/download/v{{ site.currentduckdbversion }}/libduckdb-linux-i386.zip</a><br/><br/>
Linux Raspberry Pi: <a href="https://github.com/duckdb/duckdb/releases/download/v{{ site.currentduckdbversion }}/libduckdb-linux-rpi.zip" target="_blank">https://github.com/duckdb/duckdb/releases/download/v{{ site.currentduckdbversion }}/libduckdb-linux-rpi.zip</a>
</div>

<div class="latest cplusplus binary win">
Win 64-bit: <a href="https://github.com/duckdb/duckdb/releases/download/v{{ site.currentduckdbversion }}/libduckdb-windows-amd64.zip" target="_blank">https://github.com/duckdb/duckdb/releases/download/v{{ site.currentduckdbversion }}/libduckdb-windows-amd64.zip</a><br/><br/>
Win 32-bit: <a href="https://github.com/duckdb/duckdb/releases/download/v{{ site.currentduckdbversion }}/libduckdb-windows-i386.zip" target="_blank">https://github.com/duckdb/duckdb/releases/download/v{{ site.currentduckdbversion }}/libduckdb-windows-i386.zip</a>
</div>

<div class="latest cli binary macos">
<a href="https://github.com/duckdb/duckdb/releases/download/v{{ site.currentduckdbversion }}/duckdb_cli-osx-universal.zip" target="_blank">https://github.com/duckdb/duckdb/releases/download/v{{ site.currentduckdbversion }}/duckdb_cli-osx-universal.zip</a>
</div>

<div class="latest cli binary linux">
Linux 64-bit: <a href="https://github.com/duckdb/duckdb/releases/download/v{{ site.currentduckdbversion }}/duckdb_cli-linux-amd64.zip" target="_blank">https://github.com/duckdb/duckdb/releases/download/v{{ site.currentduckdbversion }}/duckdb_cli-linux-amd64.zip</a><br/><br/>
Linux 32-bit: <a href="https://github.com/duckdb/duckdb/releases/download/v{{ site.currentduckdbversion }}/duckdb_cli-linux-i386.zip" target="_blank">https://github.com/duckdb/duckdb/releases/download/v{{ site.currentduckdbversion }}/duckdb_cli-linux-i386.zip</a><br/><br/>
Linux Raspberry Pi: <a href="https://github.com/duckdb/duckdb/releases/download/v{{ site.currentduckdbversion }}/duckdb_cli-linux-rpi.zip" target="_blank">https://github.com/duckdb/duckdb/releases/download/v{{ site.currentduckdbversion }}/duckdb_cli-linux-rpi.zip</a>
</div>

<div class="latest cli binary win">
Win 64-bit: <a href="https://github.com/duckdb/duckdb/releases/download/v{{ site.currentduckdbversion }}/duckdb_cli-windows-amd64.zip" target="_blank">https://github.com/duckdb/duckdb/releases/download/v{{ site.currentduckdbversion }}/duckdb_cli-windows-amd64.zip</a><br/><br/>
Win 32-bit: <a href="https://github.com/duckdb/duckdb/releases/download/v{{ site.currentduckdbversion }}/duckdb_cli-windows-i386.zip" target="_blank">https://github.com/duckdb/duckdb/releases/download/v{{ site.currentduckdbversion }}/duckdb_cli-windows-i386.zip</a>
</div>

<div class="latest odbc binary linux">
Linux 64-bit: <a href="https://github.com/duckdb/duckdb/releases/download/v{{ site.currentduckdbversion }}/duckdb_odbc-linux-amd64.zip" target="_blank">https://github.com/duckdb/duckdb/releases/download/v{{ site.currentduckdbversion }}/duckdb_odbc-linux-amd64.zip</a><br/><br/>
{% highlight bash %}sudo apt-get install unixodbc unixodbc-dev{% endhighlight %}
{% highlight bash %}./unixodbc_setup.sh --help {% endhighlight %}
</div>

<div class="latest odbc binary macos">
</div>

<div class="latest odbc binary win">
Win 64-bit: <a href="https://github.com/duckdb/duckdb/releases/download/v{{ site.currentduckdbversion }}/duckdb_odbc-windows-amd64.zip" target="_blank">https://github.com/duckdb/duckdb/releases/download/v{{ site.currentduckdbversion }}/duckdb_odbc-windows-amd64.zip</a><br/><br/>
./odbc_install.exe <span style="opacity: 0.5;"><font color="black">(double-click)</font></span>
</div>

<div class="master python">
{% highlight bash %}pip install duckdb --pre --upgrade{% endhighlight %}
</div>

<div class="master r">
</div>

<div class="master java">
<p>MacOS Build Artifacts are available from <a href="https://github.com/duckdb/duckdb/actions?query=branch%3Amaster+event%3Apush+workflow%3AOSX" target="_blank">the "OSX" CI runs</a></p>

<p>Linux Build Artifacts are available from <a href="https://github.com/duckdb/duckdb/actions?query=branch%3Amaster+event%3Apush+workflow%3ALinuxRelease" target="_blank">the "LinuxRelease" CI runs</a></p>

<p>Windows Build Artifacts are available from <a href="https://github.com/duckdb/duckdb/actions?query=branch%3Amaster+event%3Apush+workflow%3AWindows" target="_blank">the "Windows" CI runs</a></p>
</div>


<div class="master js">
{% highlight bash %}npm install duckdb@next{% endhighlight %}
</div>


<div class="master cplusplus source">
<a href="https://github.com/duckdb/duckdb" target="_blank">https://github.com/duckdb/duckdb</a>
</div>

<div class="master odbc">
<p>MacOS Build Artifacts are available from <a href="https://github.com/duckdb/duckdb/actions?query=branch%3Amaster+event%3Apush+workflow%3AOSX" target="_blank">the "OSX" CI runs</a></p>

<p>Linux Build Artifacts are available from <a href="https://github.com/duckdb/duckdb/actions?query=branch%3Amaster+event%3Apush+workflow%3ALinuxRelease" target="_blank">the "LinuxRelease" CI runs</a></p>

<p>Windows Build Artifacts are available from <a href="https://github.com/duckdb/duckdb/actions?query=branch%3Amaster+event%3Apush+workflow%3AWindows" target="_blank">the "Windows" CI runs</a></p>
</div>

<div class="master cli binary macos">
MacOS Build Artifacts are available from <a href="https://github.com/duckdb/duckdb/actions?query=branch%3Amaster+event%3Apush+workflow%3AOSX" target="_blank">the "OSX" CI runs</a>
</div>

<div class="master cli binary linux">
Linux Build Artifacts are available from <a href="https://github.com/duckdb/duckdb/actions?query=branch%3Amaster+event%3Apush+workflow%3ALinuxRelease" target="_blank">the "LinuxRelease" CI runs</a>
</div>

<div class="master cli binary win">
Windows Build Artifacts are available from <a href="https://github.com/duckdb/duckdb/actions?query=branch%3Amaster+event%3Apush+workflow%3AWindows" target="_blank">the "Windows" CI runs</a>
</div>



<div class="example python">
{% highlight python %}
import duckdb
cursor = duckdb.connect()
print(cursor.execute('SELECT 42').fetchall()){% endhighlight %}
</div>

<div class="example r">
{% highlight R %}
library("DBI")
con = dbConnect(duckdb::duckdb(), ":memory:")
dbWriteTable(con, "iris", iris)
dbGetQuery(con, 'SELECT "Species", MIN("Sepal.Width") FROM iris GROUP BY "Species"'){% endhighlight %}
</div>

<div class="example java">
{% highlight java %}
Class.forName("org.duckdb.DuckDBDriver");
Connection conn = DriverManager.getConnection("jdbc:duckdb:");
Statement stmt = conn.createStatement();
ResultSet rs = stmt.executeQuery("SELECT 42");{% endhighlight %}
</div>

<div class="example js">
{% highlight javascript %}
var duckdb = require('duckdb');
var db = new duckdb.Database(':memory:'); // or a file name for a persistent DB
db.all('SELECT 42 AS fortytwo', function(err, res) {
if (err) {
throw err;
}
console.log(res[0].fortytwo)
});{% endhighlight %}
</div>

<div class="example cplusplus">
{% highlight cpp %}
DuckDB db(nullptr);
Connection con(db);
auto result = con.Query("SELECT 42");
result->Print();{% endhighlight %}
</div>

<div class="example cli">
{% highlight bash %}
./duckdb{% endhighlight %}
</div>

<div class="example odbc">
</div>

</div>

<!--
<h1>Embedding</h1>
<p> As DuckDB is an embedded database, there is no database server to launch or client to connect to a running server. However, the database server can be embedded directly into an application using the C or C++ bindings. The main build process creates the shared library build/release/src/libduckdb.[so|dylib|dll] that can be linked against. A static library is built as well.
For examples on how to embed DuckDB into your application, see the examples folder in the repository.</p>
-->
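
The release links in this include all follow one pattern: the GitHub release download path, the version prefixed with `v`, and an asset name built from a product prefix and a platform suffix. A minimal Python sketch of that pattern follows. The names `release_urls`, `PRODUCTS`, and `PLATFORMS` are hypothetical and not part of the site; the suffixes are copied from the links above and may not cover every release asset. ODBC is left out because it ships fewer platform builds.

```python
# Hypothetical helper illustrating the URL pattern used by the links above.
BASE = "https://github.com/duckdb/duckdb/releases/download"

# Asset name parts copied from the include; not an exhaustive list.
PRODUCTS = {"cplusplus": "libduckdb", "cli": "duckdb_cli"}
PLATFORMS = {
    "macos": ["osx-universal"],
    "linux": ["linux-amd64", "linux-i386", "linux-rpi"],
    "win": ["windows-amd64", "windows-i386"],
}


def release_urls(version, product, platform):
    """Return the download URLs for one product/platform pair of a release."""
    prefix = PRODUCTS[product]
    return [
        f"{BASE}/v{version}/{prefix}-{suffix}.zip"
        for suffix in PLATFORMS[platform]
    ]
```

For example, `release_urls("0.3.4", "cli", "linux")[0]` yields `https://github.com/duckdb/duckdb/releases/download/v0.3.4/duckdb_cli-linux-amd64.zip`, which matches the "Linux 64-bit" CLI link when `currentduckdbversion` is 0.3.4.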
4 changes: 2 additions & 2 deletions _posts/2021-01-25-full-text-search.md
@@ -15,7 +15,7 @@ Searching through textual data stored in a database can be cumbersome, as SQL do

We expect a search engine to return us results within milliseconds. For a long time databases were unsuitable for this task, because they could not search large inverted indices at this speed: transactional database systems are not made for this use case. However, analytical database systems, can keep up with state-of-the art information retrieval systems. The company [Spinque](https://www.spinque.com/) is a good example of this. At Spinque, MonetDB is used as a computation engine for customized search engines.

-DuckDB's FTS implementation follows the paper "[Old Dogs Are Great at New Tricks](https://hannes.muehleisen.org/SIGIR2014-column-stores-ir-prototyping.pdf)". A keen observation there is that advances made to the database system, such as parallelization, will speed up your search engine "for free"!
+DuckDB's FTS implementation follows the paper "[Old Dogs Are Great at New Tricks](https://www.duckdb.org/pdf/SIGIR2014-column-stores-ir-prototyping.pdf)". A keen observation there is that advances made to the database system, such as parallelization, will speed up your search engine "for free"!

Alright, enough about the "why", let's get to the "how".

@@ -145,7 +145,7 @@ map all 0.2324
P_30 all 0.2948
```

-Not bad! While these results are not as high as the reproducible by [Anserini](https://github.com/castorini/anserini/blob/master/docs/regressions-robust04.md), they are definitely acceptable. The difference in performance can be explained by differences in
+Not bad! While these results are not as high as the reproducible by [Anserini](https://github.com/castorini/anserini), they are definitely acceptable. The difference in performance can be explained by differences in
1. Which stemmer was used (we used 'porter')
2. Which stopwords were used (we used the list of 571 English stopwords used in the SMART system)
3. Pre-processing (removal of accents, punctuation, numbers)
2 changes: 1 addition & 1 deletion _posts/2021-05-14-sql-on-pandas.md
@@ -329,7 +329,7 @@ Using DuckDB, you can take advantage of the powerful and expressive SQL language

Traditional SQL engines use the Client-Server paradigm, which means that a client program connects through a socket to a server. Queries are run on the server, and results are sent back down to the client afterwards. This is the same when using for example Postgres from Python. Unfortunately, this transfer [is a serious bottleneck](http://www.vldb.org/pvldb/vol10/p1022-muehleisen.pdf). In-process engines such as SQLite or DuckDB do not run into this problem.

-To showcase how costly this data transfer over a socket is, we have run a benchmark involving Postgres, SQLite and DuckDB. The source code for the benchmark can be found [here](https://gist.github.com/hannesmuehleisen/a95a39a1eda63aeb0ca13fd82d1ba49c).
+To showcase how costly this data transfer over a socket is, we have run a benchmark involving Postgres, SQLite and DuckDB. The source code for the benchmark can be found [here](https://gist.github.com/hannes/a95a39a1eda63aeb0ca13fd82d1ba49c).

In this benchmark we copy a (fairly small) Pandas data frame consisting of 10M 4-Byte integers (40MB) from Python to the PostgreSQL, SQLite and DuckDB databases. Since the default Pandas `to_sql` was rather slow, we added a separate optimization in which we tell Pandas to write the data frame to a temporary CSV file, and then tell PostgreSQL to directly copy data from that file into a newly created table. This of course will only work if the database server is running on the same machine as Python.
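
To make the in-process case concrete, here is a minimal sketch using Python's standard `sqlite3` module. It is not the benchmark code linked above: the function name `load_ints`, the table name `ints`, and the row count are illustrative assumptions, and the row count is far smaller than the 10M integers used in the post.

```python
import sqlite3


def load_ints(values):
    """Insert integers into an in-memory SQLite table and return the connection."""
    con = sqlite3.connect(":memory:")  # in-process: no socket, no server
    con.execute("CREATE TABLE ints (i INTEGER)")
    # executemany reuses one prepared INSERT statement for every row
    con.executemany("INSERT INTO ints VALUES (?)", ((v,) for v in values))
    con.commit()
    return con


con = load_ints(range(100_000))
row_count, total = con.execute("SELECT COUNT(*), SUM(i) FROM ints").fetchone()
print(row_count, total)
```

Because the database lives inside the Python process, the rows never cross a socket. That transfer is the cost the benchmark isolates for the client-server case.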

6 changes: 3 additions & 3 deletions _posts/2021-12-03-duck-arrow.md
@@ -1,6 +1,6 @@
---

-layout: post
+layout: post
title: "DuckDB quacks Arrow: A zero-copy data integration between Apache Arrow and DuckDB"
author: Pedro Holanda and Jonathan Keane
excerpt_separator: <!--more-->
@@ -211,9 +211,9 @@ The preceding R code shows in low-level detail how the data is streaming. We pro
Here we demonstrate in a simple benchmark the performance difference between querying Arrow datasets with DuckDB and querying Arrow datasets with Pandas.
For both the Projection and Filter pushdown comparison, we will use Arrow tables. That is due to Pandas not being capable of consuming Arrow stream objects.

-For the NYC Taxi benchmarks, we used the [scilens diamonds configuration](https://www.monetdb.org/wiki/Scilens-configuration-standard) and for the TPC-H benchmarks, we used an m1 MacBook Pro. In both cases, parallelism in DuckDB was used (which is now on by default).
+For the NYC Taxi benchmarks, we used the scilens diamonds configuration and for the TPC-H benchmarks, we used an m1 MacBook Pro. In both cases, parallelism in DuckDB was used (which is now on by default).

-For the comparison with Pandas, note that DuckDB runs in parallel, while pandas only support single-threaded execution. Besides that, one should note that we are comparing automatic optimizations. DuckDB's query optimizer can automatically push down filters and projections. This automatic optimization is not supported in pandas, but it is possible for users to manually perform some of these predicate and filter pushdowns by manually specifying them them in the `read_parquet()` call.
+For the comparison with Pandas, note that DuckDB runs in parallel, while pandas only support single-threaded execution. Besides that, one should note that we are comparing automatic optimizations. DuckDB's query optimizer can automatically push down filters and projections. This automatic optimization is not supported in pandas, but it is possible for users to manually perform some of these predicate and filter pushdowns by manually specifying them them in the `read_parquet()` call.

### Projection Pushdown
