Finish up logseq.graph-parser
- Parser now parses all graph files like the app does, not just pages and journals.
  This required extracting another fn from repo-handler
- Add and tweak CI steps that are specific to graph-parser. All
  namespaces in this library are checked for nbb compatibility
- Cleaned up parser cli API so only one fn is needed for scripts
- Tests were updated to match new parsing behavior
- large_vars.clj can run with a smaller max-line-count after only refactoring two fns
- Add docs
logseq-cldwalker committed May 27, 2022
1 parent 1e29905 commit b142327
Showing 30 changed files with 338 additions and 217 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/build.yml
@@ -81,7 +81,7 @@ jobs:
# In this job because it depends on an npm package
- name: Load nbb compatible namespaces
run: bb test:load-nbb-compatible-namespaces
run: bb test:load-namespaces-with-nbb

lint:
runs-on: ubuntu-latest
25 changes: 13 additions & 12 deletions .github/workflows/graph-parser.yml
@@ -1,6 +1,7 @@
name: logseq graph-parser CI

on:
# Path filters ensure jobs only kick off if a change is made to graph-parser
push:
branches: [master]
paths:
@@ -47,10 +48,10 @@ jobs:
with:
cli: ${{ env.CLOJURE_VERSION }}

# - name: Setup Babashka
# uses: turtlequeue/[email protected]
# with:
# babashka-version: ${{ env.BABASHKA_VERSION }}
- name: Setup Babashka
uses: turtlequeue/[email protected]
with:
babashka-version: ${{ env.BABASHKA_VERSION }}

- name: Clojure cache
uses: actions/cache@v2
@@ -64,20 +65,20 @@ jobs:

- name: Fetch Clojure deps
if: steps.clojure-deps.outputs.cache-hit != 'true'
run: clojure -A:test -P
run: cd deps/graph-parser && clojure -A:test -P

- name: Fetch yarn deps
run: cd deps/graph-parser && yarn install --frozen-lockfile

- name: Run ClojureScript tests
run: clojure -M:test
run: cd deps/graph-parser && clojure -M:test

- name: Run nbb-logseq tests
run: cd deps/graph-parser && yarn nbb-logseq -cp src:test -m logseq.graph-parser.nbb-test-runner/run-tests

# # In this job because it depends on an npm package
# - name: Load nbb compatible namespaces
# run: bb test:load-nbb-compatible-namespaces
# In this job because it depends on an npm package
- name: Load namespaces into nbb-logseq
run: bb test:load-all-namespaces-with-nbb deps/graph-parser src

lint:
runs-on: ubuntu-latest
@@ -105,8 +106,8 @@ jobs:
- name: Run clj-kondo lint
run: cd deps/graph-parser && clojure -M:clj-kondo --parallel --lint src test

- name: Lint for vars that are too large
run: scripts/large_vars.clj deps/graph-parser/src

- name: Carve lint for unused vars
run: cd deps/graph-parser && ../../scripts/carve.clj

- name: Lint for vars that are too large
run: scripts/large_vars.clj deps/graph-parser/src '{:max-lines-count 75}'
6 changes: 5 additions & 1 deletion CODEBASE_OVERVIEW.md
@@ -46,7 +46,11 @@ After cloning the [Logseq repository](https://github.com/logseq/logseq), there a

- `src/main/frontend/` contains code that powers the Logseq editor. Folders and files inside are organized by features or functions. For example, `components` contains all the UI components and `handler` contains all the event-handling code. You can explore the rest on your own.

- `src/main/logseq/` contains the api used by plugins and the graph-parser.
- `src/main/logseq/` contains the api used by plugins.

- `deps/` contains dependencies or libraries used by the frontend.

- `deps/graph-parser/` is a library that parses a Logseq graph and saves it to a database.

## Data Flow

5 changes: 4 additions & 1 deletion bb.edn
@@ -26,9 +26,12 @@
dev:lint
logseq.tasks.dev/lint

test:load-nbb-compatible-namespaces
test:load-namespaces-with-nbb
logseq.tasks.nbb/load-compatible-namespaces

test:load-all-namespaces-with-nbb
logseq.tasks.nbb/load-all-namespaces

lang:list
logseq.tasks.lang/list-langs

2 changes: 1 addition & 1 deletion deps/graph-parser/.carve/ignore
@@ -1,5 +1,5 @@
;; For CLI
logseq.graph-parser.cli/parse
logseq.graph-parser.cli/parse-graph
;; For CLI
logseq.graph-parser.db/start-conn
;; For CLI
1 change: 0 additions & 1 deletion deps/graph-parser/.clj-kondo/config.edn
@@ -16,5 +16,4 @@
logseq.graph-parser.property gp-property
logseq.graph-parser.config gp-config
logseq.graph-parser.date-time-util date-time-util}}}
:lint-as {promesa.core/let clojure.core/let}
:skip-comments true}
63 changes: 63 additions & 0 deletions deps/graph-parser/README.md
@@ -0,0 +1,63 @@
## Description

This library parses a Logseq graph directory and returns it as a datascript
database connection. This library powers the Logseq app and also runs from the
command line, _independent_ of the app. This is powerful because it can run
anywhere that a Node.js script has access to a Logseq graph, e.g. in CI
processes like GitHub Actions. This library is compatible with ClojureScript
and with [nbb-logseq](https://github.com/logseq/nbb-logseq) to provide frontend
and command-line functionality, respectively.

## API

This library is under the parent namespace `logseq.graph-parser`. It
provides two main namespaces for parsing, `logseq.graph-parser` and
`logseq.graph-parser.cli`. `logseq.graph-parser/parse-file` is the main fn for
the frontend. `logseq.graph-parser.cli/parse-graph` is the main fn for Node.js
CLIs.
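
As a rough illustration of the frontend-facing fn, the sketch below parses a single file into a fresh connection. It assumes that `logseq.graph-parser.db/start-conn` can be called with no arguments (as the CLI namespace does) and that an empty options map is enough thanks to `parse-file`'s defaults. The `example.parse-file` namespace, the file path and the content are made up for the example.

```clojure
(ns example.parse-file
  (:require [datascript.core :as d]
            [logseq.graph-parser :as graph-parser]
            [logseq.graph-parser.db :as gp-db]))

;; Fresh datascript connection, created the same way the CLI ns creates one.
(def conn (gp-db/start-conn))

;; Hypothetical path and content; {} relies on parse-file's option defaults.
(graph-parser/parse-file conn "pages/example.md" "- a block" {})

;; parse-file stores the raw content under :file/content for the :file/path.
(println (d/q '[:find ?content .
                :where
                [?f :file/path "pages/example.md"]
                [?f :file/content ?content]]
              @conn))
```

The connection is passed in rather than created inside `parse-file`, so a caller can parse many files into the same database.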

## Usage

See `logseq.graph-parser.cli-test` for now. A real world example is coming soon.
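
Until then, here is a minimal sketch of what a script could look like. It only relies on `parse-graph` returning a map with `:conn` and `:files`, as its docstring describes. The `example.parse-graph` namespace and the graph path are hypothetical, and the query is just one possible thing to do with the connection.

```clojure
(ns example.parse-graph
  (:require [datascript.core :as d]
            [logseq.graph-parser.cli :as gp-cli]))

;; Hypothetical path. The directory is assumed to be a git repository,
;; since files are discovered with `git ls-files`.
(def graph-dir "/path/to/graph")

(let [{:keys [conn files]} (gp-cli/parse-graph graph-dir)]
  (println "Parsed" (count files) "files")
  ;; Count the entities that carry a :block/uuid attribute.
  (println "Blocks:" (d/q '[:find (count ?b) .
                            :where [?b :block/uuid]]
                          @conn)))
```

Such a script would presumably be run with nbb-logseq from this library's directory, with `src` on the classpath, in the same way the nbb-logseq test command below is invoked.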

## Dev

This follows the practices that [the Logseq frontend
follows](/docs/dev-practices.md). Most of the same linters are used, with
configurations that are specific to this library. See [this library's CI
file](/.github/workflows/graph-parser.yml) for linting examples.

### Setup

To run linters and tests, you'll want to install yarn dependencies once:
```
yarn install
```

This step is not needed if you're just running the application.

### Testing

Since this library is compatible with ClojureScript and nbb-logseq, tests are run against both.

ClojureScript tests use https://github.com/Olical/cljs-test-runner. To run tests:
```
clojure -M:test
```

To see available options that can run specific tests or namespaces: `clojure -M:test --help`

To run nbb-logseq tests:
```
yarn nbb-logseq -cp src:test -m logseq.graph-parser.nbb-test-runner/run-tests
```

### Managing dependencies

The package.json dependencies are just for testing and should be updated if there is
new behavior to test.

The deps.edn dependencies are used by both ClojureScript and nbb-logseq. Their
versions should be backwards compatible with each other with priority given to
the frontend. _No new dependency_ should be introduced to this library without
an understanding of the tradeoffs of adding this to nbb-logseq.
6 changes: 3 additions & 3 deletions deps/graph-parser/deps.edn
@@ -11,9 +11,9 @@
cljs-bean/cljs-bean {:mvn/version "1.5.0"}}

:aliases
;; This runs tests with nodejs. Would be nice to run this with a headless env since
;; this is how its normally run in the app but this requires more setup with
;; karma and shadow-cljs.edn
;; This runs tests with nodejs. Would be nice to run this in a browser env
;; since this is how it's normally run in the app but this requires more setup
;; with karma, shadow-cljs.edn and headless mode on CI
{:test {:extra-paths ["test"]
:extra-deps {olical/cljs-test-runner {:mvn/version "3.8.0"}
org.clojure/clojurescript {:mvn/version "1.11.54"}}
49 changes: 28 additions & 21 deletions deps/graph-parser/src/logseq/graph_parser.cljs
@@ -1,38 +1,37 @@
(ns ^:nbb-compatible logseq.graph-parser
"Main ns for parsing graph from source files"
(ns logseq.graph-parser
"Main ns used by logseq app to parse graph from source files"
(:require [datascript.core :as d]
[logseq.graph-parser.extract :as extract]
[logseq.graph-parser.util :as gp-util]
[logseq.graph-parser.date-time-util :as date-time-util]
[logseq.graph-parser.config :as gp-config]
[clojure.string :as string]
[clojure.set :as set]))

(defn- db-set-file-content!
"Modified copy of frontend.db.model/db-set-file-content!"
[db path content]
[conn path content]
(let [tx-data {:file/path path
:file/content content}]
(d/transact! db [tx-data] {:skip-refresh? true})))
(d/transact! conn [tx-data] {:skip-refresh? true})))

(defn parse-file
"Parse file and save parsed data to the given db"
[db file content {:keys [new? delete-blocks-fn new-graph? extract-options]
"Parse file and save parsed data to the given db. Main parse fn used by logseq app"
[conn file content {:keys [new? delete-blocks-fn new-graph? extract-options]
:or {new? true
new-graph? false
delete-blocks-fn (constantly [])}}]
(db-set-file-content! db file content)
(db-set-file-content! conn file content)
(let [format (gp-util/get-format file)
file-content [{:file/path file}]
tx (if (contains? gp-config/mldoc-support-formats format)
(let [extract-options' (merge {:block-pattern (gp-config/get-block-pattern format)
:date-formatter "MMM do, yyyy"
:supported-formats (gp-config/supported-formats)}
extract-options)
extract-options
{:db @conn})
[pages blocks]
(extract/extract-blocks-pages
file
content
(merge extract-options' {:db @db}))
(extract/extract-blocks-pages file content extract-options')
delete-blocks (delete-blocks-fn (first pages) file)
block-ids (map (fn [block] {:block/uuid (:block/uuid block)}) blocks)
block-refs-ids (->> (mapcat :block/refs blocks)
@@ -51,13 +50,21 @@
new?
;; TODO: use file system timestamp?
(assoc :file/created-at (date-time-util/time-ms)))])]
(d/transact! db (gp-util/remove-nils tx) (when new-graph? {:new-graph? true}))))
(d/transact! conn (gp-util/remove-nils tx) (when new-graph? {:new-graph? true}))))

(defn parse
"Main parse fn"
([db files]
(parse db files {}))
([db files {:keys [config]}]
(let [extract-options {:date-formatter (gp-config/get-date-formatter config)}]
(doseq [{:file/keys [path content]} files]
(parse-file db path content {:extract-options extract-options})))))
(defn filter-files
"Filters files in preparation for parsing. Only includes files that are
supported by parser"
[files]
(let [support-files (filter
(fn [file]
(let [format (gp-util/get-format (:file/path file))]
(contains? (set/union #{:edn :css} gp-config/mldoc-support-formats) format)))
files)
support-files (sort-by :file/path support-files)
{journals true non-journals false} (group-by (fn [file] (string/includes? (:file/path file) "journals/")) support-files)
{built-in true others false} (group-by (fn [file]
(or (string/includes? (:file/path file) "contents.")
(string/includes? (:file/path file) ".edn")
(string/includes? (:file/path file) "custom.css"))) non-journals)]
(concat (reverse journals) built-in others)))
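
The ordering that `filter-files` produces may be easier to see on a small input. The following is a standalone sketch, not the library code: `file-format` is a hypothetical stand-in for `gp-util/get-format` that just keywordizes the file extension, and `supported-formats` is an assumed subset of the real supported formats. The grouping and concatenation mirror the fn above.

```clojure
(ns example.filter-files
  (:require [clojure.string :as string]))

;; Hypothetical stand-in for gp-util/get-format: keyword of the file extension.
(defn file-format [path]
  (some-> (re-find #"\.([^./]+)$" path) second string/lower-case keyword))

;; Assumed subset of supported formats, for illustration only.
(def supported-formats #{:md :org :edn :css})

(defn order-files
  "Keeps supported files, then orders them: journals newest first,
  then built-in files, then everything else."
  [files]
  (let [supported (->> files
                       (filter #(contains? supported-formats
                                           (file-format (:file/path %))))
                       (sort-by :file/path))
        {journals true non-journals false}
        (group-by #(string/includes? (:file/path %) "journals/") supported)
        {built-in true others false}
        (group-by #(or (string/includes? (:file/path %) "contents.")
                       (string/includes? (:file/path %) ".edn")
                       (string/includes? (:file/path %) "custom.css"))
                  non-journals)]
    (concat (reverse journals) built-in others)))

(def example-files
  (map #(hash-map :file/path %)
       ["pages/contents.md" "journals/2022_05_26.md" "journals/2022_05_27.md"
        "pages/foo.md" "logseq/config.edn" "assets/pic.png"]))

(println (map :file/path (order-files example-files)))
;; => (journals/2022_05_27.md journals/2022_05_26.md logseq/config.edn
;;     pages/contents.md pages/foo.md)
```

The unsupported `assets/pic.png` is dropped. Because files are sorted by path before grouping, reversing the journals puts the most recent journal first, assuming journal file names sort chronologically.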
2 changes: 1 addition & 1 deletion deps/graph-parser/src/logseq/graph_parser/block.cljc
@@ -1,4 +1,4 @@
(ns ^:nbb-compatible logseq.graph-parser.block
(ns logseq.graph-parser.block
;; Disable clj linters since we don't support clj
#?(:clj {:clj-kondo/config {:linters {:unresolved-namespace {:level :off}
:unresolved-symbol {:level :off}}}})
68 changes: 57 additions & 11 deletions deps/graph-parser/src/logseq/graph_parser/cli.cljs
@@ -1,20 +1,66 @@
(ns logseq.graph-parser.cli
"Ns only for use by CLIs as it uses node.js libraries"
"Primary ns to parse graphs with node.js based CLIs"
(:require ["fs" :as fs]
["child_process" :as child-process]
[clojure.edn :as edn]
[logseq.graph-parser :as graph-parser]))
[clojure.string :as string]
[logseq.graph-parser :as graph-parser]
[logseq.graph-parser.config :as gp-config]
[logseq.graph-parser.db :as gp-db]))

(defn- slurp
"Like clojure.core/slurp"
[file]
(str (fs/readFileSync file)))

(defn- sh
"Run shell cmd synchronously and print to inherited streams by default. Aims
to be similar to babashka.tasks/shell
TODO: Fail fast when process exits 1"
[cmd opts]
(child-process/spawnSync (first cmd)
(clj->js (rest cmd))
(clj->js (merge {:stdio "inherit"} opts))))

(defn build-graph-files
"Given a git graph directory, returns allowed file paths and their contents in
preparation for parsing"
[dir]
(let [files (->> (str (.-stdout (sh ["git" "ls-files"]
{:cwd dir :stdio nil})))
string/split-lines
(map #(hash-map :file/path (str dir "/" %)))
graph-parser/filter-files)]
(mapv #(assoc % :file/content (slurp (:file/path %))) files)))

(defn- read-config
"Commandline version of frontend.handler.common/read-config without graceful
handling of broken config. Config is assumed to be at $dir/logseq/config.edn "
[dir]
(if (fs/existsSync (str dir "/logseq/config.edn"))
(-> (str dir "/logseq/config.edn") fs/readFileSync str edn/read-string)
{}))
(let [config-file (str dir "/" gp-config/app-name "/config.edn")]
(if (fs/existsSync config-file)
(-> config-file fs/readFileSync str edn/read-string)
{})))

(defn- parse-files
[conn files {:keys [config] :as options}]
(let [extract-options (merge {:date-formatter (gp-config/get-date-formatter config)}
(select-keys options [:verbose]))]
(doseq [{:file/keys [path content]} files]
(graph-parser/parse-file conn path content {:extract-options extract-options}))))

(defn parse
"Main entry point for parsing"
[dir db files]
(graph-parser/parse db
files
{:config (read-config dir)}))
(defn parse-graph
"Parses a given graph directory and returns a datascript connection and all
files that were processed. The directory is parsed as if it were a new graph
as it can't assume that the metadata in logseq/ is up to date. Directory is
assumed to be using git"
([dir]
(parse-graph dir {}))
([dir options]
(let [files (build-graph-files dir)
conn (gp-db/start-conn)
config (read-config dir)]
(println "Parsing" (count files) "files...")
(parse-files conn files (merge options {:config config}))
{:conn conn
:files (map :file/path files)})))
6 changes: 5 additions & 1 deletion deps/graph-parser/src/logseq/graph_parser/config.cljs
@@ -1,9 +1,13 @@
(ns ^:nbb-compatible logseq.graph-parser.config
(ns logseq.graph-parser.config
"Config that is shared between graph-parser and rest of app"
(:require [logseq.graph-parser.util :as gp-util]
[clojure.set :as set]
[clojure.string :as string]))

(def app-name
"Copy of frontend.config/app-name. Too small to couple to main app"
"logseq")

(defonce local-assets-dir "assets")

(defn local-asset?
@@ -1,4 +1,4 @@
(ns ^:nbb-compatible logseq.graph-parser.date-time-util
(ns logseq.graph-parser.date-time-util
"cljs-time util fns for graph-parser"
(:require [cljs-time.coerce :as tc]
[cljs-time.core :as t]
2 changes: 1 addition & 1 deletion deps/graph-parser/src/logseq/graph_parser/db/default.cljs
@@ -1,4 +1,4 @@
(ns ^:nbb-compatible logseq.graph-parser.db.default
(ns logseq.graph-parser.db.default
(:require [clojure.string :as string]))

(defonce built-in-pages-names
2 changes: 1 addition & 1 deletion deps/graph-parser/src/logseq/graph_parser/db/schema.cljs
@@ -1,4 +1,4 @@
(ns ^:nbb-compatible logseq.graph-parser.db.schema)
(ns logseq.graph-parser.db.schema)

(defonce version 1)
(defonce ast-version 1)