Skip to content

Commit

Permalink
Feat: new file name escaping rules (logseq#6134)
Browse files Browse the repository at this point in the history
* feat: new file name escaping for namespace

feat: new file name decoding back to page name

* test: file name sanitization

feat: use _0x to encode %

feat: don't create title property

test: extra URL encoding for escaped file names

fix: fit pdf prefix into new file name rules

fix: encoding rules on some characters

fix: handle the buggy file names imported by users

fix: pdf block ref failed to jump

fix: logseq#6167

* fix: enhance backward compatibility

chore: title validation

test: fix namespace queries test

chore: use index version stored in config.edn instead of search.versions

* feat: convert old version graph mechanism

ui: file conversion UI

feat: rename files for conversion

feat: don't trigger conversion when title property is manually edited

fix: file conflict notification while renaming files on some OS

feat: re-index on update version

feat: clicking NO in the re-index dialog would update the index-ver flag to suppress the dialog

feat: use html entities for reserved char escaping

dev: remove unresolved vars & minor refactor

chore: move file name sanity from gp-util to fs-util, as it's for encoding only but not parsing

test: update file name tests to html entities rule

test: convert files from dir ver 3 for repo_tests

feat: convert Windows reserved file names

fix: save index version into idb instead of file

fix: decode uri of path while parsing files on mobile

fix: couple dir version and index version to ensure only re-index on converted dirs

feat: go back to url-encode for special chars

* chore: fix lint

chore: improve codebase to address Gabriel's comments

fix: remove file remnants on add conflict

fix: remove file remnants on rename conflict

chore: add test ns to nbb runner

Also fix typoed fn and remove unused code

* fix: issues of rebase PR6134 to master after file-sync merged

* feat: switchable filename format

* fix: use  go block to replace promesa for rename all with blocking

* feat: re-index after apply rename all

* ui: file conversion enhancement

* fix: merging filename format PR with master

* fix: filename format lint & CI

* ui: filename format flow

* fix: error handling on the rare internal file path confliction case

* chore: shorten component code for files-breaking-changed

* chore: fix CI

* Minor fixes per latest code review

- Remove unused page-name-order
- Update catch usage to be consistent with what's on master
- Place state fn in right place
- Wording fixes:
  - select and apply -> manual. There are no checkboxes for the user
  - Update -> Edit. We use edit for all other settings button
  - Alternatives to starting sentences with May. Not a common way to
    start a sentence
  - update outdated template comment

* ux: rename instruction update

* ux: rename instruction update (2)

* Tweak wording of conversion modal

Simplifed first paragraph and explained the page to the user in first
sentence, may isn't a common way to start sentences and updated outdated
wording

* Fix large-var warning by splitting out a piece of component

* fix: right slash on Windows; legacy format file sanitization

* fix: triple lowbar renaming fns

Co-authored-by: Gabriel Horner <[email protected]>
  • Loading branch information
cnrpman and logseq-cldwalker authored Oct 8, 2022
1 parent 6007d60 commit 0e4037a
Show file tree
Hide file tree
Showing 49 changed files with 1,182 additions and 331 deletions.
2 changes: 1 addition & 1 deletion deps/db/src/logseq/db/rules.cljc
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@
"Datalog rules for use with logseq.db.schema")

(def ^:large-vars/data-var rules
;; rule "parent" is optimized for child node -> parent node nesting queries
;; rule "parent" is optimized for parent node -> child node nesting queries
'[[(parent ?p ?c)
[?c :block/parent ?p]]
[(parent ?p ?c)
Expand Down
50 changes: 26 additions & 24 deletions deps/graph-parser/src/logseq/graph_parser.cljs
Original file line number Diff line number Diff line change
Expand Up @@ -20,32 +20,34 @@
file-content [{:file/path file}]
{:keys [tx ast]}
(if (contains? gp-config/mldoc-support-formats format)
(let [extract-options' (merge {:block-pattern (gp-config/get-block-pattern format)
:date-formatter "MMM do, yyyy"
:supported-formats (gp-config/supported-formats)}
extract-options
{:db @conn})
{:keys [pages blocks ast]}
(extract/extract file content extract-options')
delete-blocks (delete-blocks-fn (first pages) file)
block-ids (map (fn [block] {:block/uuid (:block/uuid block)}) blocks)
block-refs-ids (->> (mapcat :block/refs blocks)
(filter (fn [ref] (and (vector? ref)
(= :block/uuid (first ref)))))
(map (fn [ref] {:block/uuid (second ref)}))
(seq))
;; To prevent "unique constraint" on datascript
block-ids (set/union (set block-ids) (set block-refs-ids))
pages (extract/with-ref-pages pages blocks)
pages-index (map #(select-keys % [:block/name]) pages)]
;; does order matter?
{:tx (concat file-content pages-index delete-blocks pages block-ids blocks)
:ast ast})
{:tx file-content})
(let [extract-options' (merge {:block-pattern (gp-config/get-block-pattern format)
:date-formatter "MMM do, yyyy"
:supported-formats (gp-config/supported-formats)
:uri-encoded? false
:filename-format :legacy}
extract-options
{:db @conn})
{:keys [pages blocks ast]}
(extract/extract file content extract-options')
delete-blocks (delete-blocks-fn (first pages) file)
block-ids (map (fn [block] {:block/uuid (:block/uuid block)}) blocks)
block-refs-ids (->> (mapcat :block/refs blocks)
(filter (fn [ref] (and (vector? ref)
(= :block/uuid (first ref)))))
(map (fn [ref] {:block/uuid (second ref)}))
(seq))
;; To prevent "unique constraint" on datascript
block-ids (set/union (set block-ids) (set block-refs-ids))
pages (extract/with-ref-pages pages blocks)
pages-index (map #(select-keys % [:block/name]) pages)]
;; does order matter?
{:tx (concat file-content pages-index delete-blocks pages block-ids blocks)
:ast ast})
{:tx file-content})
tx (concat tx [(cond-> {:file/path file
:file/content content}
new?
;; TODO: use file system timestamp?
new?
;; TODO: use file system timestamp?
(assoc :file/created-at (date-time-util/time-ms)))])
tx' (gp-util/remove-nils tx)
result (if skip-db-transact?
Expand Down
1 change: 1 addition & 0 deletions deps/graph-parser/src/logseq/graph_parser/block.cljs
Original file line number Diff line number Diff line change
Expand Up @@ -57,6 +57,7 @@
(and
(= typ "Search")
(not (contains? #{\# \* \/ \[} (first value)))
;; FIXME: use `gp-util/get-format` instead
(let [ext (some-> (gp-util/get-file-ext value) keyword)]
(when (and (not (string/starts-with? value "http:"))
(not (string/starts-with? value "https:"))
Expand Down
36 changes: 22 additions & 14 deletions deps/graph-parser/src/logseq/graph_parser/extract.cljc
Original file line number Diff line number Diff line change
Expand Up @@ -18,9 +18,19 @@
:default [lambdaisland.glogi :as log])))

(defn- get-page-name
[file ast page-name-order]
"Get page name with overridden order of
`title::` property
file name parsing
first block content
note: `page-name-order` is deprecated on Apr. 2021
uri-encoded? - since paths on mobile are uri-encoded, need to decode them first
filename-format - the format used to parse file name
"
[file ast uri-encoded? filename-format]
;; headline
(let [ast (map first ast)]
(let [ast (map first ast)
file (if uri-encoded? (js/decodeURI file) file)]
;; check backward compatibility?
(if (string/includes? file "pages/contents.")
"Contents"
(let [first-block (last (first (filter gp-block/heading-block? ast)))
Expand All @@ -32,15 +42,13 @@
(and first-block
(string? title)
title))
file-name (when-let [file-name (last (string/split file #"/"))]
(let [result (first (gp-util/split-last "." file-name))]
(if (gp-config/mldoc-support? (string/lower-case (gp-util/get-file-ext file)))
(js/decodeURIComponent (string/replace result "." "/"))
result)))]
file-name (when-let [result (gp-util/path->file-body file)]
(if (gp-config/mldoc-support? (gp-util/get-file-ext file))
(gp-util/title-parsing result filename-format)
result))]
(or property-name
(if (= page-name-order "heading")
(or first-block-name file-name)
(or file-name first-block-name)))))))
file-name
first-block-name)))))

(defn- extract-page-alias-and-tags
[page-m page page-name properties]
Expand Down Expand Up @@ -108,14 +116,14 @@

;; TODO: performance improvement
(defn- extract-pages-and-blocks
[format ast properties file content {:keys [date-formatter page-name-order db] :as options}]
"uri-encoded? - if is true, apply URL decode on the file path"
[format ast properties file content {:keys [date-formatter db uri-encoded? filename-format] :as options}]
(try
(let [page (get-page-name file ast page-name-order)
(let [page (get-page-name file ast uri-encoded? filename-format)
[page page-name _journal-day] (gp-block/convert-page-if-journal page date-formatter)
options' (-> options
(assoc :page-name page-name
:original-page-name page)
(dissoc :page-name-order))
:original-page-name page))
blocks (->> (gp-block/extract-blocks ast content false format options')
(gp-block/with-parent-and-left {:block/name page-name})
(vec))
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -18,15 +18,15 @@
(clj->js (merge {:stdio "inherit"} opts))))

(defn clone-docs-repo-if-not-exists
[dir]
[dir branch]
(when-not (.existsSync fs dir)
(sh ["git" "clone" "--depth" "1" "-b" "v0.6.7" "-c" "advice.detachedHead=false"
(sh ["git" "clone" "--depth" "1" "-b" branch "-c" "advice.detachedHead=false"
"https://github.com/logseq/docs" dir] {})))


;; Fns for common test assertions
;; ==============================
(defn- get-top-block-properties
(defn get-top-block-properties
[db]
(->> (d/q '[:find (pull ?b [*])
:where
Expand All @@ -39,7 +39,7 @@
(filter #(>= (val %) 5))
(into {})))

(defn- get-all-page-properties
(defn get-all-page-properties
[db]
(->> (d/q '[:find (pull ?b [*])
:where
Expand All @@ -51,7 +51,7 @@
(apply merge-with +)
(into {})))

(defn- get-block-format-counts
(defn get-block-format-counts
[db]
(->> (d/q '[:find (pull ?b [*]) :where [?b :block/format]] db)
(map first)
Expand Down Expand Up @@ -84,24 +84,23 @@
"Journal page count on disk equals count in db")

(is (= {"CANCELED" 2 "DONE" 6 "LATER" 4 "NOW" 5}
(->> (d/q '[:find (pull ?b [*]) :where [?b :block/marker] ]
(->> (d/q '[:find (pull ?b [*]) :where [?b :block/marker]]
db)
(map first)
(group-by :block/marker)
(map (fn [[k v]] [k (count v)]))
(into {})))
"Task marker counts")

(is (= {:markdown 3143 :org 460}
(is (= {:markdown 3143 :org 460} ;; 2 pages for namespaces are not parsed
(get-block-format-counts db))
"Block format counts")

(is (= {:title 98 :id 98
:updated-at 47 :created-at 47
:card-last-score 6 :card-repeats 6 :card-next-schedule 6
:card-last-interval 6 :card-ease-factor 6 :card-last-reviewed 6
:alias 6 :logseq.macro-arguments 94 :logseq.macro-name 94
:heading 64}
:alias 6 :logseq.macro-arguments 94 :logseq.macro-name 94 :heading 64}
(get-top-block-properties db))
"Counts for top block properties")

Expand Down Expand Up @@ -132,6 +131,7 @@
set))
"Has correct namespaces")))

;; TODO update me to the number of the latest version of doc when namespace is updated
(defn docs-graph-assertions
"These are common assertions that should pass in both graph-parser and main
logseq app. It is important to run these in both contexts to ensure that the
Expand Down
113 changes: 93 additions & 20 deletions deps/graph-parser/src/logseq/graph_parser/util.cljs
Original file line number Diff line number Diff line change
Expand Up @@ -5,8 +5,17 @@
[clojure.string :as string]
[clojure.edn :as edn]))

(defn safe-url-decode
[string]
(if (string/includes? string "%")
(try (some-> string str (js/decodeURIComponent))
(catch :default _
string))
string))

(defn path-normalize
"Normalize file path (for reading paths from FS, not required by writting)"
"Normalize file path (for reading paths from FS, not required by writting)
Keep capitalization senstivity"
[s]
(.normalize s "NFC"))

Expand Down Expand Up @@ -73,14 +82,6 @@
(str "0" n)
(str n)))

(defn get-file-ext
"Copy of frontend.util/get-file-ext. Too basic to couple to main app"
[file]
(and
(string? file)
(string/includes? file ".")
(some-> (last (string/split file #"\.")) string/lower-case)))

(defn remove-boundary-slashes
[s]
(when (string? s)
Expand All @@ -102,17 +103,40 @@
(conj result (str prev "/" (first others)))))
result))))

(defn decode-namespace-underlines
"Decode namespace underlines to slashed;
If continuous underlines, only decode at start;
Having empty namespace is invalid."
[string]
(string/replace string "___" "/"))

(defn page-name-sanity
"Sanitize the page-name."
([page-name]
(page-name-sanity page-name false))
([page-name replace-slash?]
(let [page (some-> page-name
(remove-boundary-slashes)
(path-normalize))]
(if replace-slash?
(string/replace page #"/" "%2F")
page))))
"Sanitize the page-name. Unify different diacritics and other visual differences.
Two objectives:
1. To be the same as in the filesystem;
2. To be easier to search"
[page-name]
(some-> page-name
(remove-boundary-slashes)
(path-normalize)))

(defn make-valid-namespaces
"Remove those empty namespaces from title to make it a valid page name."
[title]
(->> (string/split title "/")
(remove empty?)
(string/join "/")))

(def url-encoded-pattern #"(?i)%[0-9a-f]{2}") ;; (?i) for case-insensitive mode

(defn- tri-lb-title-parsing
"Parsing file name under the new file name format
Avoid calling directly"
[file-name]
(some-> file-name
(decode-namespace-underlines)
(string/replace url-encoded-pattern safe-url-decode)
(make-valid-namespaces)))

(defn page-name-sanity-lc
"Sanitize the query string for a page name (mandate for :block/name)"
Expand Down Expand Up @@ -144,10 +168,38 @@
;; default
(keyword format)))

(defn path->file-name
;; Only for interal paths, as they are converted to POXIS already
;; https://github.com/logseq/logseq/blob/48b8e54e0fdd8fbd2c5d25b7f1912efef8814714/deps/graph-parser/src/logseq/graph_parser/extract.cljc#L32
;; Should be converted to POXIS first for external paths
[path]
(if (string/includes? path "/")
(last (split-last "/" path))
path))

(defn path->file-body
[path]
(when-let [file-name (path->file-name path)]
(if (string/includes? file-name ".")
(first (split-last "." file-name))
file-name)))

(defn path->file-ext
[path-or-file-name]
(last (split-last "." path-or-file-name)))

(defn get-format
[file]
(when file
(normalize-format (keyword (string/lower-case (last (string/split file #"\.")))))))
(normalize-format (keyword (string/lower-case (path->file-ext file))))))

(defn get-file-ext
"Copy of frontend.util/get-file-ext. Too basic to couple to main app"
[file]
(and
(string? file)
(string/includes? file ".")
(some-> (path->file-ext file) string/lower-case)))

(defn valid-edn-keyword?
"Determine if string is a valid edn keyword"
Expand All @@ -157,3 +209,24 @@
(edn/read-string (str "{" s " nil}"))))
(catch :default _
false)))

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; Keep for backward compatibility ;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

;; Rule of dir-ver 0
;; Source: https://github.com/logseq/logseq/blob/e7110eea6790eda5861fdedb6b02c2a78b504cd9/deps/graph-parser/src/logseq/graph_parser/extract.cljc#L35
(defn legacy-title-parsing
[file-name-body]
(js/decodeURIComponent (string/replace file-name-body "." "/")))

;; Register sanitization / parsing fns in:
;; logseq.graph-parser.util (parsing only)
;; frontend.util.fs (sanitization only)
;; frontend.handler.conversion (both)
(defn title-parsing
"Convert file name in the given file name format to page title"
[file-name-body filename-format]
(case filename-format
:triple-lowbar (tri-lb-title-parsing file-name-body)
(legacy-title-parsing file-name-body)))
5 changes: 3 additions & 2 deletions deps/graph-parser/test/logseq/graph_parser/cli_test.cljs
Original file line number Diff line number Diff line change
Expand Up @@ -7,8 +7,9 @@
;; Integration test that test parsing a large graph like docs
(deftest ^:integration parse-graph
(let [graph-dir "test/docs"
_ (docs-graph-helper/clone-docs-repo-if-not-exists graph-dir)
{:keys [conn files asts]} (gp-cli/parse-graph graph-dir {:verbose false})]
;; TODO update docs filename rules to the latest version when the namespace PR is released
_ (docs-graph-helper/clone-docs-repo-if-not-exists graph-dir "v0.6.7")
{:keys [conn files asts]} (gp-cli/parse-graph graph-dir {:verbose false})] ;; legacy parsing

(docs-graph-helper/docs-graph-assertions @conn files)

Expand Down
2 changes: 1 addition & 1 deletion deps/graph-parser/test/logseq/graph_parser/mldoc_test.cljs
Original file line number Diff line number Diff line change
Expand Up @@ -122,7 +122,7 @@ body"

(deftest ^:integration test->edn
(let [graph-dir "test/docs"
_ (docs-graph-helper/clone-docs-repo-if-not-exists graph-dir)
_ (docs-graph-helper/clone-docs-repo-if-not-exists graph-dir "v0.6.7")
files (gp-cli/build-graph-files graph-dir)
asts-by-file (->> files
(map (fn [{:file/keys [path content]}]
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@
[logseq.graph-parser.cli-test]
[logseq.graph-parser.util.page-ref-test]
[logseq.graph-parser.util-test]
[logseq.graph-parser.util.file-name-test]
[logseq.graph-parser-test]))

(defmethod cljs.test/report [:cljs.test/default :end-run-tests] [m]
Expand All @@ -26,4 +27,5 @@
'logseq.graph-parser.cli-test
'logseq.graph-parser.util.page-ref-test
'logseq.graph-parser-test
'logseq.graph-parser.util.file-name-test
'logseq.graph-parser.util-test))
Loading

0 comments on commit 0e4037a

Please sign in to comment.