This repository contains tools built with tree-sitter that let you:
- Inspect the concrete syntax tree of a source file
- Use pre-written tree-sitter query files to locate important symbols in source code
- Optionally format output in JSON to use the results in your own applications
Contributions welcome. These queries are used by Codeium Search to index your code locally for semantic search! Adding queries for your language here will enable Codeium Search to work better on your own code!
In particular, this repo provides a binary prepackaged with:
- A recent version of the tree-sitter library
- A large number of tree-sitter grammars
- An implementation of many common query predicates
$ ./download_parse.sh
$ ./parse -file examples/example.js -named_only
program [0, 0] - [4, 0] "// Adds two numbers.\n…"
  comment [0, 0] - [0, 20] "// Adds two numbers."
  function_declaration [1, 0] - [3, 1] "function add(a, b) {\n…"
    name: identifier [1, 9] - [1, 12] "add"
    parameters: formal_parameters [1, 12] - [1, 18] "(a, b)"
      identifier [1, 13] - [1, 14] "a"
      identifier [1, 16] - [1, 17] "b"
    body: statement_block [1, 19] - [3, 1] "{\n…"
      return_statement [2, 4] - [2, 17] "return a + b;"
        binary_expression [2, 11] - [2, 16] "a + b"
          left: identifier [2, 11] - [2, 12] "a"
          right: identifier [2, 15] - [2, 16] "b"
$ ./parse -file examples/example.js -use_tags_query -json | jq ".captures.doc[0].text"
"// Adds two numbers."
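The JSON output can also be consumed from a script instead of jq. The following is a minimal Python sketch, not part of this repository. It assumes only the shape implied by the jq path above (a top-level "captures" object whose keys are capture names, each holding a list of nodes with a "text" field). The sample string and the helper name capture_texts are hypothetical, and the real output may carry additional fields.

```python
import json

# Hypothetical sample standing in for the output of:
#   ./parse -file examples/example.js -use_tags_query -json
# Only the captures -> doc -> [i] -> text path is confirmed by the jq
# example above; any other keys in the real output are not modeled here.
SAMPLE_OUTPUT = '{"captures": {"doc": [{"text": "// Adds two numbers."}]}}'


def capture_texts(raw_json, capture_name):
    """Return the text of every node recorded under one capture name."""
    data = json.loads(raw_json)
    # A capture name that is absent yields an empty list rather than an error.
    nodes = data.get("captures", {}).get(capture_name, [])
    return [node["text"] for node in nodes]


print(capture_texts(SAMPLE_OUTPUT, "doc"))
```

In practice the string would come from the standard output of ./parse rather than from a constant.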
Queries try to follow the conventions established by tree-sitter.
Most captures also include documentation as @doc. @definition.function and @definition.method also capture @codeium.parameters.
| | Python | TypeScript | JavaScript | Go |
|---|---|---|---|---|
| @definition.class | ✅ | ✅ | ✅ | ✅ |
| @definition.function | ✅ | ✅<sup>1</sup> | ✅ | ✅ |
| @definition.method | ✅<sup>2</sup> | ✅<sup>1</sup> | ✅ | ✅ |
| @definition.interface | N/A | ✅ | N/A | ✅ |
| @definition.namespace | N/A | ✅ | N/A | N/A |
| @definition.module | N/A | ✅ | N/A | N/A |
| @definition.type | N/A | ✅ | N/A | ✅ |
| @definition.constant | ❌ | ❌ | ❌ | ❌ |
| @definition.enum | ❌ | ❌ | ❌ | ❌ |
| @reference.call | ✅ | ✅ | ✅ | ✅ |
| @reference.class | ✅<sup>3</sup> | ✅ | ✅ | ✅ |
Want to write a query for a new language? tags.scm and other queries in each language's tree-sitter repository, like tree-sitter-javascript, are a good place to start.
$ ./parse -supported_predicates
#eq?/#not-eq?
(#eq? <@capture|"literal"> <@capture|"literal">)
Checks if two values are equal.
#has-parent?/#not-has-parent?
(#has-parent? @capture node_type...)
Checks if @capture has a parent node of any of the given types.
#has-type?/#not-has-type?
(#has-type? @capture node_type...)
Checks if @capture has a node of any of the given types.
#match?/#not-match?
(#match? @capture "regex")
Checks if the text for @capture matches the given regular expression.
#select-adjacent!
(#select-adjacent! @capture @anchor)
Selects @capture nodes contiguous with @anchor (all starting and ending on
adjacent lines).
#strip!
(#strip! @capture "regex")
Removes all matching text from all @capture nodes.
Need a predicate which hasn't been implemented? File an issue! We try to use predicates from nvim-treesitter.
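To make the regex-based entries above concrete, here is a rough Python analogue of #match? and #strip! applied to a capture's text. It is an illustrative sketch, not the binary's implementation. The function names are hypothetical, the sketch assumes #match? succeeds when the pattern matches anywhere in the text, and the regex dialect used by the binary may differ from Python's re module.

```python
import re


def match_predicate(text, pattern):
    """Rough analogue of (#match? @capture "regex").

    Assumption: the predicate holds if the regex matches anywhere in the text.
    """
    return re.search(pattern, text) is not None


def strip_directive(text, pattern):
    """Rough analogue of (#strip! @capture "regex"): remove all matching text."""
    return re.sub(pattern, "", text)


doc = "// Adds two numbers."
print(match_predicate(doc, r"^//"))     # does the capture start with a line-comment marker?
print(strip_directive(doc, r"^//\s*"))  # the capture text with the comment marker removed
```

A query author might combine the two in this way: #match? to accept only comment nodes with a given prefix, and #strip! to drop that prefix from the captured @doc text.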
$ ./parse -supported_languages
c
cpp
csharp
css
dart
go
hcl
html
java
javascript
json
kotlin
latex
markdown
php
protobuf
python
ruby
rust
shell
svelte
toml
tsx
typescript
vue
yaml
Looking for support for another language? File an issue with a link to the repo that contains the grammar.
Pull requests are welcome. For non-issue discussions about codeium-parse, join our Discord.
- You can create new source files with patterns you want to target in test_files/.
- Look at the syntax tree using ./parse -file test_files/<your file> to get a sense of how to capture the pattern.
- Learn the query syntax from tree-sitter documentation.
- Run ./goldens.sh to see what your query captures.