Merge branch 'master' into python-Levenshtein-name
orsinium authored Sep 28, 2023
2 parents 4ce495b + 31fe59c commit a408adc
Showing 19 changed files with 151 additions and 72 deletions.
53 changes: 0 additions & 53 deletions .drone.star

This file was deleted.

77 changes: 77 additions & 0 deletions .github/workflows/main.yml
@@ -0,0 +1,77 @@
+name: main
+
+on:
+  push:
+    branches:
+      - master
+  pull_request:
+  workflow_dispatch:
+
+concurrency:
+  group: ${{ github.head_ref || github.run_id }}
+  cancel-in-progress: true
+
+jobs:
+  lint:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v3
+      - uses: actions/setup-python@v4
+        with:
+          python-version: "3.8"
+      - uses: arduino/setup-task@v1
+        with:
+          repo-token: ${{ github.token }}
+      - run: task lint
+
+  pytest-pure:
+    runs-on: ubuntu-latest
+    strategy:
+      fail-fast: false
+      matrix:
+        python-version:
+          - "3.8"
+          - "3.9"
+          - "3.10"
+          - "3.11"
+          # - "3.12.0-rc.1"
+    steps:
+      - uses: actions/checkout@v3
+      - uses: actions/setup-python@v4
+        with:
+          python-version: ${{ matrix.python-version }}
+      - uses: arduino/setup-task@v1
+        with:
+          repo-token: ${{ github.token }}
+      - run: task pytest-pure
+
+  # pytest-external:
+  #   runs-on: ubuntu-latest
+  #   strategy:
+  #     fail-fast: false
+  #     matrix:
+  #       python-version:
+  #         - "3.8"
+  #         - "3.9"
+  #         - "3.10"
+  #         - "3.11"
+  #         # - "3.12.0-rc.1"
+  #   steps:
+  #     - uses: actions/checkout@v3
+  #     - uses: actions/setup-python@v4
+  #       with:
+  #         python-version: ${{ matrix.python-version }}
+  #     - uses: arduino/setup-task@v1
+  #       with:
+  #         repo-token: ${{ github.token }}
+  #     - run: task pytest-external
+
+  markdownlint-cli:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v3
+      - uses: nosborn/[email protected]
+        with:
+          files: .
+          config_file: .markdownlint.yaml
+          dot: true
8 changes: 8 additions & 0 deletions .markdownlint.yaml
@@ -0,0 +1,8 @@
+# https://github.com/DavidAnson/markdownlint/blob/main/schema/.markdownlint.yaml
+default: true # enable all by default
+MD007: # unordered list indentation
+  indent: 2
+MD013: false # do not validate line length
+MD014: false # allow $ before command output
+MD029: # ordered list prefix
+  style: "one"
12 changes: 6 additions & 6 deletions README.md
100755 → 100644
@@ -148,23 +148,23 @@ pip install -e ".[benchmark]"
 All algorithms have 2 interfaces:

 1. Class with algorithm-specific params for customizing.
-2. Class instance with default params for quick and simple usage.
+1. Class instance with default params for quick and simple usage.

 All algorithms have some common methods:

 1. `.distance(*sequences)` -- calculate distance between sequences.
-2. `.similarity(*sequences)` -- calculate similarity for sequences.
-3. `.maximum(*sequences)` -- maximum possible value for distance and similarity. For any sequence: `distance + similarity == maximum`.
-4. `.normalized_distance(*sequences)` -- normalized distance between sequences. The return value is a float between 0 and 1, where 0 means equal, and 1 totally different.
-5. `.normalized_similarity(*sequences)` -- normalized similarity for sequences. The return value is a float between 0 and 1, where 0 means totally different, and 1 equal.
+1. `.similarity(*sequences)` -- calculate similarity for sequences.
+1. `.maximum(*sequences)` -- maximum possible value for distance and similarity. For any sequence: `distance + similarity == maximum`.
+1. `.normalized_distance(*sequences)` -- normalized distance between sequences. The return value is a float between 0 and 1, where 0 means equal, and 1 totally different.
+1. `.normalized_similarity(*sequences)` -- normalized similarity for sequences. The return value is a float between 0 and 1, where 0 means totally different, and 1 equal.

 Most common init arguments:

 1. `qval` -- q-value for split sequences into q-grams. Possible values:
     - 1 (default) -- compare sequences by chars.
     - 2 or more -- transform sequences to q-grams.
     - None -- split sequences by words.
-2. `as_set` -- for token-based algorithms:
+1. `as_set` -- for token-based algorithms:
     - True -- `t` and `ttt` is equal.
     - False (default) -- `t` and `ttt` is different.
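
The method contract described in this README hunk (`distance + similarity == maximum`, with the normalized variants falling between 0 and 1) can be illustrated with a minimal sketch. `HammingSketch` and `hamming_sketch` are hypothetical names made up for this example and are not part of textdistance; the position-by-position comparison is an assumption chosen for brevity, not the library's implementation.

```python
from itertools import zip_longest


class HammingSketch:
    """Hypothetical stand-in for an algorithm class (not textdistance code)."""

    def maximum(self, *sequences):
        # Largest possible distance: the length of the longest sequence.
        return max(len(s) for s in sequences)

    def distance(self, *sequences):
        # Count positions where the sequences disagree; a position missing
        # from a shorter sequence is filled with None and counts as a mismatch.
        return sum(len(set(chars)) > 1 for chars in zip_longest(*sequences))

    def similarity(self, *sequences):
        # Defined so that distance + similarity == maximum always holds.
        return self.maximum(*sequences) - self.distance(*sequences)

    def normalized_distance(self, *sequences):
        # 0 means equal, 1 means totally different.
        maximum = self.maximum(*sequences)
        return self.distance(*sequences) / maximum if maximum else 0.0

    def normalized_similarity(self, *sequences):
        # 0 means totally different, 1 means equal.
        return 1.0 - self.normalized_distance(*sequences)


# Second interface: a ready-made instance with default params.
hamming_sketch = HammingSketch()

print(hamming_sketch.distance("test", "text"))
print(hamming_sketch.normalized_similarity("test", "text"))
```

Keeping `similarity` derived from `maximum` and `distance` is one simple way to guarantee the stated invariant; a real algorithm may compute the two values independently.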

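
The `qval` and `as_set` init arguments listed in the README hunk can be sketched as follows. Both `split_sequence` and `token_counts` are hypothetical helpers written for this illustration, assuming the behaviour the README describes; they are not functions exposed by textdistance.

```python
from collections import Counter


def split_sequence(sequence, qval=1):
    """Hypothetical helper: tokenize a string the way the README describes `qval`."""
    if qval is None:
        # None: split the sequence by words.
        return sequence.split()
    if qval == 1:
        # 1 (default): compare the sequence character by character.
        return list(sequence)
    # 2 or more: overlapping q-grams of length `qval`.
    return [sequence[i:i + qval] for i in range(len(sequence) - qval + 1)]


def token_counts(sequence, qval=1, as_set=False):
    """Hypothetical helper: count tokens, optionally ignoring repetitions."""
    tokens = split_sequence(sequence, qval)
    if as_set:
        # as_set=True: every distinct token counts once, so `t` and `ttt` match.
        tokens = set(tokens)
    return Counter(tokens)


print(split_sequence("test", qval=2))
print(token_counts("t", as_set=True) == token_counts("ttt", as_set=True))
print(token_counts("t") == token_counts("ttt"))
```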
40 changes: 34 additions & 6 deletions Taskfile.yml
@@ -48,19 +48,18 @@ tasks:
     cmds:
       - "{{.LINT_ENV}}/bin/twine upload dist/textdistance-*"

-  flake8:run:
+  flake8:
     deps:
       - lint:install
     cmds:
       - "{{.LINT_ENV}}/bin/flake8 ."
-  mypy:run:
+  mypy:
     deps:
       - lint:install
     cmds:
       - "{{.LINT_ENV}}/bin/mypy"

-
-  pytest-pure:run:
+  pytest-pure:
     deps:
       - task: pip:install
         vars:
@@ -69,7 +68,7 @@ tasks:
     cmds:
       - "{{.TEST_PURE_ENV}}/bin/pytest -m 'not external' {{.CLI_ARGS}}"

-  pytest-external:run:
+  pytest-external:
     deps:
       - task: pip:install
         vars:
@@ -78,12 +77,18 @@ tasks:
     cmds:
       - "{{.TEST_EXT_ENV}}/bin/pytest {{.CLI_ARGS}}"

-  isort:run:
+  isort:
     deps:
       - lint:install
     cmds:
       - "{{.LINT_ENV}}/bin/isort ."

+  isort:check:
+    deps:
+      - lint:install
+    cmds:
+      - "{{.LINT_ENV}}/bin/isort --check ."
+
   benchmark:
     deps:
       - task: pip:install
@@ -92,3 +97,26 @@ tasks:
           EXTRA: benchmark
     cmds:
       - "{{.BENCHMARK_ENV}}/bin/python3 -m textdistance.benchmark"
+
+  # groups
+  format:
+    desc: "run all code formatters"
+    cmds:
+      - task: isort
+  lint:
+    desc: "run all linters"
+    cmds:
+      - task: flake8
+      # - task: mypy
+      - task: isort:check
+  test:
+    desc: "run all tests"
+    cmds:
+      - task: pytest-pure
+      - task: pytest-external
+  all:
+    desc: "run all code formatters, linters, and tests"
+    cmds:
+      - task: format
+      - task: lint
+      - task: test
2 changes: 1 addition & 1 deletion setup.cfg
@@ -1,6 +1,6 @@
 [metadata]
 description_file = README.md
-license_file = LICENSE
+license_files = LICENSE

 [flake8]
 max-line-length=120
1 change: 1 addition & 0 deletions tests/test_edit/test_damerau_levenshtein.py
@@ -4,6 +4,7 @@
 # project
 import textdistance

+
 ALG = textdistance.DamerauLevenshtein

 COMMON = [
1 change: 1 addition & 0 deletions textdistance/algorithms/base.py
@@ -1,4 +1,5 @@
 from __future__ import annotations
+
 # built-in
 from collections import Counter
 from contextlib import suppress
2 changes: 2 additions & 0 deletions textdistance/algorithms/compression_based.py
@@ -1,4 +1,5 @@
 from __future__ import annotations
+
 # built-in
 import codecs
 import math
@@ -12,6 +13,7 @@


 try:
+    # built-in
     import lzma
 except ImportError:
     lzma = None # type: ignore[assignment]
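
The guarded import in this hunk (`lzma` is set to `None` when the module is unavailable) implies that callers check for `None` before using it. A minimal sketch of that pattern follows; `compressed_size` is a hypothetical helper, and the `zlib` fallback is an assumption for illustration, not what `compression_based.py` does.

```python
import zlib

try:
    import lzma
except ImportError:
    # Some Python builds are compiled without lzma support.
    lzma = None


def compressed_size(data: bytes) -> int:
    """Hypothetical helper: prefer lzma, fall back to zlib when lzma is missing."""
    if lzma is not None:
        return len(lzma.compress(data))
    return len(zlib.compress(data))


# Highly repetitive input compresses to far fewer bytes than the original.
print(compressed_size(b"a" * 1000) < 1000)
```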
4 changes: 3 additions & 1 deletion textdistance/algorithms/edit_based.py
@@ -1,15 +1,17 @@
 from __future__ import annotations
+
 # built-in
 from collections import defaultdict
 from itertools import zip_longest
 from typing import Any, Sequence, TypeVar

 # app
 from .base import Base as _Base, BaseSimilarity as _BaseSimilarity
-from .types import TestFunc, SimFunc
+from .types import SimFunc, TestFunc


 try:
+    # external
     import numpy
 except ImportError:
     numpy = None # type: ignore[assignment]
7 changes: 4 additions & 3 deletions textdistance/algorithms/phonetic.py
@@ -1,15 +1,16 @@
 from __future__ import annotations
+
 # built-in
 from collections import defaultdict
-from itertools import groupby
+from itertools import groupby, zip_longest
+from typing import Any, Iterator, Sequence, TypeVar

 # app
 from .base import Base as _Base, BaseSimilarity as _BaseSimilarity


-from itertools import zip_longest
-from typing import Any, Iterator, Sequence, TypeVar
 try:
+    # external
     import numpy
 except ImportError:
     numpy = None # type: ignore[assignment]
4 changes: 4 additions & 0 deletions textdistance/algorithms/sequence_based.py
@@ -1,4 +1,5 @@
 from __future__ import annotations
+
 # built-in
 from difflib import SequenceMatcher as _SequenceMatcher
 from typing import Any
@@ -8,9 +9,12 @@
 from .base import BaseSimilarity as _BaseSimilarity
 from .types import TestFunc

+
 try:
+    # external
     import numpy
 except ImportError:
+    # built-in
     from array import array
     numpy = None # type: ignore[assignment]
2 changes: 2 additions & 0 deletions textdistance/algorithms/simple.py
@@ -1,4 +1,5 @@
 from __future__ import annotations
+
 # built-in
 from itertools import takewhile
 from typing import Sequence
@@ -7,6 +8,7 @@
 from .base import Base as _Base, BaseSimilarity as _BaseSimilarity
 from .types import SimFunc

+
 __all__ = [
     'Prefix', 'Postfix', 'Length', 'Identity', 'Matrix',
     'prefix', 'postfix', 'length', 'identity', 'matrix',
1 change: 1 addition & 0 deletions textdistance/algorithms/token_based.py
@@ -1,4 +1,5 @@
 from __future__ import annotations
+
 # built-in
 from functools import reduce
 from itertools import islice, permutations, repeat
1 change: 1 addition & 0 deletions textdistance/algorithms/types.py
@@ -1,4 +1,5 @@

+# built-in
 from typing import Callable, Optional, TypeVar
3 changes: 2 additions & 1 deletion textdistance/algorithms/vector_based.py
@@ -3,13 +3,14 @@
 """
 # built-in
 from functools import reduce
+from typing import Any

 # app
 from .base import Base as _Base, BaseSimilarity as _BaseSimilarity


-from typing import Any
 try:
+    # external
     import numpy
 except ImportError:
     numpy = None # type: ignore[assignment]
3 changes: 2 additions & 1 deletion textdistance/benchmark.py
@@ -1,8 +1,9 @@
 from __future__ import annotations
+
 # built-in
 import json
-from collections import defaultdict
 import math
+from collections import defaultdict
 from timeit import timeit
 from typing import Iterable, Iterator, NamedTuple