feat(ctranslate): initial infrastructure support (bentoml#694)
* perf: compact and improve speed and agility

Signed-off-by: Aaron <[email protected]>

* --wip--

Signed-off-by: Aaron <[email protected]>

* chore: cleanup infrastructure

Signed-off-by: Aaron <[email protected]>

* chore: update styles notes and autogen mypy configuration

Signed-off-by: Aaron <[email protected]>

---------

Signed-off-by: Aaron <[email protected]>
aarnphm authored Nov 19, 2023
1 parent 93ffb29 commit 206521e
Showing 38 changed files with 506 additions and 641 deletions.
1 change: 1 addition & 0 deletions .gitattributes
@@ -12,6 +12,7 @@ openllm-python/CHANGELOG.md linguist-generated=true

# Others
Formula/openllm.rb linguist-generated=true
+mypy.ini linguist-generated=true

* text=auto eol=lf
# Needed for setuptools-scm-git-archive
File renamed without changes.
6 changes: 6 additions & 0 deletions DEVELOPMENT.md
@@ -205,6 +205,12 @@ See [these docs](/.github/INFRA.md) for more information on OpenLLM's CI/CD workflow.
## Typing
For all internal functions, it is recommended to provide type hints. For all public function definitions, it is recommended to create a stubs file (`.pyi`) that separates the supported external API from the implementation, to increase code visibility. See [openllm-client's `__init__.pyi`](/openllm-client/src/openllm_client/__init__.pyi) for an example.
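
For instance, a paired implementation and stubs file might look like the following minimal sketch (the module and function names are hypothetical, not OpenLLM's actual API):

```python
# _service.py (implementation, hypothetical module)
def generate(prompt, **attrs):
  ...

# _service.pyi (companion stubs file declaring the supported API)
# import typing as t
# def generate(prompt: str, **attrs: t.Any) -> str: ...
```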

For internal helpers or any functions and utilities prefixed with `_`, it is recommended to provide inline annotations instead. See [STYLE.md](./STYLE.md) to learn more about the style and typing philosophy.

If you want to update any mypy configuration, please update [`./tools/update-mypy.py`](./tools/update-mypy.py), as `mypy.ini` is generated.

If you need to update the pyright configuration, please update [`pyrightconfig.json`](./pyrightconfig.json) directly.
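
Since `mypy.ini` is generated, a typical flow is to edit the tool and then rerun it (a sketch, assuming the scripts take no arguments):

```bash
python ./tools/update-mypy.py          # regenerate mypy.ini
python ./tools/update-config-stubs.py  # regenerate the config stubs
```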

## Install from git archive

16 changes: 0 additions & 16 deletions README.md
@@ -503,14 +503,6 @@ openllm start tiiuae/falcon-7b --backend pt
### Quickstart
-> **Note:** FlanT5 requires to install with:
-> ```bash
-> pip install "openllm[flan-t5]"
-> ```
Run the following command to quickly spin up a FlanT5 server:
@@ -869,14 +861,6 @@ TRUST_REMOTE_CODE=True openllm start mosaicml/mpt-7b --backend pt
### Quickstart
-> **Note:** OPT requires to install with:
-> ```bash
-> pip install "openllm[opt]"
-> ```
Run the following command to quickly spin up an OPT server:
69 changes: 58 additions & 11 deletions STYLE.md
@@ -1,4 +1,4 @@
## the coding style.

This documentation serves as a brief discussion of the coding style used for
OpenLLM. As you have noticed, it is different from the conventional
@@ -48,14 +48,16 @@ rather the brevity of expression. (it enables
[expository programming](http://archive.vector.org.uk/art10000980), combining
with prototyping new ideas and logic within model implementations)

## some guidelines.

Though I have stopped using deterministic formatters and linters, I do understand
that people have preferences for using these tools, and they play nicely with IDEs
and editors. As such, I included a [`pyproject.toml`](./pyproject.toml) file
that specifies some configuration to make those tools compliant with
the repository's style. In short, I'm using `ruff` for both linting and formatting,
`mypy` for type checking, and a `pyright`-compatible configuration is provided for
those who wish to use VSCode or the `pyright` LSP.
Since we manage everything via `hatch`, refer back to
[DEVELOPMENT.md](./DEVELOPMENT.md) for more information on this.

Over time, Python has incorporated a lot of features that support this style of
@@ -68,7 +70,7 @@ somewhat, type-safe. Since there is no real type-safety when working with
Python, typing should be a best-effort to make sure we don't introduce too many
bugs.

### naming.

- follow the Python standard for this; I don't have a strong opinion on it. Just
  make sure that it is descriptive, and the abbreviation describes the intent of
@@ -84,7 +86,7 @@ bugs.

_If you have any suggestions, feel free to share them on our Discord server!_

### layout.

- Preferably not a lot of whitespace, but rather flowing. If you can fit
  everything for an `if`, `def`, or `return` within one line, then there's no need
@@ -108,17 +110,18 @@ _If you have any suggestions, feel free to give it on our discord server!_

- With regard to writing operators, try to follow the domain-specific notation.
  E.g., when writing pathlib, don't add spaces, since that is not how you
  write a path in the terminal. `ruff format` will try to accommodate some of
  these changes.

- Avoid trailing whitespace.

- Use array-, PyTorch-, or NumPy-based indexing where possible.

- If you need to export anything, put it in `__all__` or do a lazy export for
  the type checker. See [OpenLLM's `__init__.py`](./openllm-python/src/openllm/__init__.py)
  for an example of how to lazily export a module; a sketch of the pattern
  follows this list.
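
A minimal sketch of that lazy-export pattern via a module-level `__getattr__` ([PEP 562](https://peps.python.org/pep-0562/)); the name-to-submodule mapping is illustrative, not OpenLLM's real layout:

```python
import importlib
import typing as t

_lazy = {'LLM': '._llm', 'Runner': '._runners'}  # hypothetical mapping
__all__ = list(_lazy)

def __getattr__(name: str) -> t.Any:
  if name in _lazy:
    # import the submodule on first attribute access only
    return getattr(importlib.import_module(_lazy[name], __package__), name)
  raise AttributeError(f'module {__name__!r} has no attribute {name!r}')
```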

### misc.

- Import aliases should be concise and descriptive. A convention is to always
  `import typing as t`.
@@ -129,20 +132,64 @@ _If you have any suggestions, feel free to give it on our discord server!_
MDX and will be hosted on GitHub Pages, so stay tuned!
- If anything is not used at runtime, just put it under `t.TYPE_CHECKING`.

### note on codegen.

- We also do some codegen for some of the assignment functions. This logic is
  largely based on the work of [attrs](https://github.com/python-attrs/attrs) to
  ensure fast and isolated codegen in Python. If you need codegen but don't know
  how it works, feel free to mention @aarnphm on Discord! A toy sketch of the
  idea follows.
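
A toy sketch of the codegen idea (build the source for a function, `exec` it in an isolated namespace, then pull the function out), loosely in the spirit of attrs rather than OpenLLM's actual helpers:

```python
import typing as t

def make_init(fields: t.Sequence[str]) -> t.Callable[..., None]:
  # build the source of an __init__ that assigns each field
  args = ', '.join(fields)
  body = '\n'.join(f'  self.{f} = {f}' for f in fields) or '  pass'
  src = f'def __init__(self, {args}):\n{body}\n'
  ns: dict = {}
  exec(compile(src, '<codegen>', 'exec'), ns)  # isolated namespace keeps globals clean
  return ns['__init__']

# usage: Point = type('Point', (), {'__init__': make_init(['x', 'y'])})
```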

### types.

I do believe in static type checking, and oftentimes all of the code in OpenLLM is safely typed.
Types play nicely with static analysis tools, and they are a great way to catch bugs in downstream
applications. In Python, there are two ways of doing static typing:

1. Stubs files (recommended)

If you have seen files that end with `.pyi`, those are stubs files. Stubs files are a great format
for specifying types for an external API, and a great way to separate the implementation from
the API. For example, if you want to specify the type for `openllm_client.Client`, you can create
a stubs file `openllm_client/__init__.pyi` and specify the type there, as sketched below.
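
A heavily trimmed sketch of what such a stubs file could declare (the `Client` signature below is illustrative, not the real API surface):

```python
import typing as t

class Client:
  def __init__(self, address: str, timeout: int = ...) -> None: ...
  def query(self, prompt: str, **attrs: t.Any) -> t.Any: ...
```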

A few examples include the [`openllm.LLM` type definitions](./openllm-python/src/openllm/_llm.pyi) versus
the [actual implementation](./openllm-python/src/openllm/_llm.py).

> Therefore, if you touch any public API, make sure to also add or update the corresponding stubs files.
2. Inline annotations (encouraged, not required)

Inline annotations are great for specifying types for internal functions. For example:
```python
def _resolve_internal_converter(llm: LLM, type_: str) -> Converter: ...
```

This is not always required. If the internal function is expressive enough, and
the variable names are descriptive enough that there is no type ambiguity, then
it is not required to specify the types. For example:
```python
import torch

rms_norm = lambda tensor, eps=1e-6: tensor * torch.rsqrt(tensor.square().mean(dim=-1, keepdim=True) + eps)
```
As you can see, the function computes the RMSNorm of a given torch tensor.

#### note on `TYPE_CHECKING` block.

As you can see, we also incorporate `TYPE_CHECKING` blocks in various places.
This provides some nice inline type checking during development. Usually I think
it is nice to have, but once the files get more and more complex, it is better
to just provide a stubs file instead.
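
For example, a minimal sketch of the `TYPE_CHECKING` pattern (the function is hypothetical):

```python
from __future__ import annotations
import typing as t

if t.TYPE_CHECKING:
  import torch  # visible to the type checker only, never imported at runtime

def to_device(tensor: torch.Tensor, device: str) -> torch.Tensor:
  return tensor.to(device)
```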

## FAQ

### Why not use `black`?

`black` is used on our other projects, but I rather find `black` to be very
verbose, and over time it is annoying to work with too much whitespace.

Personally, I think four-space indentation is a mistake, as in some cases
four-space code is harder to read than two-space code.

### Why not PEP8?

PEP8 is great if you are writing a library such as this, but I'm going to do a lot

@@ -152,7 +199,7 @@ probably not fit here, and want to explore a more expressive style.
### Editor is complaining about the style, what should I do?

I kindly ask you to disable linting for this project 🤗. I will try my best to
accommodate for ruff and yapf, but I don't want to spend too much time on this.
It is pretty straightforward to disable it in your editor with a quick search.

### Might this style put off new contributors?
10 changes: 10 additions & 0 deletions all.sh
@@ -0,0 +1,10 @@
#!/usr/bin/env bash

printf "Running mirror.sh\n"
bash ./tools/mirror.sh
printf "Running update-mypy.py\n"
python ./tools/update-mypy.py
printf "Running update-config-stubs.py\n"
python ./tools/dependencies.py
printf "Running dependencies.py\n"
python ./tools/update-config-stubs.py
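
The script is wired into the default hatch environment's `quality` target below, but it can also be run directly from the repository root:

```bash
bash ./all.sh
```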
99 changes: 43 additions & 56 deletions hatch.toml
@@ -1,66 +1,53 @@
[envs.default]
dependencies = [
  "openllm-core @ {root:uri}/openllm-core",
  "openllm-client @ {root:uri}/openllm-client",
-  "openllm[opt,chatglm,fine-tune] @ {root:uri}/openllm-python",
+  "openllm[chatglm,fine-tune] @ {root:uri}/openllm-python",
  # NOTE: To run all hooks
  "pre-commit",
  # NOTE: towncrier for changelog
  "towncrier",
  # NOTE: Using under ./tools/update-optional-dependencies.py
  "tomlkit",
  # NOTE: For fancy PyPI readme
  "hatch-fancy-pypi-readme",
  # NOTE: For working with shell pipe
  "plumbum",
  # The below sync with mypyc deps and pre-commit mypy
  "types-psutil",
  "types-tabulate",
  "types-PyYAML",
  "types-protobuf",
]
[envs.default.scripts]
changelog = "towncrier build --version main --draft"
-check-stubs = ["./tools/update-config-stubs.py"]
inplace-changelog = "towncrier build --version main --keep"
-quality = [
-  "./tools/dependencies.py",
-  "- ./tools/update-brew-tap.py",
-  "check-stubs",
-  "bash ./tools/mirror.sh",
-  "- pre-commit run --all-files",
-  "- pnpm format",
-]
-setup = [
-  "pre-commit install",
-  "- ln -s .python-version-default .python-version",
-  "curl -fsSL https://raw.githubusercontent.com/clj-kondo/clj-kondo/master/script/install-clj-kondo | bash -",
-]
-tool = ["quality", "bash ./clean.sh", "bash ./compile.sh {args}"]
typing = [
-  "- pre-commit run mypy {args:-a}",
-  "- pre-commit run pyright {args:-a}",
+  "pre-commit install",
+  "- ln -s .python-version-default .python-version",
]
+quality = ["bash ./all.sh", "- pre-commit run --all-files", "- pnpm format"]
+tool = ["quality", "bash ./clean.sh", 'python ./cz.py']
[envs.tests]
dependencies = [
"openllm-core @ {root:uri}/openllm-core",
"openllm-client @ {root:uri}/openllm-client",
"openllm[opt,chatglm,fine-tune] @ {root:uri}/openllm-python",
# NOTE: interact with docker for container tests.
"docker",
# NOTE: Tests strategies with Hypothesis and pytest, and snapshot testing with syrupy
"coverage[toml]>=6.5",
"filelock>=3.7.1",
"pytest",
"pytest-cov",
"pytest-mock",
"pytest-randomly",
"pytest-rerunfailures",
"pytest-asyncio>=0.21.0",
"pytest-xdist[psutil]",
"trustme",
"hypothesis",
"syrupy",
"openllm-core @ {root:uri}/openllm-core",
"openllm-client @ {root:uri}/openllm-client",
"openllm[chatglm,fine-tune] @ {root:uri}/openllm-python",
# NOTE: interact with docker for container tests.
"docker",
# NOTE: Tests strategies with Hypothesis and pytest, and snapshot testing with syrupy
"coverage[toml]>=6.5",
"filelock>=3.7.1",
"pytest",
"pytest-cov",
"pytest-mock",
"pytest-randomly",
"pytest-rerunfailures",
"pytest-asyncio>=0.21.0",
"pytest-xdist[psutil]",
"trustme",
"hypothesis",
"syrupy",
]
skip-install = false
template = "tests"
@@ -91,10 +78,10 @@ clojure = ["bash external/clojure/run-clojure-ui.sh"]
detached = true
[envs.ci.scripts]
client-stubs = "bash openllm-client/generate-grpc-stubs"
compile = "bash ./compile.sh {args}"
compile = "bash ./tools/compile.sh {args}"
recompile = ["bash ./clean.sh", "compile"]
edi = "bash local.sh"
lock = [
  "bash tools/lock-actions.sh",
  "pushd external/clojure && pnpm i --frozen-lockfile",
]
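
Assuming the script names above read correctly from the diff, day-to-day usage stays behind hatch, e.g.:

```bash
hatch run quality    # bash ./all.sh, then pre-commit and pnpm format
hatch run changelog  # draft changelog via towncrier
```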
5 changes: 3 additions & 2 deletions mypy.ini

Some generated files are not rendered by default.

3 changes: 2 additions & 1 deletion openllm-client/src/openllm_client/_utils.py
@@ -2,7 +2,8 @@


def __dir__():
-  return dir(openllm_core.utils)
+  coreutils = set(dir(openllm_core.utils)) | set([it for it in openllm_core.utils._extras if not it.startswith('_')])
+  return sorted(list(coreutils))


def __getattr__(name):
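
The truncated `__getattr__` body above presumably keeps forwarding attribute access to `openllm_core.utils`; a sketch of that pattern, not the verbatim implementation:

```python
import openllm_core.utils

def __getattr__(name):
  try:
    return getattr(openllm_core.utils, name)
  except AttributeError:
    raise AttributeError(f'module {__name__!r} has no attribute {name!r}') from None
```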
2 changes: 1 addition & 1 deletion openllm-client/src/openllm_client/_utils.pyi
@@ -19,6 +19,7 @@ from openllm_core.utils import (
generate_hash_from_file as generate_hash_from_file,
get_debug_mode as get_debug_mode,
get_quiet_mode as get_quiet_mode,
+getenv as getenv,
in_notebook as in_notebook,
lenient_issubclass as lenient_issubclass,
reserve_free_port as reserve_free_port,
@@ -40,7 +41,6 @@ from openllm_core.utils.import_utils import (
is_jupyter_available as is_jupyter_available,
is_jupytext_available as is_jupytext_available,
is_notebook_available as is_notebook_available,
-is_optimum_supports_gptq as is_optimum_supports_gptq,
is_peft_available as is_peft_available,
is_torch_available as is_torch_available,
is_transformers_available as is_transformers_available,
4 changes: 2 additions & 2 deletions openllm-core/src/openllm_core/_typing_compat.py
@@ -30,10 +30,10 @@ def get_literal_args(typ: t.Any) -> tuple[str, ...]:
TupleAny = t.Tuple[t.Any, ...]
At = t.TypeVar('At', bound=attr.AttrsInstance)

-LiteralDtype = t.Literal['float16', 'float32', 'bfloat16']
+LiteralDtype = t.Literal['float16', 'float32', 'bfloat16', 'int8', 'int16']
LiteralSerialisation = t.Literal['safetensors', 'legacy']
LiteralQuantise = t.Literal['int8', 'int4', 'gptq', 'awq', 'squeezellm']
-LiteralBackend = t.Literal['pt', 'vllm', 'ggml', 'mlc']
+LiteralBackend = t.Literal['pt', 'vllm', 'ctranslate', 'ggml', 'mlc']
AdapterType = t.Literal[
'lora', 'adalora', 'adaption_prompt', 'prefix_tuning', 'p_tuning', 'prompt_tuning', 'ia3', 'loha', 'lokr'
]
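
Widening these `Literal` types means anything that enumerates backends picks up `ctranslate` automatically, e.g. via plain `typing.get_args` (the module's own `get_literal_args` helper presumably wraps the same idea):

```python
import typing as t

LiteralBackend = t.Literal['pt', 'vllm', 'ctranslate', 'ggml', 'mlc']
print(t.get_args(LiteralBackend))  # ('pt', 'vllm', 'ctranslate', 'ggml', 'mlc')
```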
@@ -24,7 +24,7 @@ class BaichuanConfig(openllm_core.LLMConfig):
'trust_remote_code': True,
'timeout': 3600000,
'url': 'https://github.com/baichuan-inc/Baichuan-7B',
-'requirements': ['cpm-kernels', 'sentencepiece'],
+'requirements': ['cpm-kernels'],
'architecture': 'BaiChuanForCausalLM',
# NOTE: See the following
# https://huggingface.co/baichuan-inc/Baichuan-13B-Chat/blob/19ef51ba5bad8935b03acd20ff04a269210983bc/modeling_baichuan.py#L555
@@ -30,7 +30,7 @@ class ChatGLMConfig(openllm_core.LLMConfig):
'trust_remote_code': True,
'timeout': 3600000,
'url': 'https://github.com/THUDM/ChatGLM-6B',
-'requirements': ['cpm-kernels', 'sentencepiece'],
+'requirements': ['cpm-kernels'],
'architecture': 'ChatGLMModel',
'default_id': 'thudm/chatglm-6b',
'model_ids': [
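
For context, these metadata dicts drive each model's defaults. A hypothetical config in the same shape (the enclosing attribute name is truncated in the diff above and assumed here to be `__config__`):

```python
import openllm_core

class MyModelConfig(openllm_core.LLMConfig):
  __config__ = {
    'trust_remote_code': False,
    'timeout': 3600000,
    'url': 'https://example.com/my-model',
    'requirements': [],
    'architecture': 'MyModelForCausalLM',
    'default_id': 'org/my-model-7b',
    'model_ids': ['org/my-model-7b'],
  }
```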
