Commit

Merge pull request mercari#42 from shibuiwilliam/feature/samples
update
shibuiwilliam authored Apr 25, 2021
2 parents bcf712a + de277db commit 794a8e5
Showing 60 changed files with 223 additions and 31 deletions.
Original file line number Diff line number Diff line change
@@ -22,4 +22,7 @@ Separating the condition depends on how the model input, feature and target vary

## Needs consideration
- How to separate the conditions.
- Balance between the number of models and the model management overhead.

## Sample
https://github.com/shibuiwilliam/ml-system-in-actions/tree/main/chapter6_operation_management/condition_based_pattern
@@ -22,4 +22,7 @@

## Needs consideration
- How to separate the conditions.
- Balance between the number of models and the operational load.

## Sample
https://github.com/shibuiwilliam/ml-system-in-actions/tree/main/chapter6_operation_management/condition_based_pattern
@@ -23,3 +23,7 @@ condition-based serving pattern은 상황에 따라 모델을 선택하는 구
## Needs consideration
- How to separate the conditions.
- Balance between the number of models and model operation.


## Sample
https://github.com/shibuiwilliam/ml-system-in-actions/tree/main/chapter6_operation_management/condition_based_pattern
5 changes: 4 additions & 1 deletion Operation-patterns/Model-in-image-pattern/design_en.md
@@ -20,4 +20,7 @@ A difficulty of the pattern is that the image building latency tends to get long
- Takes longer to build the image and deploy.

## Needs consideration
- Pipeline definition.

## Sample
https://github.com/shibuiwilliam/ml-system-in-actions/tree/main/chapter3_release_patterns/model_in_image_pattern
4 changes: 4 additions & 0 deletions Operation-patterns/Model-in-image-pattern/design_ja.md
@@ -21,3 +21,7 @@

## Needs consideration
- How to define the pipeline from model training to server image build.


## Sample
https://github.com/shibuiwilliam/ml-system-in-actions/tree/main/chapter3_release_patterns/model_in_image_pattern
4 changes: 4 additions & 0 deletions Operation-patterns/Model-in-image-pattern/design_ko.md
@@ -23,3 +23,7 @@

## Needs consideration
- Pipeline definition from model training to image build.


## Sample
https://github.com/shibuiwilliam/ml-system-in-actions/tree/main/chapter3_release_patterns/model_in_image_pattern
5 changes: 4 additions & 1 deletion Operation-patterns/Model-load-pattern/design_en.md
@@ -24,4 +24,7 @@ A difficulty of the pattern is to manage dependent library for the models. You a
- This pattern adds a requirement: the library versions supported by the server image and by the model must match.

## Needs consideration
- Managing version dependencies between the server image and the model file.
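
One way to keep the server image and the model file in step is to ship the library versions the model was trained with alongside the model file, and to compare them with what the server has installed before loading. The sketch below is a minimal illustration under that assumption; the function name `find_version_mismatches`, the exact-match rule, and the metadata layout are hypothetical and are not taken from the sample repository.

```python
from typing import Dict, List


def find_version_mismatches(
    required: Dict[str, str], installed: Dict[str, str]
) -> List[str]:
    """Compare versions recorded with the model against the server image.

    `required` maps package name -> version recorded at training time
    (hypothetical metadata shipped next to the model file).
    `installed` maps package name -> version found in the server image.
    Returns one human-readable message per mismatch; empty means compatible.
    """
    problems: List[str] = []
    for name, wanted in sorted(required.items()):
        found = installed.get(name)
        if found is None:
            problems.append(f"{name}: required {wanted}, not installed")
        elif found != wanted:
            # Exact match is an assumption; a real policy may allow ranges.
            problems.append(f"{name}: required {wanted}, found {found}")
    return problems
```

In a server, `installed` could be filled with `importlib.metadata.version(name)` for each required package, and a non-empty result could stop the model from being loaded at startup.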

## Sample
https://github.com/shibuiwilliam/ml-system-in-actions/tree/main/chapter3_release_patterns/model_load_pattern
5 changes: 4 additions & 1 deletion Operation-patterns/Model-load-pattern/design_ja.md
@@ -23,4 +23,7 @@
- Versioning of the server image and the model file is required.

## Needs consideration
- How to version the server image and the model file, and how to manage their dependencies.

## Sample
https://github.com/shibuiwilliam/ml-system-in-actions/tree/main/chapter3_release_patterns/model_load_pattern
4 changes: 4 additions & 0 deletions Operation-patterns/Model-load-pattern/design_ko.md
@@ -24,3 +24,7 @@

## Needs consideration
- Version and dependency management of the server image and the model file.


## Sample
https://github.com/shibuiwilliam/ml-system-in-actions/tree/main/chapter3_release_patterns/model_load_pattern
@@ -21,4 +21,7 @@ For those cases, it is useful to add a mechanism in your code to change behaviou
- A growing number of rules makes operation more complex.

## Needs consideration
- Which cases to control with which variables.

## Sample
https://github.com/shibuiwilliam/ml-system-in-actions/tree/main/chapter6_operation_management/paramater_based_pattern
@@ -21,4 +21,7 @@ MLモデルは学習によってパラメータが固定されるため、一般
- Increased complexity as rule-based controls grow.

## Needs consideration
- Which cases to handle with which rules and variables.

## Sample
https://github.com/shibuiwilliam/ml-system-in-actions/tree/main/chapter6_operation_management/paramater_based_pattern
@@ -22,3 +22,6 @@

## Needs consideration
- Which cases can be controlled with which variables.

## Sample
https://github.com/shibuiwilliam/ml-system-in-actions/tree/main/chapter6_operation_management/paramater_based_pattern
5 changes: 4 additions & 1 deletion Operation-patterns/Prediction-log-pattern/design_en.md
@@ -24,4 +24,7 @@ Log is useful in many aspects. For instance, if the prediction or user behaviour
## Needs consideration
- Log frequency and log level.
- Storage frequency and backup.
- Purpose of analysis.

## Sample
https://github.com/shibuiwilliam/ml-system-in-actions/tree/main/chapter5_operations/prediction_log_pattern
5 changes: 4 additions & 1 deletion Operation-patterns/Prediction-log-pattern/design_ja.md
@@ -24,4 +24,7 @@ MLシステムを組み込んだサービスを改善するためには推論結
## Needs consideration
- Log collection frequency and log level.
- Frequency and retention period for storing logs in the DWH.
- Purpose of analysis.

## Sample
https://github.com/shibuiwilliam/ml-system-in-actions/tree/main/chapter5_operations/prediction_log_pattern
3 changes: 3 additions & 0 deletions Operation-patterns/Prediction-log-pattern/design_ko.md
@@ -25,3 +25,6 @@
- Log collection frequency and log level.
- Storage frequency and backup.
- Purpose of analysis.

## Sample
https://github.com/shibuiwilliam/ml-system-in-actions/tree/main/chapter5_operations/prediction_log_pattern
@@ -29,4 +29,7 @@ Configuration for monitoring and alert, with its operation, depends on service l
## Needs consideration
- Service level.
- Monitoring and alert configuration and operation based on the service level.
- Playbook.

## Sample
https://github.com/shibuiwilliam/ml-system-in-actions/tree/main/chapter5_operations/prediction_monitoring_pattern
@@ -29,4 +29,7 @@ Prediction monitoring pattern
## Needs consideration
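- Service level of the prediction server.
- Frequency and level of monitoring and alerting, alert recipients, and the response team structure.
- Response procedures and manuals for anomalies.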

## Sample
https://github.com/shibuiwilliam/ml-system-in-actions/tree/main/chapter5_operations/prediction_monitoring_pattern
4 changes: 4 additions & 0 deletions Operation-patterns/Prediction-monitoring-pattern/design_ko.md
@@ -32,3 +32,7 @@ prediction monitoring pattern에서는 예측을 주로 모니터링합니다.
- Monitoring and alerting.
- Configuration and operation based on the level.
- Response procedures.


## Sample
https://github.com/shibuiwilliam/ml-system-in-actions/tree/main/chapter5_operations/prediction_monitoring_pattern
5 changes: 4 additions & 1 deletion QA-patterns/Loading-test-pattern/design_en.md
@@ -26,4 +26,7 @@ There may be a possibility that the bottleneck exists in a load balancer, networ
## Needs consideration
- Bottlenecks and workarounds.
- It is recommended to prepare a variety of datasets.
- If the bottleneck is in model prediction, you may need to redevelop the model.

## Sample
https://github.com/shibuiwilliam/ml-system-in-actions/tree/main/chapter6_operation_management/load_test_pattern
5 changes: 4 additions & 1 deletion QA-patterns/Loading-test-pattern/design_ja.md
@@ -25,4 +25,7 @@
## Needs consideration
- Where the system bottleneck is.
- Prepare a wide variety of input data.
- If the prediction model itself is slow, the model needs to be redeveloped.

## Sample
https://github.com/shibuiwilliam/ml-system-in-actions/tree/main/chapter6_operation_management/load_test_pattern
4 changes: 4 additions & 0 deletions QA-patterns/Loading-test-pattern/design_ko.md
@@ -30,3 +30,7 @@ Loading test pattern은 기존 웹 서비스나 온라인 시스템의 부하
- Bottlenecks and workarounds.
- Preparing a variety of datasets is recommended.
- If the bottleneck is in model prediction, you may need to redevelop the model.


## Sample
https://github.com/shibuiwilliam/ml-system-in-actions/tree/main/chapter6_operation_management/load_test_pattern
5 changes: 4 additions & 1 deletion QA-patterns/Online-ab-test-pattern/design_en.md
@@ -25,4 +25,7 @@ You will let the new model to be online with responding the prediction to the cl

## Needs consideration
- Load balancing policy.
- Evaluation criteria and go/no-go decision policy for the new model.
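
As an illustration of one possible load balancing policy, the sketch below routes a fixed share of users to the new model by hashing a user identifier, so the same user always reaches the same model during the test. The function name `assign_model`, the `new_model_ratio` parameter, and the choice of hashing on a user ID are assumptions made for this example; a proxy or service mesh with weighted routing is an equally valid way to split traffic.

```python
import hashlib


def assign_model(user_id: str, new_model_ratio: float = 0.1) -> str:
    """Deterministically route a user to the "current" or the "new" model.

    `new_model_ratio` is the (assumed) share of users sent to the new model.
    """
    if not 0.0 <= new_model_ratio <= 1.0:
        raise ValueError("new_model_ratio must be between 0 and 1")
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "new" if bucket < new_model_ratio else "current"
```

Because the assignment depends only on the user ID and the ratio, the logs collected for each group can later be compared when making the go or no-go decision.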

## Sample
https://github.com/shibuiwilliam/ml-system-in-actions/tree/main/chapter6_operation_management/online_ab_pattern
5 changes: 4 additions & 1 deletion QA-patterns/Online-ab-test-pattern/design_ja.md
@@ -26,4 +26,7 @@

## Needs consideration
- How to split traffic between the old and new models, and in what proportion.
- Logs to collect, how to evaluate the old and new models, the test period, and the criteria for stopping or continuing.

## Sample
https://github.com/shibuiwilliam/ml-system-in-actions/tree/main/chapter6_operation_management/online_ab_pattern
3 changes: 3 additions & 0 deletions QA-patterns/Online-ab-test-pattern/design_ko.md
@@ -29,3 +29,6 @@ Online AB test pattern은 새로운 모델을 프러덕션경시스템에 연결
## Needs consideration
- Load balancing policy.
- An evaluation method and decision criteria for whether to adopt the new model are needed.

## Sample
https://github.com/shibuiwilliam/ml-system-in-actions/tree/main/chapter6_operation_management/online_ab_pattern
5 changes: 4 additions & 1 deletion QA-patterns/Shadow-ab-test-pattern/design_en.md
@@ -22,4 +22,7 @@ In contrast to the online AB test pattern, the shadow AB test pattern will allow
- Additional cost.

## Needs consideration
- Evaluation criteria and go/no-go decision policy for the new model.

## Sample
https://github.com/shibuiwilliam/ml-system-in-actions/tree/main/chapter6_operation_management/shadow_ab_pattern
5 changes: 4 additions & 1 deletion QA-patterns/Shadow-ab-test-pattern/design_ja.md
@@ -22,4 +22,7 @@
- Incurs cost for the new prediction server.

## Needs consideration
- Logs to collect, how to evaluate the old and new models, the test period, and the criteria for stopping or continuing.

## Sample
https://github.com/shibuiwilliam/ml-system-in-actions/tree/main/chapter6_operation_management/shadow_ab_pattern
3 changes: 3 additions & 0 deletions QA-patterns/Shadow-ab-test-pattern/design_ko.md
@@ -27,3 +27,6 @@ Online AB test pattern과 달리, Shadow AB test pattern을 사용하면 적은

## Needs consideration
- An evaluation method and decision criteria for whether to adopt the new model are needed.

## Sample
https://github.com/shibuiwilliam/ml-system-in-actions/tree/main/chapter6_operation_management/shadow_ab_pattern
4 changes: 4 additions & 0 deletions README.md
@@ -15,6 +15,10 @@ All of the ML system patterns are designed to be deployed on a public cloud or a
For easier reading, please refer to:<br>
[GitHub Pages](https://mercari.github.io/ml-system-design-pattern/)

## Sample implementations
Some sample implementations are available below.
https://github.com/shibuiwilliam/ml-system-in-actions

## Patterns
### [Serving patterns](./Serving-patterns/README.md)
The serving patterns are a series of system designs for using machine learning models in production workflow.
4 changes: 4 additions & 0 deletions README_ja.md
@@ -15,6 +15,10 @@
For a more readable format, please see here.<br>
[GitHub Pages](https://mercari.github.io/ml-system-design-pattern/README_ja.html)

## Sample implementations
Some sample implementations have been published below.
https://github.com/shibuiwilliam/ml-system-in-actions

## Patterns
### [Serving patterns](./Serving-patterns/README_ja.md)
Patterns for running and operating prediction servers in production services.
3 changes: 3 additions & 0 deletions README_ko.md
@@ -16,6 +16,9 @@
Use the link below to read the documentation more comfortably.
[GitHub Pages](https://mercari.github.io/ml-system-design-pattern/README_ko.html)

## Sample implementations
Some sample implementations are available below.
https://github.com/shibuiwilliam/ml-system-in-actions
## Patterns
### [Serving patterns](./Serving-patterns/README_ko.md)
The serving patterns are a series of system designs for making machine learning models usable in production environments.
5 changes: 4 additions & 1 deletion Serving-patterns/Asynchronous-pattern/design_en.md
@@ -31,4 +31,7 @@ In addition, both in case of `Diagram1` and `Diagram2`, you can make the predict
- Handling of prediction errors needs consideration:
  - If you need to retry, decide whether to retry inside the prediction server or to return the request to the queue.
  - If the error is caused by a data or programming issue, the request may keep retrying until you manually dispose of it.
- Since the pattern does not guarantee ordered prediction, you have to design the workflow accordingly if your use case needs a strict order of inputs or events.
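
The retry concern above can be sketched with a bounded retry counter and a dead-letter list, so that a request failing because of bad data is parked for inspection instead of cycling forever. This is a minimal in-memory illustration, not the sample repository's implementation; `drain`, `MAX_RETRIES`, and the request dictionary layout are hypothetical, and a real system would use a message broker or cache as the queue.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple

MAX_RETRIES = 3  # assumed limit; tune to the service's requirements


def drain(
    queue: Deque[dict], predict: Callable[[object], float]
) -> Tuple[Dict[str, float], List[dict]]:
    """Process queued requests, re-queueing failures up to MAX_RETRIES times.

    Each request is a dict with "id" and "data" keys (hypothetical layout).
    Returns (finished predictions by request id, dead-lettered requests).
    """
    done: Dict[str, float] = {}
    dead_letter: List[dict] = []
    while queue:
        request = queue.popleft()
        try:
            done[request["id"]] = predict(request["data"])
        except Exception:
            request["retries"] = request.get("retries", 0) + 1
            if request["retries"] >= MAX_RETRIES:
                # Stop retrying and park the request for manual inspection.
                dead_letter.append(request)
            else:
                # Return the request to the back of the queue.
                queue.append(request)
    return done, dead_letter
```

Returning a failed request to the back of the queue also shows why ordering is not preserved: requests that arrived later may complete before the retried one.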

## Sample
https://github.com/shibuiwilliam/ml-system-in-actions/tree/main/chapter4_serving_patterns/asynchronous_pattern
5 changes: 4 additions & 1 deletion Serving-patterns/Asynchronous-pattern/design_ja.md
@@ -31,4 +31,7 @@
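- How to handle prediction errors needs to be considered.
  - When retrying, whether to retry inside the prediction server or return the request to the queue/cache.
  - If the prediction error is caused by a mistake in the data or the program, the retry-error cycle may continue unless the request is stopped or discarded.
- Strict ordering is not guaranteed, so workflows in which the prediction order for inputs or events matters need consideration.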

## Sample
https://github.com/shibuiwilliam/ml-system-in-actions/tree/main/chapter4_serving_patterns/asynchronous_pattern
4 changes: 4 additions & 0 deletions Serving-patterns/Asynchronous-pattern/design_ko.md
@@ -33,3 +33,7 @@ Asynchronous pattern은 클라이언트와 예측 서버 사이에 대기열이
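- If a retry is needed, retry in the prediction server or return the request to the queue.
- If the error is caused by a data or programming issue, the request may keep retrying until it is handled manually.
- Because this pattern does not support ordered prediction, you need to consider a concrete workflow for inputs or events in your use case.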


## Sample
https://github.com/shibuiwilliam/ml-system-in-actions/tree/main/chapter4_serving_patterns/asynchronous_pattern
5 changes: 4 additions & 1 deletion Serving-patterns/Batch-pattern/design_en.md
@@ -26,4 +26,7 @@ If you don't have to run the predictions real-time, you may choose the batch pat
- Retry all: rerun the whole batch. Used when predictions or data depend on one another.
- Partial retry: rerun only the failed data. Used when there is no dependency.
- No retry: leave the failed data to be predicted in the next batch. Used when there is no strict deadline.
- If the batch interval is long, such as once per month or once per year, monitor the batch execution or run it on an ad hoc basis in between, because the model or system may have become outdated.
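
The three retry strategies above can be sketched as a single function with a `strategy` switch. This is a minimal illustration and not code from the sample repository; `run_once`, `run_batch`, the strategy names, and `max_attempts` are hypothetical, and items are plain integers to keep the example short.

```python
from typing import Callable, Dict, List, Tuple


def run_once(
    items: List[int], predict: Callable[[int], float]
) -> Tuple[Dict[int, float], List[int]]:
    """Predict every item once and return (results, failed_items)."""
    results: Dict[int, float] = {}
    failed: List[int] = []
    for item in items:
        try:
            results[item] = predict(item)
        except Exception:
            failed.append(item)
    return results, failed


def run_batch(
    items: List[int],
    predict: Callable[[int], float],
    strategy: str = "partial",
    max_attempts: int = 3,
) -> Tuple[Dict[int, float], List[int]]:
    """Run a batch and return (results, items carried over to the next batch)."""
    if strategy not in ("all", "partial", "none"):
        raise ValueError(f"unknown retry strategy: {strategy}")
    results, failed = run_once(items, predict)
    attempts = 1
    while failed and strategy != "none" and attempts < max_attempts:
        if strategy == "all":
            # Retry all: discard earlier results and rerun the whole batch.
            results, failed = run_once(items, predict)
        else:
            # Partial retry: rerun only the items that failed.
            retried, failed = run_once(failed, predict)
            results.update(retried)
        attempts += 1
    return results, failed
```

With `strategy="none"` the failed items are simply returned, so the caller can add them to the input of the next scheduled batch.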

## Sample
https://github.com/shibuiwilliam/ml-system-in-actions/tree/main/chapter4_serving_patterns/batch_pattern
5 changes: 4 additions & 1 deletion Serving-patterns/Batch-pattern/design_ja.md
@@ -25,4 +25,7 @@
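- Retry all: if the job fails, rerun the prediction batch job and predict all records.
- Partial retry: predict again only for the failed data.
- No retry: do not trigger a retry on failure; predict in the next batch job.
- If prediction batch jobs run far apart, such as monthly or yearly, you need to monitor whether the machine learning model is still valid (out of date) and whether the server itself can still run.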

## Sample
https://github.com/shibuiwilliam/ml-system-in-actions/tree/main/chapter4_serving_patterns/batch_pattern
5 changes: 4 additions & 1 deletion Serving-patterns/Batch-pattern/design_ko.md
@@ -25,4 +25,7 @@
- Retry all: rerun the whole prediction of the batch. Used when a prediction or data depends on others.
- Partial retry: rerun on the failed dataset. Used when there is no dependency.
- No retry: run prediction for the failed dataset in the next batch. Used when there is no strict time limit.
- If the batch interval is long, such as once a month or once a year, the model or system may be outdated, so it is advisable to monitor the batch execution or run it temporarily.

## Sample
https://github.com/shibuiwilliam/ml-system-in-actions/tree/main/chapter4_serving_patterns/batch_pattern
4 changes: 4 additions & 0 deletions Serving-patterns/Data-cache-pattern/design_en.md
@@ -34,3 +34,7 @@ There are two architectures for the pattern.
- Tradeoff between speed, cost and volume.
- Cache clear policy.
- Data cache policy.
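
To make the cache clear policy and the speed, cost and volume tradeoff concrete, here is a small in-memory sketch that caps the number of cached items (evicting the least recently used) and expires entries after a time-to-live. The class `DataCache`, the helper `load_input`, and the default limits are assumptions for illustration only; a production setup would more likely use an external cache service.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Optional


class DataCache:
    """Tiny in-memory cache with a size cap (LRU eviction) and a time-to-live."""

    def __init__(
        self,
        max_items: int = 1000,
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._items: "OrderedDict[str, tuple]" = OrderedDict()
        self._max_items = max_items
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested

    def get(self, key: str) -> Optional[Any]:
        entry = self._items.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at > self._ttl:
            del self._items[key]  # expired: clear the entry
            return None
        self._items.move_to_end(key)  # mark as recently used
        return value

    def put(self, key: str, value: Any) -> None:
        self._items[key] = (self._clock(), value)
        self._items.move_to_end(key)
        while len(self._items) > self._max_items:
            self._items.popitem(last=False)  # evict least recently used


def load_input(data_id: str, cache: DataCache, fetch: Callable[[str], Any]) -> Any:
    """Return input data for `data_id`, fetching from storage only on a miss."""
    cached = cache.get(data_id)
    if cached is not None:
        return cached
    value = fetch(data_id)
    cache.put(data_id, value)
    return value
```

A smaller `max_items` or shorter `ttl_seconds` lowers memory cost at the price of more fetches from slow storage, which is the tradeoff listed above.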


## Sample
https://github.com/shibuiwilliam/ml-system-in-actions/tree/main/chapter4_serving_patterns/data_cache_pattern
4 changes: 4 additions & 0 deletions Serving-patterns/Data-cache-pattern/design_ja.md
@@ -34,3 +34,7 @@
- Tradeoff between the benefit and the cost and capacity of the cache.
- When to clear the cache.
- Choosing the pattern for caching data.


## Sample
https://github.com/shibuiwilliam/ml-system-in-actions/tree/main/chapter4_serving_patterns/data_cache_pattern
5 changes: 4 additions & 1 deletion Serving-patterns/Data-cache-pattern/design_ko.md
@@ -48,4 +48,7 @@ Data cache pattern은 입력 데이터를 캐시합니다. 입력 데이터가
- Input data must be identifiable by a key.
- The balance between speed, cost, and volume must be considered.
- A cache clear policy is needed.
- A data cache policy is needed.

## Sample
https://github.com/shibuiwilliam/ml-system-in-actions/tree/main/chapter4_serving_patterns/data_cache_pattern
@@ -32,4 +32,7 @@ You may place a proxy in between client and prediction services. You may expect
## Needs consideration
- Synchronous or asynchronous.
- How to handle a slow model in the synchronous case: time out or wait.
- How to handle time lag in the asynchronous case.
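
For the synchronous case, one option is for the proxy to call every prediction service in parallel and give each call its own timeout, returning a placeholder for any model that is too slow. The sketch below assumes each service is exposed as an async callable; `call_with_timeout`, `fan_out`, and the `None` placeholder are hypothetical names and choices, not the sample repository's API.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Optional

Service = Callable[[dict], Awaitable[float]]


async def call_with_timeout(
    call: Service, payload: dict, timeout_seconds: float
) -> Optional[float]:
    """Await one service; return None if it does not answer in time."""
    try:
        return await asyncio.wait_for(call(payload), timeout=timeout_seconds)
    except asyncio.TimeoutError:
        return None


async def fan_out(
    services: Dict[str, Service], payload: dict, timeout_seconds: float = 1.0
) -> Dict[str, Optional[float]]:
    """Send the same payload to every service in parallel and collect answers."""
    names = list(services)
    results = await asyncio.gather(
        *(call_with_timeout(services[name], payload, timeout_seconds) for name in names)
    )
    return dict(zip(names, results))
```

Choosing "wait" instead of "timeout" would amount to calling the services without `asyncio.wait_for`, at the cost of the whole response being as slow as the slowest model.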

## Sample
https://github.com/shibuiwilliam/ml-system-in-actions/tree/main/chapter4_serving_patterns/horizontal_microservice_pattern