Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deployment on Yarn #2

Open
jentle opened this issue Aug 28, 2020 · 1 comment
Open

Deployment on Yarn #2

jentle opened this issue Aug 28, 2020 · 1 comment
Labels
design Design ideas & wanted features, etc. good first issue Good for newcomers help wanted Extra attention is needed

Comments

@jentle
Copy link

jentle commented Aug 28, 2020

Is your feature request related to a problem? Please describe.
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
目前, kun 只可以的部署和调度执行只能限制在单台机器上(或者是单个容器中), 并且任务的调度执行并没有相对应的资源管理和限制。 很难适应大规模任务调度的场景, 也不利于扩展。

可以利用成熟的资源调度和管理框架, 根据需求实现分布式的任务分发和执行,比如 yarn, kubernetes,mesos。 yarn 是hadoop 平台上比较成功的资源调度器, 我们可以通过实现yarn app 的方式,来向yarn 请求资源并且执行任务。以下提出一种基于yarn 的任务调度方案。

At present, the deployment and scheduling execution of kun can only be restricted to a single machine (or in a single container), and there is no corresponding resource management and restriction for task scheduling and execution. It is difficult to adapt to large-scale task scheduling scenarios, and it is not conducive to expansion.

A mature resource scheduling and management framework can be used to implement task distribution and execution according to requirements, such as yarn, kubernetes, and mesos. Yarn is a relatively successful resource scheduler on the Hadoop platform. We can request resources from Yarn and perform tasks by implementing yarn app. The following proposes a task scheduling scheme based on yarn.

Describe the solution you'd like
A clear and concise description of what you want to happen.

Untitled Diagram (15)

一种简单的实现方式是对 LocalExecutor 的拓展实现。 当前 LocalExecutor 的实现是通过 多线程去启动本地机器的java 进程来完成特定的任务。 如果通过 Yarn 来实现的话,LocalExecutor的工作流可以简化为

  1. 启动 Yarnclient, 建立与 ResourceManager的连接, 以下简称RM
  2. 启动ApplicaitonMaster, 以下简称AM
  3. 任务执行: 通过AM 申请 容器, 分发jar 文件,在容器中启动 任务的java进程
  4. 通过AM 回调或者轮询,获得容器执行 结果,返回任务执行状态
  5. 任务终止:通过AM 来终止 容器,回收资源

A simple solution to achieve this is to modify the implementation of LocalExecutor. The current implementation of LocalExecutor is to start the java process of the local machine through thread pool to complete specific tasks. If implemented by Yarn, the workflow of LocalExecutor can be simplified to

  1. Start Yarnclient and establish a connection with ResourceManager, hereinafter referred to as RM
  2. Start ApplicaitonMaster, hereinafter referred to as AM
  3. Task execution: apply for the container through AM, distribute the jar file, and start the java process of the task in the container
  4. Obtain the container execution result through AM callback or polling, and return the task execution status
  5. Task termination: use AM to terminate the container and reclaim resources

优点:

  • 工程简单,将原来 localexecutor的本地线程池管理替换为 通过yarn的资源池管理
  • 资源利用效率高,资源约束可以精确到每个任务进程,用时申请,不用时释放

缺点:

  • 在集群紧张的情况下,资源并不保证分配。
  • yarn容器的启动,jar文件的分发 都需要一定时间,整个任务的执行要比本地线程慢不少
  • 任务的执行强绑定 yarn, 任务的debug都需要依赖 yarn的相关组件来查看。

Pros:

  • The effort of engineering is simple, only replace the local thread pool management of the original localexecutor with the resource pool management through Yarn
  • High resource utilization efficiency, resource constraints can be accurate to each task process, apply when used, and release when not in use

Cons:

  • In the case of a tight usage of cluster, resources are not guaranteed to be allocated.
  • It takes a certain amount of time to start the yarn container and distribute the jar file, and the execution of the entire task is much slower than the local thread
  • The execution of the task is strongly bound to yarn, and the debug of the task needs to be checked by the relevant components of yarn.

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Untitled Diagram-Page-1

另外一种较为复杂的实现, 则是将整个executor进行分布式部署, 更符合传统的 master/slave形式的方案,整个复杂度提升不少。

该方案中, scheduler 承担着 executor 分发的任务, executor依然通过本地方式完成任务。

  1. 启动 yarnclient, 建立与 ResourceManager的连接, 以下简称RM
  2. 启动ApplicaitonMaster, 以下简称AM
  3. 通过AM 申请固定大小的容器, 分发jar 文件,在容器中启动 executor的java进程
  4. executor 和 scheduler 通过 rpc 建立 heartbeat,汇报当前状态,任务负载, 准备响应 任务分发
  5. executor 获取到任务请求, 通过本地线程池启动任务执行并向scheduler 汇报任务状态


Another more complex solution is to deploy the entire executor in a distributed manner, which is more in line with the traditional master/slave scheme, and the overall complexity is greatly increased.

In this solution, the scheduler is responsible for the tasks distributed by the executor, and the executor still completes the tasks locally.

  1. Start yarnclient and establish a connection with ResourceManager, hereinafter referred to as RM
  2. Start ApplicaitonMaster, hereinafter referred to as AM
  3. Apply for a fixed-size container through AM, distribute the jar file, and start the java process of the executor in the container
  4. executor and scheduler establish heartbeat through RPC, report current status, task load, and prepare to respond to task request
  5. The executor gets the task request, starts the task execution through the local thread pool and reports the task status to the scheduler

优点:

  • 该方案只是通过 yarn 来申请executor的资源, 核心部分任务的执行和状态管理还是由框架本身来完成,扩展性强,以后很容易实现基于 standlone,mesos,kubernetes等形式的部署方案。
  • executor 容器的启动以后,任务请求通过本地线程启动,速度快。

缺点:

  • 工程复杂,需要完成scheduler和executor的通信协议,任务的分布式下发等等
  • executor需要独占一个yarn 容器,在没有任务时比较浪费资源。也可以实现动态分配 executor的容器, 但也会出现资源紧张时不能及时分配的情况
  • 资源的约束只能精确到 executor级别, 任务级别的资源隔离和约束需要 executor 来实现

Advantages:

  • This solution only uses yarn to apply for executor resources. The execution of tasks and state management are still done by the kun itself, which is the core of the framework. It has strong scalability and is easy to implement deployment solutions based on standlone, mesos, kubernetes, etc. in the future.
  • After the executor container is started, the task request is started through the local thread, which is fast.

Disadvantages:

  • This solution is complex, and it is necessary to complete the communication protocol between the scheduler and the executor, the distributed delivery of tasks, etc.
  • The executor needs to monopolize a yarn container, which wastes resources when there are no tasks. It is also possible to dynamically allocate executor containers, but there will also be situations where resources cannot be allocated in time when resources are tight.
  • Resource constraints can only be accurate to the executor level, and task-level resource isolation and constraints need to be implemented by executor

Additional context
Add any other context or screenshots about the feature request here.

@dyng
Copy link
Contributor

dyng commented Aug 28, 2020

我觉得应该有YarnClusterExecutor(实现Executor接口)和KunExecutorAppMaster

KunExecutorAppMaster是常驻的AppMaster,负责资源管理和任务运行。YarnClusterExecutor负责KunExecutorAppMaster的启动和初始化,以及任务的提交,状态更新等。

@JoshOY JoshOY added good first issue Good for newcomers help wanted Extra attention is needed labels Oct 15, 2020
@JoshOY JoshOY added the design Design ideas & wanted features, etc. label Oct 28, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
design Design ideas & wanted features, etc. good first issue Good for newcomers help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

3 participants