forked from sogou/workflow
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request sogou#25 from holmes1412/master
Update English version README.md
Showing 2 changed files with 90 additions and 69 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,76 +1,52 @@ | ||
[中文版](README.md) | ||
|
||
## Sogou C++ Workflow | ||
[](https://git.sogou-inc.com/wujiaxu/Filter/blob/master/LICENSE) | ||
[](https://en.cppreference.com/) | ||
[](#%E9%A1%B9%E7%9B%AE%E7%9A%84%E4%B8%80%E4%BA%9B%E8%AE%BE%E8%AE%A1%E7%89%B9%E7%82%B9) | ||
|
||
# Sogou C++ Workflow | ||
#### As the backend C++ programming standard in Sogou, Workflow is an industrial-grade programming engine. | ||
#### Main functions and features: | ||
* An **asynchronous engine** based on **C++11** ``std::function`` which aims to solve all the **serial, parallel and asynchronous** problems. | ||
* As a network framework, it is completely **protocol-agnostic** and directly facing applications. | ||
* It can either be used as a Redis **client** or an Http **server**. | ||
* Convenient to **customize protocols**, so you can quickly build your own RPC systems. | ||
* Sogou RPC is developed based on Sogou Workflow and is open source as an independent project. The project supports srpc, brpc and thrift protocols ([benchmark](https://github.com/holmes1412/sogou-rpc-benchmark)). | ||
* Supports **SSL** (depends on OpenSSL). Supports **TCP, UDP, SCTP** and other common transport layer protocols, including SSL over **SCTP**. UDP servers are not supported. | ||
* Natively contains a variety of **common Internet protocol** implementations which are used in a unified way. | ||
* Currently support **http, redis, mysql** and **kafka** protocols. You can directly access these resources or build **servers** for these protocols. | ||
* Highly likely the only C++ full-featured **mysql asynchronous client** on the market. | ||
* The **DNS** protocol is still being developed; for now the system library is used to access DNS. | ||
* Powerful feature for **scheduling computing tasks** | ||
* Computing tasks, as well as communication tasks, can be added to the task flow, and each kind is scheduled by its own scheduler. | ||
* You can use it as a parallel programming engine **without** the network features. | ||
* Our biggest goal is to **maximize the performance** of every node when the calculation and communication environment is very complex. | ||
* Some **common algorithm** implementations are provided, such as parallel sorting and MapReduce. | ||
* In fact, all asynchronous processes (such as disk IO, GPU tasks, timers, etc.) **can be scheduled in coordination**. | ||
* On the Linux system, the disk IO task is realized through the Linux underlying aio, which is extremely efficient. | ||
* Support any task flow with **DAG** structure. However, in most cases, users only need **series-parallel** structure. | ||
* Built-in **load balancing** and powerful **service governance** features. | ||
* Easily used in conjunction with other asynchronous engines. | ||
* **Streaming** communication engine is being developed. | ||
* When working as a server, it supports **multi-processes** mode and supports precise **graceful restart**. | ||
|
||
#### Building | ||
* Support **Linux, macOS, FreeBSD, Windows** and other systems so far. Installing **cmake** is necessary. | ||
* The Windows version is temporarily released as an independent [branch](https://github.com/sogou/workflow/tree/windows), which uses **iocp** as the basis for asynchronous communication while keeping the **same external interface**. | ||
* As it is written in C/C++, users need to be proficient in C++ programming. It **does not** rely on boost or asio, so compilation is very fast. | ||
* It contains a few C++11 features, so users should be able to use ``std::function`` and ``std::move``. | ||
* Theoretically support all CPU architectures and can be compiled and run on **32-bit** or **64-bit arm processors**. Big endian CPU is not tested. | ||
* **OpenSSL** is required. If users expect high performance of SSL, OpenSSL 1.1 or higher is strongly recommended. | ||
* **No other dependencies**. Several compression libraries such as snappy and lz4 are included as unmodified source (required by the Kafka protocol). | ||
|
||
#### Some design features | ||
* The basic usage is very simple and handy. Some features are designed to greatly reduce the difficulty of programming with general C++ projects. | ||
* To spare users from **deriving** classes as far as possible, all user behaviors are wrapped in ``std::function``, for example: | ||
* the **callback** after every task ends | ||
* the **algorithms** in computing tasks | ||
* one server corresponds to one ``std::function`` | ||
* To avoid complicated memory management, all tasks and frameworks are generated by **factory** classes, and their memory is **recycled automatically**. This means: | ||
* Every task is **automatically deleted after its callback**. | ||
* If the users want to keep any data in the task (such as a network reply packet or the result of an algorithm), they need to use ``std::move`` to move it. | ||
* We treat memory recycling as a strict and naturally logical mechanism, so we **don’t** use ``std::shared_ptr``. | ||
* Avoid using complicated parameter configuration. | ||
* Actually we have a lot of **configurable parameters**, though you can use our system **without** feeling the existence of parameters. | ||
* If you have specific requirements for program behavior and resource ratio, you can find the corresponding configuration parameters and tune them to maximize the performance of your program. | ||
* The project adopts a fully asynchronous design and is not transparent to users, which means users need to know that they are writing asynchronous programs. | ||
* Thanks to the convenience brought by ``std::function`` and the automatic memory recycling mechanism, we have carefully designed **the simplest possible usage of asynchrony** for users. | ||
* **No** user-mode threads concepts. On the one hand, performance is considered. On the other hand, we have the concept of computing tasks (threaded tasks) scheduling. | ||
* In our design, **computing** is one kind of **asynchronous task**, which has no differences from communication. | ||
* Computing tasks are scheduled by **independent thread groups** according to specific algorithms. Please note that they **may not** be executed **immediately**. | ||
* As we have such computing tasks, user-mode threads become meaningless, and therefore users must understand asynchrony. | ||
* Because of the full asynchrony, almost all core calls are **short** and **non-blocking** operations. | ||
* That’s why we **don’t** recommend that users **block** in callbacks or perform complex calculations there. However, it is acceptable if the logic is quite simple. | ||
* Brief summary of the usage: | ||
* The users can build the program just like building a **series-parallel** circuit. The circuit can be generated **at the beginning** or **dynamically generated during the program running**. | ||
* **We provide various electronic components** for users. For instance, one http request, one GPU matrix multiplication, and one parallel sort can each be understood as an electronic component. | ||
* Every electronic component has its **standard input and output**. At the same time, every electronic component can itself be a **complicated circuit**, which users do not need to be aware of. | ||
* For example, an http request may go through **multiple asynchronous processes** such as DNS, redirect, and retry, but the entire process is just one **component** from the user's perspective. | ||
* Users can easily **define their own** components, including algorithms and some kind of communication. | ||
* Implementing **stateless protocols** is extremely simple. It may be a little more complicated when the protocol includes login, library selection, etc.; in that case, you can refer to the redis implementation. | ||
* Through the powerful Upstream system, complex **service governance** can be realized, such as communication node selection, load balancing, circuit breaker and recovery, master and slave, etc. | ||
* **In conclusion, this is an enterprise-level, elegantly designed asynchronous framework which can cover almost all high-performance back-end service requirements.** | ||
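The circuit analogy in the usage summary above can be sketched in a few lines of standard C++11. This is not Workflow's API: `Component`, `series`, `parallel` and `run_circuit` are hypothetical names chosen for illustration, and plain `std::thread` stands in for the framework's schedulers. The only point it tries to show is that a composed circuit is itself a component.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for an "electronic component": any callable unit of work.
using Component = std::function<void()>;

// A series circuit runs its parts one after another.
Component series(std::vector<Component> parts)
{
    return [parts]() {
        for (const Component& part : parts)
            part();
    };
}

// A parallel circuit runs its parts concurrently and finishes when all of them finish.
Component parallel(std::vector<Component> parts)
{
    return [parts]() {
        std::vector<std::thread> workers;
        for (const Component& part : parts)
            workers.emplace_back(part);
        for (std::thread& worker : workers)
            worker.join();
    };
}

// Builds "fetch -> (compute || compute) -> reply" and returns the order in which
// the components ran.
std::vector<std::string> run_circuit()
{
    std::vector<std::string> log;
    std::mutex log_mutex;

    auto step = [&log, &log_mutex](const std::string& name) -> Component {
        return [&log, &log_mutex, name]() {
            std::lock_guard<std::mutex> lock(log_mutex);
            log.push_back(name);
        };
    };

    // The parallel block is used as a single component inside the series.
    Component circuit = series({
        step("fetch"),
        parallel({step("compute"), step("compute")}),
        step("reply")
    });
    circuit();

    assert(log.size() == 4);  // every component ran exactly once
    return log;
}
```

Calling `run_circuit()` returns the log with `fetch` first and `reply` last. On some toolchains the program must be linked with `-pthread` because of `std::thread`.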
|
||
#### Tutorials: | ||
As **Sogou's C++ server engine**, Workflow supports almost all **back-end C++ online services** of Sogou, including all search services, cloud input method, online advertisements, etc., handling more than **10 billion** requests every day. It is a light and elegantly designed **enterprise-level programming engine** that can satisfy most C++ back-end development requirements. | ||
|
||
#### You can use it: | ||
* To quickly build an **Http server**: | ||
~~~cpp | ||
#include <stdio.h> | ||
#include "workflow/WFHttpServer.h" | ||
|
||
int main() | ||
{ | ||
WFHttpServer server([](WFHttpTask *task) { | ||
task->get_resp()->append_output_body("<html>Hello World!</html>"); | ||
}); | ||
|
||
if (server.start(8888) == 0) { // start server on port 8888 | ||
getchar(); // press "Enter" to end. | ||
server.stop(); | ||
} | ||
|
||
return 0; | ||
} | ||
~~~ | ||
* As a **powerful asynchronous client**. Currently supports ``http``, ``redis``, ``mysql`` and ``kafka`` protocols. | ||
* To realize **user-defined protocol client/server** and build your own **RPC system**. | ||
* Sogou RPC is based on it and open source as an independent project, which supports srpc, brpc and thrift protocol ([benchmark](https://github.com/holmes1412/sogou-rpc-benchmark)). | ||
* To build **asynchronous task flow**, support common **series** and **parallel** structures, and also support more complex **DAG** structures. | ||
* As a **parallel programming tool**. In addition to **network tasks**, we also include **the scheduling of computing tasks**. All types of tasks can be put into **the same** task flow. | ||
* As an **asynchronous file IO tool** on ``Linux``, with performance exceeding that of any standard system call. Disk IO is also a task. | ||
* To realize any **high-performance** and **high-concurrency** back-end service with a very complex relationship between computing and communication. | ||
* To build a **service mesh** system. | ||
* The project has built-in **service governance** and **load balancing** features. | ||
|
||
#### Compile and run environment | ||
|
||
* This project supports ``Linux``, ``macOS``, ``Windows`` and other operating systems. | ||
* ``Windows`` version is temporarily released as an independent branch, using ``iocp`` to implement asynchronous networking. All user interfaces are consistent with the ``Linux`` version. | ||
* Supports all CPU platforms, including 32 or 64-bit ``x86`` processors, big-endian or little-endian ``arm`` processors. | ||
* Relies on ``OpenSSL``, recommending ``OpenSSL 1.1`` and above. | ||
* Uses the ``C++11`` standard and therefore, needs to be compiled with a compiler which supports ``C++11``. Does not rely on ``boost`` or ``asio``. | ||
* No other dependencies. However, it contains the unmodified source code of several compression libraries such as ``lz4``, ``zstd`` and ``snappy`` (required by the ``Kafka`` protocol). | ||
|
||
# Try it! | ||
* Client | ||
* [Create your first task:wget](docs/tutorial-01-wget.md) | ||
* [Implement redis set and get:redis_cli](docs/tutorial-02-redis_cli.md) | ||
|
@@ -83,6 +59,7 @@ | |
* Important topics | ||
* [About error](docs/about-error.md) | ||
* [About timeout](docs/about-timeout.md) | ||
* [About global configuration](docs/about-config.md) | ||
* [About DNS](docs/about-dns.md) | ||
* [About exit](docs/about-exit.md) | ||
* Computing tasks | ||
|
@@ -103,10 +80,52 @@ | |
* Built-in protocols | ||
* [Asynchronous MySQL client:mysql_cli](docs/tutorial-12-mysql_cli.md) | ||
|
||
#### System design features | ||
|
||
We believe that a typical back-end program consists of the following three parts and should be developed completely independently. | ||
* Protocol | ||
* In most cases, users use built-in common network protocols, such as http, redis or various rpc. | ||
* Users can also easily define their own network protocol; they only need to provide serialization and deserialization functions to build their own client/server. | ||
* Algorithm | ||
* In our design, an algorithm is the symmetrical counterpart of a protocol. | ||
* If a protocol call is an RPC, then an algorithm call is an APC (Async Procedure Call). | ||
* We have provided some general algorithms, such as sort, merge, psort, reduce, which can be used directly. | ||
* Compared with user-defined protocol, user-defined algorithm is much more common. Any complex calculation with clear boundaries should be packaged into an algorithm. | ||
* Task flow | ||
* Task flow is the actual business logic, which puts the protocols and algorithms into the flow graph for use. | ||
* The typical task flow is a closed series-parallel graph. Complex business logic may be a non-closed DAG. | ||
* The task flow graph can be constructed directly or dynamically generated based on the results of each step. All tasks are executed asynchronously. | ||
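To illustrate what "serialization and deserialization functions" might look like for a user-defined protocol, here is a minimal sketch of a length-prefixed message in standard C++11. It is an assumption-laden example, not Workflow's protocol interface: `FramedMessage`, `encode` and `decode` are hypothetical names, and the 4-byte big-endian length header is just one possible wire format.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical message type: a body preceded on the wire by a 4-byte big-endian length.
struct FramedMessage
{
    std::string body;
};

// Serialization: length header followed by the body bytes.
std::string encode(const FramedMessage& msg)
{
    assert(msg.body.size() <= 0xFFFFFFFFu);  // the length must fit in the 4-byte header
    const std::uint32_t n = static_cast<std::uint32_t>(msg.body.size());
    std::string out;
    for (int shift = 24; shift >= 0; shift -= 8)
        out.push_back(static_cast<char>((n >> shift) & 0xFF));
    out += msg.body;
    return out;
}

// Deserialization: returns true once `buf` holds a complete message,
// false if more bytes are still needed.
bool decode(const std::string& buf, FramedMessage& msg)
{
    if (buf.size() < 4)
        return false;
    std::uint32_t n = 0;
    for (int i = 0; i < 4; ++i)
        n = (n << 8) | static_cast<unsigned char>(buf[i]);
    if (buf.size() - 4 < n)
        return false;
    msg.body = buf.substr(4, n);
    return true;
}
```

Returning `false` for a short buffer models the common case where a message arrives in several pieces; a caller would keep appending received bytes and retry `decode` until it succeeds.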
|
||
Basic task, task factory and complex task | ||
* Our system contains six basic tasks: communication, file IO, CPU, GPU, timer, and counter. | ||
* All tasks are generated by the task factory and automatically recycled after callback. | ||
* Server task is one kind of special communication task, generated by the framework which calls the task factory, and handed over to the user through the process function. | ||
* In most cases, the task generated by the user through the task factory is a complex task, but the user does not need to be aware of this. | ||
* For example, an Http request may include many asynchronous processes (DNS, redirection), but for the user, it is just a communication task. | ||
* File sorting seems to be an algorithm, but it actually includes many complex interaction processes between file IO and CPU calculation. | ||
* If you think of business logic as building circuits with well-designed electronic components, then each electronic component may be a complex circuit. | ||
|
||
Asynchrony and encapsulation based on ``C++11 std::function`` | ||
|
||
* Not based on user mode coroutines. Users need to know that they are writing asynchronous programs. | ||
* All calls are executed asynchronously, and there are almost no operations to wait for threads. | ||
* Although we also provide some convenient semi-synchronous interfaces, they are not core features. | ||
* Please avoid derivation. Try to encapsulate user behavior with ``std::function`` instead, including: | ||
* The callback of any task. | ||
* Any server process. This conforms to the ``FaaS`` (Function as a Service) idea. | ||
* The implementation of an algorithm is simply a ``std::function``, although an algorithm can also be implemented by derivation. | ||
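The idea of encapsulating user behavior in a ``std::function`` rather than in a derived class can be shown with a small standalone sketch. The names `MiniServer`, `Request`, `Response` and `make_echo_server` are hypothetical and do not come from Workflow; the example only assumes that a server is constructed from one process function, as the list above describes.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Hypothetical request/response types, not Workflow's classes.
struct Request  { std::string body; };
struct Response { std::string body; };

// The server stores one std::function; users never subclass it.
class MiniServer
{
public:
    using Process = std::function<void(const Request&, Response&)>;

    explicit MiniServer(Process process) : process_(std::move(process))
    {
        assert(process_);  // a server without a process function is meaningless
    }

    // Stands in for the framework receiving a request and invoking user code.
    Response handle(const Request& req) const
    {
        Response resp;
        process_(req, resp);
        return resp;
    }

private:
    Process process_;
};

// User behavior is an ordinary lambda; state is captured instead of stored
// in the members of a derived class.
MiniServer make_echo_server(const std::string& prefix)
{
    return MiniServer([prefix](const Request& req, Response& resp) {
        resp.body = prefix + req.body;
    });
}
```

For example, `make_echo_server("echo: ").handle(Request{"hi"})` produces a response whose body is the prefix followed by the request body. A derived-class design would need a virtual method and one subclass per behavior; here a new behavior is just a new lambda.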
|
||
Memory reclamation mechanism | ||
* Every task is automatically reclaimed after its callback. If a task has been created but will not be run, the user needs to release it through the dismiss method. | ||
* Any data in the task, such as the response of a network request, is recycled together with the task. If the data is still needed afterwards, the user can use ``std::move()`` to move it out in the callback. | ||
* SeriesWork and ParallelWork are two kinds of framework objects, which are also recycled after their callback. | ||
* This project doesn’t use ``std::shared_ptr`` to manage memory. | ||
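The reclamation rules above can be sketched with a toy task class in standard C++11. This is a simplified model under stated assumptions, not Workflow's implementation: `MiniTask`, `create`, `live_count` and `fetch` are hypothetical names, and `start()` runs synchronously here, whereas a real engine would run the task asynchronously. Only the `dismiss` name is taken from the text.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

class MiniTask
{
public:
    using Callback = std::function<void(MiniTask*)>;

    // Factory function: the only way to create a task.
    static MiniTask* create(std::string payload, Callback callback)
    {
        return new MiniTask(std::move(payload), std::move(callback));
    }

    // Runs the task, invokes the callback, then reclaims the task.
    void start()
    {
        result_ = "response to " + payload_;
        if (callback_)
            callback_(this);
        delete this;  // the task pointer is invalid once the callback has returned
    }

    // Releases a task that was created but will never be started.
    void dismiss() { delete this; }

    std::string& result() { return result_; }

    // Number of tasks currently alive, used here to observe reclamation.
    static int live_count() { return live_; }

private:
    MiniTask(std::string payload, Callback callback)
        : payload_(std::move(payload)), callback_(std::move(callback)) { ++live_; }
    ~MiniTask() { --live_; }

    std::string payload_;
    std::string result_;
    Callback callback_;
    static int live_;
};

int MiniTask::live_ = 0;

// Keeps the result beyond the task's lifetime by moving it out in the callback.
std::string fetch(const std::string& what)
{
    std::string kept;
    MiniTask* task = MiniTask::create(what, [&kept](MiniTask* t) {
        kept = std::move(t->result());
    });
    task->start();
    assert(MiniTask::live_count() == 0);  // the task has already been reclaimed
    return kept;
}
```

The private constructor and destructor force creation through the factory and make it impossible to `delete` a task from user code, which is why no ``std::shared_ptr`` is needed in this model: ownership always ends at the callback or at `dismiss()`.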
|
||
#### More design documents | ||
To be continued... | ||
|
||
## Authors | ||
|
||
* **Xie Han** - *[[email protected]](mailto:[email protected])* | ||
* **Wu Jiaxu** - *[[email protected]](mailto:[email protected])* | ||
* **Li Yingxin** - *[[email protected]](mailto:[email protected])* | ||
|
||
|