[CELEBORN-824][DOC] Add PushData document

### What changes were proposed in this pull request?
As title.

### Why are the changes needed?
As title

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
No.

Closes apache#1747 from waitinfuture/824.

Authored-by: zky.zhoukeyong <[email protected]>
Signed-off-by: zky.zhoukeyong <[email protected]>
GH-Gloway · Jul 24, 2023 · 8e84964
1 parent 2752154
commit 8e84964

Showing 5 changed files with 124 additions and 6 deletions.
diff --git a/docs/README.md b/docs/README.md
@@ -96,7 +96,7 @@ cd $SPARK_HOME
 --conf spark.shuffle.service.enabled=false
 ```
 Then run the following test case:
-```shell
+```scala
 spark.sparkContext.parallelize(1 to 10, 10)
   .flatMap( _ => (1 to 100).iterator
   .map(num => num)).repartition(10).count

diff --git a/docs/assets/img/softsplit.svg b/docs/assets/img/softsplit.svg
diff --git a/docs/developers/overview.md b/docs/developers/overview.md
@@ -23,15 +23,15 @@ please refer to dedicated articles.
 In distributed compute engines, data exchange between compute nodes is common but expensive. The cost comes from
 the disk and network inefficiency (M * N between Mappers and Reducers) in traditional shuffle frame, as following:
 
-![ESS](/assets/img/ess.svg)
+![ESS](../../assets/img/ess.svg)
 
 Besides inefficiency, traditional shuffle framework requires large local storage in compute node to store shuffle
 data, thus blocks the adoption of disaggregated architecture.
 
 Apache Celeborn(Incubating) solves the problems by reorganizing shuffle data in a more efficient way, and storing the data in
 a separate service. The high level architecture of Celeborn is as follows:
 
-![Celeborn](/assets/img/celeborn.svg)
+![Celeborn](../../assets/img/celeborn.svg)
 
 ## Components
 Celeborn(Incubating) has three primary components: Master, Worker, and Client.
@@ -73,8 +73,8 @@ it just needs one network connection and sequentially read the coarse grained fi
 In abnormal cases, such as when the file grows too large, or push data fails, Celeborn spawns a new split of the
 `PartitionLocation`, and future data within the partition will be pushed to the new split.
 
-Client keeps the split information and tells reducer to read from all splits of the `PartitionLocation` to guarantee
-no data is lost.
+`LifecycleManager` keeps the split information and tells reducer to read from all splits of the `PartitionLocation`
+to guarantee no data is lost.
 
 ## Data Storage
 Celeborn stores shuffle data in configurable multiple layers, i.e. `Memroy`, `Local Disks`, `Distributed File System`,