
Commit

v4.3: remove redundant toolkits, optimize the manual execution mechanism; see the README for details
zhaoyachao committed Jan 2, 2021
1 parent 9dca0a4 commit 1142111
Showing 48 changed files with 1,540 additions and 2,980 deletions.
23 changes: 20 additions & 3 deletions README.md
@@ -1,3 +1,7 @@

# READ MORE
[English description](README_en.md)

# Integrated platform for data acquisition, processing, monitoring, scheduling, and management

# Note
@@ -122,6 +126,19 @@
+ v4.1 Reimplemented the time-selection mechanism; scheduling performance improved
+ v4.1 Revised the monitoring UI: removed the group-task monitoring button and added a subtask monitoring view

+ v4.2 Fixed v4.1 bugs
+ v4.2 Added Chinese comments to the SQL scripts

+ v4.3 Added a retry mechanism for selected subtasks
+ v4.3 Added a mechanism for manually executing selected subtasks
+ v4.3 Added a runtime dependency-graph view for subtasks
+ v4.3 Added a "skipped" task status
+ v4.3 Optimized the logback logging configuration
+ v4.3 Fixed a manual-retry bug in the new version
+ v4.3 Removed the ZooKeeper toolkit
+ v4.3 Removed the [repeat execution] scheduling task mode
+ v4.3 Removed the MQ configuration
+ v4.3 Removed the Netty toolkit

# FAQ
+ Changing the log level
@@ -159,7 +176,6 @@
# Supported scheduler modes
+ Time series (time limit, count limit)
+ Single execution
+ Repeat execution (count limit, time limit)

# Supports dynamic date parameters for scheduling
See the documentation for details
@@ -196,14 +212,15 @@
Packaging command: mvn package -Dmaven.test.skip=true

# Run
Find zdh.jar in the target directory
Run java -Dfile.encoding=utf-8 -jar zdh.jar
Find zdh_web.jar in the target directory
Run java -Dfile.encoding=utf-8 -jar zdh_web.jar

# Version roadmap
+ 1.1 Planned support for FTP scheduling
+ 1.1 Planned support for direct HFile reads
+ 1.1 Docker deployment
+ 2.x Multi-source processing within a single task
+ 5.x Planned support for Kerberos authentication



206 changes: 206 additions & 0 deletions README_en.md
@@ -0,0 +1,206 @@
# Integrated platform for data acquisition, processing, monitoring, scheduling, and management

# Note

ZDH consists of two parts: front-end configuration and back-end data ETL processing. This repository contains only the front-end configuration part
For the back-end data ETL, see the project https://github.com/zhaoyachao/zdh_server.git
If zdh_web uses version 1.0, use a 1.x release of zdh_server
For secondary development, use the dev branch. The dev branch is merged into master only after tests pass, so master may not be the latest, but it is guaranteed to be usable

# Features
Works out of the box
Support for multiple data sources
High-performance data acquisition
Standalone scheduler; can also integrate with third-party schedulers such as Airflow and Azkaban
Supports secondary development


# Usage scenarios
+ Data acquisition (local file upload, HDFS, JDBC, HTTP, Cassandra, MongoDB, Redis, Kafka, HBase, ES, SFTP, Hive)
+ Data encryption
+ Data conversion, offline data synchronization, real-time data synchronization
+ Data migration
+ Data quality checks
+ Metadata and metric management
+ Flexible, dynamic data cleansing with Drools



# Main functions
ZDH's main function is to pull data from data sources such as HDFS, Hive, JDBC, and HTTP-JSON, and to write it to targets such as HDFS, Hive, and JDBC
Supports cluster deployment


+ Support for standard SQL functions
+ Configuration through the web interface
+ Fast copying of existing tasks
+ Support for external scheduling tools (requires modification and adding a specific interface)
+ Elastic scaling (single node or cluster)
+ User-level permissions
+ Easy to use; supports secondary development
+ Built-in simple scheduler that can run timed tasks, time-series tasks, and a set number of executions
+ Scheduling dependencies
+ SQL data warehouse processing (single warehouse)
+ Quality checks and corresponding reports
+ Support for shell commands, shell scripts, JDBC query scheduling, and HDFS query scheduling
+ Local file upload and download
+ Multi-source ETL
+ Task monitoring
+ Flexible, dynamic Drools rule-based cleansing

# Functional diagram
![Functional diagram](img/zdh_web.png)

# Version update instructions
+ v1.0 Supports JDBC and common data sources: Hive, Kafka, HTTP, Flume, Redis, ES, Kudu, MongoDB, HBase, Cassandra, HDFS (CSV, JSON, ORC, Parquet, XML, Excel, ...), and local data upload (CSV)
+ v1.0 Scheduling supports task dependencies, etc.

+ v1.1 Supports ClickHouse JDBC

+ v1.2 Supports external JAR ETL tasks (task status must be tracked by the external JAR itself)

+ v1.3 Supports Drools data cleansing

+ v1.4 Supports Greenplum JDBC

+ v2.0 Removed external JAR tasks and replaced them with the newly added SSH task
+ v2.0 Drools tasks add support for multiple sources and SQL tasks
+ v2.0 ClickHouse and Hive Spark data source optimization
+ v2.0 Spark SFTP data framework reworked to add SFTP Excel and multi-delimiter support
+ v2.0 Scheduling retry mechanism optimized; added node-failure resend (task restart)
+ v2.0 Added a separate alarm mechanism for scheduling
+ v2.0 Server module high-availability mechanism changed to load-based high availability
+ v2.0 Fixed HBase/Drools JAR conflict bug
+ v2.0 SSH tasks support static and dynamic scripts
+ v2.0 Removed the restriction that Kafka/Flume real-time data sources must use a JDBC output source
+ v2.0 Fixed Spark monitoring bug; moved Spark monitoring into the general monitoring view
+ v2.1 zdh_web adds Redis Cluster support
+ v2.1 Added JDBC support for Presto, MariaDB, MemSQL, Huawei DWS, Alibaba AnalyticDB, Kylin, GBase, Kingbase, and Redshift

+ v2.2 Scheduling mechanism adds ACK and transparent failover
+ v2.2 Optimized all front-end pages; added status highlighting
+ v2.2 SQL editing supports syntax highlighting
+ v2.2 Manually triggered scheduling changed to asynchronous execution
+ v2.2 Task log retrieval changed (from time-based to identifier + time-based)

+ v3.0 Visual optimization of front-end status display
+ v3.0 Removed the task_logs task log table and added the task_log_instance table as a replacement (major change: 2.0 and 3.0 are completely incompatible)
+ v3.0 Fixed favicon display bug
+ v3.0 Added scheduler failover
+ v3.0 Monitoring UI adds a manual retry function
+ v3.0 Split the quartz_job_info scheduling task table; each execution of a scheduled task generates an instance record of its current state (retry, failover, ACK, etc. all operate on the instance table)
+ v3.0 Added an interface for a single-task parallel processing mechanism (only the interface is provided, with no concrete implementation, so single-task parallel processing is not supported for now)
+ v3.0 zdh_web adds a scheduler ID (mainly used during failover to determine whether a task was triggered by failover)
+ v3.0 Removed the manual reset (an instance is generated instead, so reset is no longer needed)
+ v3.0 Manual execution and scheduled execution remove the previous instance dependency (after a manual run, the correct scheduling time must be set manually)
+ v3.0 Added the ability to kill back-end data acquisition tasks
+ v3.0 Added a timeout-warning task (warning only, no kill)
+ v3.0 HBase removes Jersey-related JARs to resolve JAR conflicts
+ v3.0 Batch delete adds a confirmation pop-up
+ v3.0 Modified the Spark task job-group naming format

+ v3.1 Task dependency check implementation changed
+ v3.1 Added a cron expression generator page
+ v3.1 Filenames support dynamic settings (generate rule-based filenames from the time)
+ v3.1 Basic parameter validation at startup
+ v3.1 Quartz task priority setting
+ v3.1 Added support for Quartz time triggers (both serial and parallel)
+ v3.1 New DAG utility class (DAG scheduling planned for 3.2)

+ v4.0 Implemented drag-and-drop scheduling
+ v4.0 Reimplemented the task discovery mechanism
+ v4.0 Added task group and subtask concepts to implement group tasks
+ v4.0 Implemented DAG scheduling
+ v4.0 Implemented the scheduling flow chart
+ v4.0 Reimplemented task types (not compatible with pre-4.x versions)
+ v4.0 Added the Greenplum-Spark connector
+ v4.0 Brand-new logic, worth a try

+ v4.1 Fixed the 4.0 scheduling retry bug
+ v4.1 Fixed a bug in the 4.0 scheduling UI
+ v4.1 Reimplemented the time-selection mechanism; scheduling performance improved
+ v4.1 Revised the monitoring UI: removed the group-task monitoring button and added a subtask monitoring view

+ v4.2 Fixed v4.1 bugs
+ v4.2 Added Chinese comments to the SQL scripts

+ v4.3 Added a retry mechanism for selected subtasks
+ v4.3 Added a mechanism for manually executing selected subtasks
+ v4.3 Added a runtime dependency-graph view for subtasks
+ v4.3 Added a "skipped" task status
+ v4.3 Optimized the logback logging configuration
+ v4.3 Fixed a manual-retry bug
+ v4.3 Removed the ZooKeeper toolkit
+ v4.3 Removed the [repeat execution] scheduling task mode
+ v4.3 Removed the MQ configuration
+ v4.3 Removed the Netty toolkit

# FAQ
+ Changing the log level
Modify the relevant logback level in the log configuration file, as sketched below
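For instance, a minimal sketch of such a change in release/conf/logback.xml; the com.zyc.zdh package prefix matches the project, while the appender names are placeholders for whatever appenders your logback.xml already defines:

```xml
<!-- Illustrative only: keep the root at info, enable debug output for the project package -->
<logger name="com.zyc.zdh" level="debug" />

<root level="info">
    <!-- appender names are placeholders; reference the appenders already defined in logback.xml -->
    <appender-ref ref="STDOUT" />
    <appender-ref ref="FILE" />
</root>
```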

+ Scheduling serial/parallel mode
Serial mode: checks the status of the task's previous run
Parallel mode: does not check the previous run status; the run time is generated automatically

+ For table structures, the SQL under src/main/resources/db is authoritative

+ Reading from Hadoop, Hive, and HBase services with Kerberos authentication is not supported yet; Kerberos support is planned for version 5.x


# Supported data sources
+ local file
+ hive (a single cluster can use multiple remote Hive instances, plus internal and external tables)
+ hdfs(csv,txt,json,orc,parquet,avro)
+ jdbc (all JDBC sources, including special JDBC such as hbase-phoenix, spark-jdbc, clickhouse, greenplum, presto, mariadb, memsql, Huawei DWS, Alibaba AnalyticDB, kylin, gbase, kingbase, redshift)
+ hbase
+ mongodb
+ es
+ kafka
+ http
+ sftp
+ cassandra
+ redis
+ flume


# Building from source

Clean command: mvn clean
Packaging command: mvn package -Dmaven.test.skip=true

# How to run
Find zdh_web.jar in the target directory
Run java -Dfile.encoding=utf-8 -jar zdh_web.jar





# Contact
email:[email protected]

# Online preview
http://zycblog.cn:8081/login
account:zyc
password:123456

Server resources are limited; the site is for preview only and does not include the data processing part. Thanks for your understanding.

# UI preview

![Login page](img/login.jpg)

![Feature overview](img/index.jpg)

![Data sources page](img/sources_list.jpg)

![Add data source page](img/sources_add.jpg)

![ETL task list page](img/etl_list.jpg)

![ETL task configuration page](img/etl_add.jpg)

![Scheduled task list page](img/dispatch_list.jpg)

![Scheduled task configuration page](img/dispatch_add.jpg)


14 changes: 2 additions & 12 deletions release/conf/application-dev.properties
@@ -6,10 +6,8 @@ server.port = 8081

web.path =/WEB-INF/zdh/

udp.ip=127.0.0.1
udp.port=8766
tcp.ip=127.0.0.1
tcp.port=8765
# Use the log configuration that fits your requirements
logging.config=../conf/logback.xml

spring.http.multipart.maxFileSize = 300Mb
spring.http.multipart.maxRequestSize=500Mb
@@ -48,14 +46,6 @@ spring.thymeleaf.cache=false

logging.level.root=info

## URL of the ActiveMQ broker. Auto-generated by default. For instance `tcp://localhost:61616`
spring.activemq.url=failover:(tcp://127.0.0.1:61616)
# tcp://localhost:61616
#spring.activemq.broker-url=tcp\://localhost\:61616
spring.activemq.in-memory=true
#spring.jms.pub-sub-domain=true #start topic
spring.activemq.pool.enabled=false

#redis ------start------- for Redis cluster mode, hostName uses the comma-separated format ip1:port1,ip2:port2
spring.redis.hostName=127.0.0.1
spring.redis.port=6379
14 changes: 2 additions & 12 deletions release/conf/application-pro.properties
@@ -6,10 +6,8 @@ server.port = 8081

web.path =/WEB-INF/zdh/

udp.ip=127.0.0.1
udp.port=8766
tcp.ip=127.0.0.1
tcp.port=8765
# Use the log configuration that fits your requirements
logging.config=../conf/logback.xml

spring.http.multipart.maxFileSize = 300Mb
spring.http.multipart.maxRequestSize=500Mb
@@ -48,14 +46,6 @@ spring.thymeleaf.cache=false

logging.level.root=info

## URL of the ActiveMQ broker. Auto-generated by default. For instance `tcp://localhost:61616`
spring.activemq.url=failover:(tcp://127.0.0.1:61616)
# tcp://localhost:61616
#spring.activemq.broker-url=tcp\://localhost\:61616
spring.activemq.in-memory=true
#spring.jms.pub-sub-domain=true #start topic
spring.activemq.pool.enabled=false

#redis ------start------- for Redis cluster mode, hostName uses the comma-separated format ip1:port1,ip2:port2
spring.redis.hostName=127.0.0.1
spring.redis.port=6379
10 changes: 10 additions & 0 deletions release/conf/logback.xml
@@ -6,6 +6,11 @@
<pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %-5level %logger - %msg %n</pattern>
<charset>UTF-8</charset>
</encoder>
<filter class="ch.qos.logback.classic.filter.LevelFilter">
<level>info</level>
<onMatch>ACCEPT</onMatch>
<onMismatch>DENY</onMismatch>
</filter>
</appender>
<appender name="FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
<rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
@@ -21,6 +26,11 @@
<pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %-5level %logger{35} - %msg %n</pattern>
<charset>UTF-8</charset> <!-- set the charset here -->
</encoder>
<filter class="ch.qos.logback.classic.filter.LevelFilter">
<level>debug</level>
<onMatch>ACCEPT</onMatch>
<onMismatch>DENY</onMismatch>
</filter>
</appender>

<logger name="com.zyc.zdh.shiro" level="info" />
10 changes: 4 additions & 6 deletions src/main/java/com/zyc/zdh/controller/ZdhDispatchController.java
@@ -26,10 +26,7 @@
import java.lang.reflect.Field;
import java.sql.Timestamp;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Calendar;
import java.util.Date;
import java.util.List;
import java.util.*;

@Controller
public class ZdhDispatchController extends BaseController {
@@ -247,9 +244,10 @@ public String dispatch_task_update(QuartzJobInfo quartzJobInfo) {
*/
@RequestMapping("/dispatch_task_execute")
@ResponseBody
public String dispatch_task_execute(QuartzJobInfo quartzJobInfo, String reset_count,String concurrency,String start_time,String end_time) {
public String dispatch_task_execute(QuartzJobInfo quartzJobInfo, String reset_count,String concurrency,String start_time,String end_time,String[] sub_tasks) {
debugInfo(quartzJobInfo);
System.out.println(concurrency);
System.out.println(Arrays.toString(sub_tasks));
JSONObject json = new JSONObject();

try {
@@ -290,7 +288,7 @@ public String dispatch_task_execute(QuartzJobInfo quartzJobInfo, String reset_co
}
}
tglim.insert(tgli);
JobCommon2.sub_task_log_instance(tgli);
JobCommon2.sub_task_log_instance(tgli,sub_tasks);
}

tglim.updateStatus2Create(tgli_ids.toArray(new String[]{}));
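As a usage note for the new sub_tasks parameter, here is a hedged sketch of how a manual run of selected subtasks might be triggered against this endpoint. The /dispatch_task_execute path, the concurrency/start_time/end_time/sub_tasks names, and port 8081 come from the code and configuration in this commit; the job_id field and the subtask identifiers are placeholders, not confirmed names.

```bash
# Hedged example: manually run only two subtasks of a scheduled job.
# job_id and the subtask identifiers are placeholders; the repeated
# sub_tasks parameters bind to the String[] sub_tasks argument in Spring MVC.
curl "http://localhost:8081/dispatch_task_execute" \
  --data-urlencode "job_id=demo_job" \
  --data-urlencode "concurrency=0" \
  --data-urlencode "start_time=2021-01-01 00:00:00" \
  --data-urlencode "end_time=2021-01-01 23:59:59" \
  --data-urlencode "sub_tasks=subtask_a" \
  --data-urlencode "sub_tasks=subtask_b"
```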
