add jstorm-doc
玖条 committed Jul 17, 2016
1 parent afb0d26 commit cc1256a
Showing 113 changed files with 5,698 additions and 0 deletions.
89 changes: 89 additions & 0 deletions docs/jstorm-doc/FAQ/index.md
@@ -0,0 +1,89 @@
---
title: FAQ
layout: plain
---
## Performance issues
See the Performance Optimization page.

## Lack of resources
When the error "No supervisor resource is enough for component" is reported, the cluster does not have enough resources. In a test-only environment, you can increase the supervisor's CPU and memory slot counts.

In JStorm, a task consumes one CPU slot and one memory slot by default. By default, a machine's CPU slot count is (number of CPU cores - 1), and its memory slot count is (physical memory size * 75% / 1 GB). If a worker runs many tasks, reduce the memory slot size (default 1 GB), for example to 512 MB: `memory.slot.per.size: 536870912` (in bytes).

```
# if it is null, it will be detected by the system
supervisor.cpu.slot.num: null
# if it is null, it will be detected by the system
supervisor.mem.slot.num: null
# support disk slot
# if it is null, it will use $(storm.local.dir)/worker_shared_data
supervisor.disk.slot: null
```
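
As a worked example of the default slot arithmetic described in this section, the sketch below computes slot counts for one machine. The class `SlotMath`, its method names, and the 8-core, 16 GB machine are hypothetical, and rounding down is an assumption; this is not JStorm's own detection code.

```java
// Sketch of the default slot arithmetic from the text; not JStorm's own code.
class SlotMath {
    static final long GB = 1024L * 1024L * 1024L;

    // Default CPU slots: CPU cores minus one.
    static int cpuSlots(int cores) {
        return cores - 1;
    }

    // Default memory slots: 75% of physical memory divided by the slot size.
    // Integer division rounds down, which is an assumption of this sketch.
    static long memSlots(long physicalBytes, long slotSizeBytes) {
        return physicalBytes * 3 / 4 / slotSizeBytes;
    }

    public static void main(String[] args) {
        long physical = 16 * GB; // hypothetical 16 GB machine
        System.out.println("cpu slots: " + cpuSlots(8));
        System.out.println("mem slots at 1 GB: " + memSlots(physical, GB));
        System.out.println("mem slots at 512 MB: " + memSlots(physical, 536870912L));
    }
}
```

Under these assumptions, halving the slot size to 512 MB doubles the number of memory slots on the same machine.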

## Serialization issues
All spouts, bolts, configuration objects, and emitted messages (tuples) must implement Serializable; otherwise a serialization error occurs.

If a spout or bolt needs a member variable whose type does not implement Serializable, declare the variable with the `transient` modifier and instantiate it in `open` or `prepare`.
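
The `transient` pattern can be sketched with plain Java serialization. `TransientFieldSketch` is a hypothetical stand-in for a bolt (a real one would implement the bolt interface), `Client` is an invented non-serializable helper, and `prepare()` here only plays the role of the real `open`/`prepare` callback.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for a bolt; a real one would implement the bolt interface.
class TransientFieldSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    // Hypothetical helper type that does NOT implement Serializable.
    static class Client {
        final String address;
        Client(String address) { this.address = address; }
    }

    private final String address;     // plain serializable configuration
    private transient Client client;  // skipped by Java serialization

    TransientFieldSketch(String address) { this.address = address; }

    // Plays the role of open()/prepare(): build the member after deserialization.
    void prepare() { client = new Client(address); }

    boolean hasClient() { return client != null; }

    // Serializes and deserializes the object, as happens between submission and the worker.
    static TransientFieldSketch roundTrip(TransientFieldSketch original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (TransientFieldSketch) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization failed", e);
        }
    }

    public static void main(String[] args) {
        TransientFieldSketch copy = roundTrip(new TransientFieldSketch("host:1234"));
        System.out.println("client before prepare: " + copy.hasClient());
        copy.prepare();
        System.out.println("client after prepare: " + copy.hasClient());
    }
}
```

The deserialized copy arrives without the transient member, which is why the member has to be created in the lifecycle callback rather than in the constructor.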

## Log4j conflict
As of 0.9.0, JStorm still uses Log4j, whereas Storm uses Logback. If your application depends on log4j-over-slf4j.jar, you must exclude every log4j-over-slf4j.jar dependency. The next version is planned to include a custom classloader that makes this exclusion unnecessary. A conflict produces an error like the following:

```
SLF4J: Detected both log4j-over-slf4j.jar AND slf4j-log4j12.jar on the class path, preempting StackOverflowError.
SLF4J: See also
http://www.slf4j.org/codes.html#log4jDelegationLoop for more details.
Exception in thread "main" java.lang.ExceptionInInitializerError
at org.apache.log4j.Logger.getLogger(Logger.java:39)
at org.apache.log4j.Logger.getLogger(Logger.java:43)
at com.alibaba.jstorm.daemon.worker.Worker.<clinit>(Worker.java:32)
Caused by: java.lang.IllegalStateException: Detected both log4j-over-slf4j.jar AND slf4j-log4j12.jar on the class path, preempting StackOverflowError. See also
http://www.slf4j.org/codes.html#log4jDelegationLoop for more details.
at org.apache.log4j.Log4jLoggerFactory.<clinit>(Log4jLoggerFactory.java:49)
... 3 more
Could not find the main class: com.alibaba.jstorm.daemon.worker.Worker. Program will exit.
```
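
One way to check an application's classpath for this conflict before deploying is to look for a marker class from each jar. The sketch below assumes that `org.apache.log4j.Log4jLoggerFactory` (the class named in the stack trace) comes from log4j-over-slf4j and that `org.slf4j.impl.Log4jLoggerFactory` comes from slf4j-log4j12; the class `LogBridgeCheck` and its methods are hypothetical.

```java
// Diagnostic sketch: reports whether both conflicting SLF4J jars are visible.
class LogBridgeCheck {
    // Returns true if the named class can be located without initializing it.
    static boolean present(String className) {
        try {
            Class.forName(className, false, LogBridgeCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // The delegation loop only arises when both jars are on the classpath.
    static boolean conflict(boolean hasOverSlf4j, boolean hasSlf4jLog4j12) {
        return hasOverSlf4j && hasSlf4jLog4j12;
    }

    public static void main(String[] args) {
        // Assumed marker classes for the two jars (see the lead-in).
        boolean over = present("org.apache.log4j.Log4jLoggerFactory");
        boolean binding = present("org.slf4j.impl.Log4jLoggerFactory");
        System.out.println("log4j-over-slf4j present: " + over);
        System.out.println("slf4j-log4j12 present: " + binding);
        System.out.println("conflict: " + conflict(over, binding));
    }
}
```

Loading with `initialize` set to `false` avoids running the static initializer that throws the `IllegalStateException` shown in the trace.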

## Class conflict
If the application and JStorm use the same jar but in different versions, it is recommended to enable the classloader in the configuration file:

```
topology.enable.classloader: true
```
Or

```
ConfigExtension.setEnableTopologyClassLoader(conf, true);
```

JStorm disables the classloader by default, so JStorm forces the use of the jars that it depends on itself.

## After submitting a topology and waiting a few minutes, the web UI still does not display the corresponding task
There are three possible causes:

### User application initialization is too slow
If the user application produces log output, its initialization is either too slow or failing; check the log. In addition, for MetaQ 1.x applications, the spout recovers the files under ~/.meta_recover/; you can delete these files to speed up startup.

### User jar conflict or initialization failure
Open the supervisor log, find the command that starts the worker, run it on its own, and check for errors. The result looks similar to the following:

![fail_start_worker]({{site.baseurl}}/img/FAQ/fail_start_worker.jpg)

### Storm and JStorm share the same local directory
Check the configuration item "storm.local.dir" to see whether Storm and JStorm use the same local directory. If they do, point them to different directories.

## Port has already been bound
There are two possible causes:

### More than one worker competes for a port
Assuming port 6800 is occupied, run `ps -ef | grep 6800` to check whether multiple processes exist. If so, kill them manually.
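
As a complement to the `ps` check, a small program can test whether a port can currently be bound on the local machine. This is a minimal sketch; the class `PortCheck` is hypothetical and 6800 is only the example port from the text.

```java
import java.io.IOException;
import java.net.ServerSocket;

// Sketch: tries to bind a listening socket to find out whether a port is taken.
class PortCheck {
    // Returns true if a listening socket can be bound to the port right now.
    static boolean isFree(int port) {
        try (ServerSocket socket = new ServerSocket(port)) {
            return true;
        } catch (IOException e) {
            return false; // typically "Address already in use"
        }
    }

    public static void main(String[] args) {
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 6800;
        System.out.println("port " + port + (isFree(port) ? " is free" : " is already bound"));
    }
}
```

The check only tells you that something holds the port; identifying and killing the process still requires the command above.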

### Too many open connections
Linux limits the range of local ports available for outgoing connections. When a TCP client has opened about 28,000 outgoing connections, it starts throwing many exceptions, and you need to widen the local port range:
```
# echo "10000 65535" > /proc/sys/net/ipv4/ip_local_port_range
```
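
The figure of about 28,000 is consistent with a default range of `32768 61000`, which is assumed here as the previous setting (it was a common default on older kernels; the source does not state it). The sketch below only does the arithmetic; the class `PortRange` is hypothetical.

```java
// Sketch: number of local ports available for outgoing connections in a range.
class PortRange {
    // Both bounds are inclusive, as in ip_local_port_range.
    static int size(int low, int high) {
        return high - low + 1;
    }

    public static void main(String[] args) {
        // Assumed old default range; not taken from the source.
        System.out.println("32768-61000: " + size(32768, 61000) + " ports");
        // Range written by the echo command above.
        System.out.println("10000-65535: " + size(10000, 65535) + " ports");
    }
}
```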


For other questions, you can join the QQ group (228374502).
141 changes: 141 additions & 0 deletions docs/jstorm-doc/FAQ_cn/index.md
@@ -0,0 +1,141 @@
---
title: FAQ
layout: plain_cn
---
## Performance issues
See Performance Optimization.

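## "task is dead" error in the topology's task list at runtime
Several causes can lead to this error:

1. The task heartbeat times out, so nimbus actively kills the worker that hosts the task.
2. The `open`/`prepare`/`execute`/`nextTuple` methods of the task's bolt or spout do not wrap their logic in try...catch, so an exception escapes and the task dies. **Note that if any single task in a worker lacks exception handling, the whole worker dies, and every other task in that worker also reports "Task is dead".** For this reason, **adding try...catch to every method is strongly recommended** in JStorm application code.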
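
The try...catch advice can be sketched without any JStorm types. Below, `guarded` wraps an arbitrary handler so that an exception is counted and logged instead of escaping; `GuardedExecute`, `guarded`, and the failure counter are hypothetical, and in a real bolt the same try...catch would sit inside `execute` and the other callbacks.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

// Sketch: keeps an exception in one handler from propagating to the caller.
class GuardedExecute {
    // Wraps a handler so that exceptions are counted and logged, not rethrown.
    static <T> Consumer<T> guarded(Consumer<T> handler, AtomicLong failures) {
        return input -> {
            try {
                handler.accept(input);
            } catch (Exception e) {
                failures.incrementAndGet();
                System.err.println("failed to process " + input + ": " + e);
            }
        };
    }

    public static void main(String[] args) {
        AtomicLong failures = new AtomicLong();
        Consumer<String> execute =
            guarded(s -> System.out.println(Integer.parseInt(s) * 2), failures);
        execute.accept("21");
        execute.accept("not a number"); // the NumberFormatException is caught
        System.out.println("failures: " + failures.get());
    }
}
```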

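To troubleshoot, proceed as follows:

1. If the task dies regularly about every 4 minutes, the cause is almost certainly a task heartbeat timeout; skip directly to step 3.
2. Check the worker log for exceptions at the time of death. Be sure to read the log of the worker that died, not the log of the new worker started afterwards, because the restarted worker may be on a different machine.
3. If the worker log shows no exception, check the cluster's nimbus log. Search for "Update taskheartbeat", find the topology ID of the dead worker, and see when the heartbeat was last updated. Compare against the task heartbeat timeout setting (`nimbus.task.timeout.secs`): if (time the worker died - time of the last heartbeat update) > task heartbeat timeout, the worker was almost certainly killed because of a task heartbeat timeout. There are several possible reasons:

* The execute queue is blocked and never returns.
* The worker hit a full GC (FGC), which pauses normal threads and causes the heartbeat to time out. Check the corresponding GC log for an FGC around that time.
* The worker or task threw an unhandled exception, such as OutOfMemoryError.
* Finally, the worker may never have started at all, so the worker heartbeat timed out.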
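
The comparison in step 3 is simple arithmetic on three values read from the logs and the configuration. In this sketch the class `HeartbeatCheck`, the timestamps, and the 240-second timeout are all hypothetical; read the real timeout from `nimbus.task.timeout.secs`.

```java
// Sketch of the step 3 comparison; all input values are made up for illustration.
class HeartbeatCheck {
    // True if the gap between the last heartbeat and the worker's death exceeds the timeout.
    static boolean killedByTimeout(long lastHeartbeatSecs, long deathSecs, long timeoutSecs) {
        return deathSecs - lastHeartbeatSecs > timeoutSecs;
    }

    public static void main(String[] args) {
        long timeoutSecs = 240;      // hypothetical nimbus.task.timeout.secs value
        long lastHeartbeat = 1_000;  // hypothetical time of the last "Update taskheartbeat" entry
        long death = 1_250;          // hypothetical time the worker died
        System.out.println("killed by heartbeat timeout: "
            + killedByTimeout(lastHeartbeat, death, timeoutSecs));
    }
}
```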

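## "[Netty-Client-boss-1] Failed to reconnect ...[15], /10.1.11.1:6801, channel, cause: java.net.ConnectException: Connection refused" error
This log message generally appears only when a task or worker has died: the port of the dead worker has been released, so connections to it are refused. For troubleshooting, see "task is dead" above.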

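## Task reports "queue is full"
A JStorm bolt or spout has three basic queues: Deserialize Queue ---> Executor Queue ---> Serialize Queue. Any of them can fill up.
If the serialize queue is full, the serialized objects may be too large and serialization may take too long. Slim down the objects being transferred.
If the deserialize queue or the execute queue is full, the cause is the same in both cases: the downstream bolt cannot keep up with the sending rate of the upstream spout or bolt.
Solutions:

1. Decide whether the problem is persistent and whether it is widespread. If it occurs in only one or two tasks and does not cause the worker to run out of memory, it can be ignored.
2. If full task queues are widespread within one component, or a full task queue causes the worker to run out of memory, you need to fix the processing speed that cannot keep up.

For how to fix this, see `Performance Tuning`. In particular, raise the processing capacity of the downstream bolt. The simplest approach is to increase parallelism; if that does not solve the problem, see `Performance Tuning` to find other optimizations.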
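
Why a slow consumer fills a bounded queue can be shown with a small simulation. This sketch uses `java.util.concurrent.ArrayBlockingQueue` as a stand-in for JStorm's internal queues, which is an assumption; the class `QueueFullSketch` and all the numbers are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;

// Sketch: a producer that outpaces its consumer eventually finds the queue full.
class QueueFullSketch {
    // Each round the producer offers some items and the consumer drains some.
    // Returns how many offers were rejected because the queue was full.
    static int rejected(int capacity, int rounds, int producedPerRound, int consumedPerRound) {
        ArrayBlockingQueue<Integer> queue = new ArrayBlockingQueue<>(capacity);
        int rejected = 0;
        for (int r = 0; r < rounds; r++) {
            for (int i = 0; i < producedPerRound; i++) {
                if (!queue.offer(i)) {
                    rejected++; // queue is full
                }
            }
            for (int i = 0; i < consumedPerRound; i++) {
                queue.poll(); // returns null when the queue is empty
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        System.out.println("slow consumer, rejected offers: " + rejected(8, 10, 4, 2));
        System.out.println("matched consumer, rejected offers: " + rejected(8, 10, 4, 4));
    }
}
```

Raising `consumedPerRound` in the sketch corresponds to increasing the parallelism or speed of the downstream bolt.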

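## Task status stays at "starting" after the topology is submitted
First, open the topology page and click the worker log of the task to see whether any log exists.

- If a worker log exists, check it for exceptions. Make sure that all your methods, such as `open`/`prepare`/`execute`/`nextTuple`, use try...catch. If you throw an exception and do not handle it, JStorm by default considers the worker faulty, and the whole worker dies.
- If no log exists, there are several possible causes:
  - Your topology requests too much memory, so the required memory cannot be allocated (this includes the `worker.memory.size` setting and JVM options such as -Xmx and -Xms).
  - The supervisor machine's disk is full, or there is some other machine-level problem.
  - Other common errors, such as incorrect JVM options (for example -Xmn > -Xmx, or JVM options that the JDK in use does not support) or jar conflicts (such as logging conflicts).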

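## Lack of resources
When the error "No supervisor resource is enough for component" is reported, the cluster does not have enough resources.
In a test-only environment, you can increase the supervisor's CPU and memory slot counts.

In JStorm, a task consumes one CPU slot and one memory slot by default. By default, a machine's CPU slot count is (number of CPU cores - 1), and its memory slot count is (physical memory size * 75% / 1 GB). If a worker runs many tasks, reduce the memory slot size (default 1 GB), for example to 512 MB: `memory.slot.per.size: 536870912` (512 * 1024 * 1024 bytes).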

```
# if it is null, it will be detected by the system
supervisor.cpu.slot.num: null
# if it is null, it will be detected by the system
supervisor.mem.slot.num: null
# support disk slot
# if it is null, it will use $(storm.local.dir)/worker_shared_data
supervisor.disk.slot: null
```

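## Topology submission fails with: org.apache.thrift.transport.TTransportException: Frame size (17302738) larger than max length (16384000)!
The cause is that the serialized topology object is too large. Typically you created a large object in a spout or bolt (such as a bitmap or a large array), so the serialized size exceeds Thrift's max frame size (the value 16384000 is hard-coded in Thrift; it can only be lowered, not raised). In JStorm, if a spout or bolt needs a large object, create it in the `open`/`prepare` method so that its creation is deferred. See: https://github.com/alibaba/jstorm/issues/230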
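
The effect of deferring creation can be measured with plain Java serialization, assuming that spouts and bolts are Java-serialized into the submitted topology. `EagerBolt` and `LazyBolt` are hypothetical stand-ins for real bolts, and the 3,000,000-element array is an arbitrary size chosen to exceed the limit quoted in the error message.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Sketch: compares the serialized size of eager and deferred large-object creation.
class FrameSizeSketch {
    static final int MAX_FRAME = 16_384_000; // limit quoted in the error message

    // Creates the large array at construction time, so it is serialized with the object.
    static class EagerBolt implements Serializable {
        private static final long serialVersionUID = 1L;
        private final long[] table = new long[3_000_000]; // about 24 MB of data
    }

    // Defers creation to prepare(); the transient field is never serialized.
    static class LazyBolt implements Serializable {
        private static final long serialVersionUID = 1L;
        private transient long[] table;
        void prepare() { table = new long[3_000_000]; }
    }

    // Returns the number of bytes produced by Java serialization.
    static int serializedSize(Serializable object) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(object);
            }
            return bytes.size();
        } catch (IOException e) {
            throw new IllegalStateException("serialization failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("eager exceeds limit: " + (serializedSize(new EagerBolt()) > MAX_FRAME));
        System.out.println("lazy exceeds limit: " + (serializedSize(new LazyBolt()) > MAX_FRAME));
    }
}
```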

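## Serialization issues
All spouts, bolts, configuration objects, and emitted messages (tuples) must implement Serializable; otherwise a serialization error occurs.

If a spout or bolt needs a member variable whose type does not implement Serializable, declare the variable with the `transient` modifier and instantiate it in `open` or `prepare`.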

![seriliazble_error]({{site.baseurl}}/img/FAQ_cn/serializable_error.jpg)

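## Logging conflict
The JStorm 0.9.x series uses log4j as its logging system; the 2.x series uses logback.
Whichever JStorm version you use, make sure not to pull in conflicting logging dependencies. For example, log4j-over-slf4j and slf4j-log4j12 conflict and can never coexist; otherwise an error like this occurs: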

```
SLF4J: Detected both log4j-over-slf4j.jar AND slf4j-log4j12.jar on the class path, preempting StackOverflowError.
SLF4J: See also
http://www.slf4j.org/codes.html#log4jDelegationLoop for more details.
Exception in thread "main" java.lang.ExceptionInInitializerError
at org.apache.log4j.Logger.getLogger(Logger.java:39)
at org.apache.log4j.Logger.getLogger(Logger.java:43)
at com.alibaba.jstorm.daemon.worker.Worker.<clinit>(Worker.java:32)
Caused by: java.lang.IllegalStateException: Detected both log4j-over-slf4j.jar AND slf4j-log4j12.jar on the class path, preempting StackOverflowError. See also
http://www.slf4j.org/codes.html#log4jDelegationLoop for more details.
at org.apache.log4j.Log4jLoggerFactory.<clinit>(Log4jLoggerFactory.java:49)
... 3 more
Could not find the main class: com.alibaba.jstorm.daemon.worker.Worker. Program will exit.
```

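Specifically, JStorm 0.9.x depends on log4j and slf4j-log4j12, so if you use a 0.9.x version, your application must exclude the log4j-over-slf4j dependency.
Likewise, JStorm 2.x depends on logback and log4j-over-slf4j, so if you use that version, your application must exclude the slf4j-log4j12 dependency.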


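## Class conflict
If the application uses the same jar as JStorm but in a different version, enabling the classloader is recommended.
Modify the configuration file: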

```
topology.enable.classloader: true
```

Or

```
ConfigExtension.setEnableTopologyClassLoader(conf, true);
```

JStorm disables the classloader by default, so JStorm forces the use of the jars that it depends on itself.

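## After submitting a topology and waiting a few minutes, the web UI still does not display the corresponding task
There are three possible causes:
* User application initialization is too slow
If the user application produces log output, its initialization is either too slow or failing; check the log. In addition, for MetaQ 1.x applications, the spout recovers the files under ~/.meta_recover/; you can delete these failed-consumption files directly to speed up startup.

* User jar conflict or initialization failure
Open the supervisor log, find the command that starts the worker, run it on its own, and check for errors. The result looks similar to the following image: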

![fail_start_worker]({{site.baseurl}}/img/FAQ_cn/fail_start_worker.jpg)

* Storm and JStorm share the same local directory
Check the configuration item "storm.local.dir" to see whether Storm and JStorm use the same local directory. If they do, separate the two.

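## Port has already been bound
There are two possible causes:
* More than one worker competes for a port
Assuming port 6800 is occupied, run `ps -ef|grep 6800` to check whether multiple processes exist. If so, kill them manually.

* The system has too many open connections
Linux limits the range of local ports available for outgoing connections. When a TCP client has opened about 28,000 outgoing connections, it starts throwing many exceptions, and you need to widen the local port range: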

```
# echo "10000 65535" > /proc/sys/net/ipv4/ip_local_port_range
```

For other questions, you can join the QQ group (228374502).
34 changes: 34 additions & 0 deletions docs/jstorm-doc/_config.yml
@@ -0,0 +1,34 @@
# Welcome to Jekyll!
#
# This config file is meant for settings that affect your whole blog, values
# which you are expected to set up once and rarely need to edit after that.
# For technical reasons, this file is *NOT* reloaded automatically when you use
# 'jekyll serve'. If you change this file, please restart the server process.

# Site settings
title: JStorm Documentation
email: [email protected]
encoding: utf-8
description: > # this means to ignore newlines until "baseurl:"
  Write an awesome description for your new site here. You can edit this
  line in _config.yml. It will appear in your document head meta (for
  Google search results) and in your feed.xml site description.
baseurl: "" # the subpath of your site, e.g. /blog
url: "http://yourdomain.com" # the base hostname & protocol for your site
twitter_username: jekyllrb
github_username: jekyll

# Build settings
markdown: kramdown
highlighter: pygments

layout:
values:
layout: plain

host: 127.0.0.1

kramdown:
  input: GFM # GitHub syntax
  hard_wrap: false # Don't translate new lines to <br>s
  toc_levels: 1..3 # Include h1-h3 for ToC
38 changes: 38 additions & 0 deletions docs/jstorm-doc/_includes/footer.html
@@ -0,0 +1,38 @@
<footer class="site-footer">

<div class="wrapper">

<h2 class="footer-heading">{{ site.title }}</h2>

<div class="footer-col-wrapper">
<div class="footer-col footer-col-1">
<ul class="contact-list">
<li>{{ site.title }}</li>
<li><a href="mailto:{{ site.email }}">{{ site.email }}</a></li>
</ul>
</div>

<div class="footer-col footer-col-2">
<ul class="social-media-list">
{% if site.github_username %}
<li>
{% include icon-github.html username=site.github_username %}
</li>
{% endif %}

{% if site.twitter_username %}
<li>
{% include icon-twitter.html username=site.twitter_username %}
</li>
{% endif %}
</ul>
</div>

<div class="footer-col footer-col-3">
<p>{{ site.description }}</p>
</div>
</div>

</div>

</footer>
12 changes: 12 additions & 0 deletions docs/jstorm-doc/_includes/head.html
@@ -0,0 +1,12 @@
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">

<title>{% if page.title %}{{ page.title | escape }}{% else %}{{ site.title | escape }}{% endif %}</title>
<meta name="description" content="{% if page.excerpt %}{{ page.excerpt | strip_html | strip_newlines | truncate: 160 }}{% else %}{{ site.description }}{% endif %}">

<link rel="stylesheet" href="{{ "/css/main.css" | prepend: site.baseurl }}">
<link rel="canonical" href="{{ page.url | replace:'index.html','' | prepend: site.baseurl | prepend: site.url }}">
<link rel="alternate" type="application/rss+xml" title="{{ site.title }}" href="{{ "/feed.xml" | prepend: site.baseurl | prepend: site.url }}">
</head>
27 changes: 27 additions & 0 deletions docs/jstorm-doc/_includes/header.html
@@ -0,0 +1,27 @@
<header class="site-header">

<div class="wrapper">

<a class="site-title" href="{{ site.baseurl }}/">{{ site.title }}</a>

<nav class="site-nav">
<a href="#" class="menu-icon">
<svg viewBox="0 0 18 15">
<path fill="#424242" d="M18,1.484c0,0.82-0.665,1.484-1.484,1.484H1.484C0.665,2.969,0,2.304,0,1.484l0,0C0,0.665,0.665,0,1.484,0 h15.031C17.335,0,18,0.665,18,1.484L18,1.484z"/>
<path fill="#424242" d="M18,7.516C18,8.335,17.335,9,16.516,9H1.484C0.665,9,0,8.335,0,7.516l0,0c0-0.82,0.665-1.484,1.484-1.484 h15.031C17.335,6.031,18,6.696,18,7.516L18,7.516z"/>
<path fill="#424242" d="M18,13.516C18,14.335,17.335,15,16.516,15H1.484C0.665,15,0,14.335,0,13.516l0,0 c0-0.82,0.665-1.484,1.484-1.484h15.031C17.335,12.031,18,12.696,18,13.516L18,13.516z"/>
</svg>
</a>

<div class="trigger">
{% for my_page in site.pages %}
{% if my_page.title %}
<a class="page-link" href="{{ my_page.url | prepend: site.baseurl }}">{{ my_page.title }}</a>
{% endif %}
{% endfor %}
</div>
</nav>

</div>

</header>
1 change: 1 addition & 0 deletions docs/jstorm-doc/_includes/icon-github.html
@@ -0,0 +1 @@
<a href="https://github.com/{{ include.username }}"><span class="icon icon--github">{% include icon-github.svg %}</span><span class="username">{{ include.username }}</span></a>
1 change: 1 addition & 0 deletions docs/jstorm-doc/_includes/icon-github.svg
1 change: 1 addition & 0 deletions docs/jstorm-doc/_includes/icon-twitter.html
@@ -0,0 +1 @@
<a href="https://twitter.com/{{ include.username }}"><span class="icon icon--twitter">{% include icon-twitter.svg %}</span><span class="username">{{ include.username }}</span></a>
1 change: 1 addition & 0 deletions docs/jstorm-doc/_includes/icon-twitter.svg