Begging for a fix to the efficiency problem #5

Open
livehl opened this issue Jun 7, 2019 · 1 comment
Comments

livehl commented Jun 7, 2019

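No matter how I optimize, getRange only reaches 5M/s over the internal network, and it also eats up all the CPU. There is no way to fetch a full table's data in a distributed manner.
Suggestions:
1. Optimize the plainbuffer algorithm so that it does not consume all the CPU.
2. Provide primary key split points for a table at every 5000 rows, so that multiple threads can fetch all the data quickly.

Unless these two problems are solved, it is really hard for us to handle this in production; the efficiency is too low.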
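
To make suggestion 2 concrete, here is a minimal sketch of how split points could be used on the client side. It is not SDK code: `ranges_from_splits`, `fetch_all`, and `fake_fetch_range` are hypothetical names, the primary key is assumed to be a single integer, and `fake_fetch_range` is an in-memory stand-in for a paginated getRange call over one key range.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical convention: None means an open bound (start or end of table).
KeyRange = Tuple[Optional[int], Optional[int]]


def ranges_from_splits(split_keys: Sequence[int]) -> List[KeyRange]:
    """Turn sorted split points into contiguous [start, end) ranges
    that together cover the whole table."""
    bounds: List[Optional[int]] = [None, *split_keys, None]
    return list(zip(bounds[:-1], bounds[1:]))


def fetch_all(
    ranges: Sequence[KeyRange],
    fetch_range: Callable[[KeyRange], List[int]],
    max_workers: int = 4,
) -> List[int]:
    """Fetch every range concurrently and concatenate the results.
    Executor.map yields results in input order, so rows stay key-ordered."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        chunks = pool.map(fetch_range, ranges)
        return [row for chunk in chunks for row in chunk]


# In-memory stand-in for the table; each "row" is just its integer key.
TABLE = list(range(20))


def fake_fetch_range(key_range: KeyRange) -> List[int]:
    """Stand-in for a real range read: rows with start <= key < end."""
    start, end = key_range
    return [
        k for k in TABLE
        if (start is None or k >= start) and (end is None or k < end)
    ]


ranges = ranges_from_splits([5, 10, 15])
rows = fetch_all(ranges, fake_fetch_range)
```

A thread pool like this only overlaps the time spent waiting on the network. If the bottleneck is CPU-bound decoding in pure Python, threads in CPython will not run it in parallel, so the same ranges would probably have to be handed to separate processes or machines instead.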

Contributor

zhouzf05 commented Jun 8, 2019

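The Python SDK's plainbuffer is indeed very inefficient. If it is to be optimized, one strategy to consider is implementing the plainbuffer parsing as a C extension, which is a fairly large amount of work.
If you need to fetch the full data set, we suggest switching to a different approach for now. A few options:

  1. Switch languages; Java or Go is recommended
  2. Use an existing data export tool, such as DataX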
