The scenario: your program consumes from Kafka by topic, and the data comes from logs and records produced by various business teams and collected by Logstash. The requirement is that after your program consumes from Kafka, the data must be deduplicated before it is written to storage. How can this be done? Any ideas would be appreciated, thanks.
What is the criterion for deciding that two records are duplicates? Assuming there is a clear criterion, and assuming you're using Java: wrap your data and logs in instances of a class that sensibly overrides equals (and hashCode), then add those instances to a HashSet — that alone deduplicates them. Quick and dirty 😁 In a concurrent environment, you can use a concurrent Set variant instead, such as ConcurrentSkipListSet.
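A minimal sketch of that idea. The `LogEvent` class, its fields, and the choice of `(source, message, timestamp)` as the duplicate key are all hypothetical — adjust equals/hashCode to whatever actually defines "duplicate" in your data:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

public class DedupExample {

    // Hypothetical record type for a consumed log entry. The fields that
    // define "duplicate" are exactly the ones compared in equals()/hashCode().
    static final class LogEvent {
        final String source;
        final String message;
        final long timestamp;

        LogEvent(String source, String message, long timestamp) {
            this.source = source;
            this.message = message;
            this.timestamp = timestamp;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof LogEvent)) return false;
            LogEvent other = (LogEvent) o;
            return timestamp == other.timestamp
                    && source.equals(other.source)
                    && message.equals(other.message);
        }

        @Override
        public int hashCode() {
            return Objects.hash(source, message, timestamp);
        }
    }

    public static void main(String[] args) {
        List<LogEvent> consumed = List.of(
                new LogEvent("svc-a", "started", 1000L),
                new LogEvent("svc-a", "started", 1000L), // duplicate of the first
                new LogEvent("svc-b", "started", 1000L));

        // HashSet silently drops elements that are already present,
        // so only the first occurrence of each event survives.
        Set<LogEvent> unique = new HashSet<>(consumed);
        System.out.println(unique.size()); // prints 2
    }
}
```

Note that ConcurrentSkipListSet is a sorted set, so unlike HashSet it requires the elements to implement Comparable (or a Comparator supplied at construction) in addition to a consistent equals.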
How long is the deduplication time window? You could add a caching layer between Logstash and Kafka, or run the deduplication step after the messages have been consumed from Kafka.
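One way to bound that time window in memory, as a sketch of the "dedupe after consuming" option. The class, the string key, and the window size are hypothetical and not tied to any particular library; in production you would more likely keep the keys in an external store such as Redis with a TTL so the window survives restarts and is shared across consumer instances:

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory deduplicator that only suppresses duplicates
// seen within the last `windowMillis`; older entries are evicted lazily.
public class WindowedDeduplicator {
    private final long windowMillis;
    // Insertion-ordered map: the oldest keys come first, so eviction can
    // stop at the first entry that is still inside the window.
    private final LinkedHashMap<String, Long> seen = new LinkedHashMap<>();

    public WindowedDeduplicator(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    /** Returns true if `key` was NOT seen within the window, i.e. keep it. */
    public synchronized boolean firstSeen(String key, long nowMillis) {
        evictOlderThan(nowMillis - windowMillis);
        if (seen.containsKey(key)) {
            return false; // duplicate within the window: drop it
        }
        seen.put(key, nowMillis);
        return true;
    }

    private void evictOlderThan(long cutoff) {
        Iterator<Map.Entry<String, Long>> it = seen.entrySet().iterator();
        while (it.hasNext() && it.next().getValue() < cutoff) {
            it.remove();
        }
    }

    public static void main(String[] args) {
        WindowedDeduplicator dedup = new WindowedDeduplicator(60_000); // 1-minute window
        System.out.println(dedup.firstSeen("msg-1", 0));      // true: first sighting
        System.out.println(dedup.firstSeen("msg-1", 30_000)); // false: inside the window
        System.out.println(dedup.firstSeen("msg-1", 90_000)); // true: window expired
    }
}
```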