llama2-7b deployment on 910B2C fails during inference #26

dengyingxu · 2024-08-15T09:47:40Z

The error is shown below. Has the 910B2C been adapted and validated? I tried CANN packages RC1 and RC2, and both produce the same error.

(base) [root@A03-R40-I18-71-0001195 dengyingxu1]# cat /root/atb/log/atb_17738_20240815033442.log
[2024-08-15 03:34:42.482189] [info] [17738] [config.cpp:34] Config:
[2024-08-15 03:34:42.482781] [info] [17738] [config.cpp:190] SocVersion:Ascend910B2C
[2024-08-15 03:34:42.482797] [info] [17738] [config.cpp:304] env:ATB_HOST_TILING_BUFFER_BLOCK_NUM value:128
[2024-08-15 03:34:42.482803] [info] [17738] [config.cpp:304] env:ATB_DEVICE_TILING_BUFFER_BLOCK_NUM value:32
[2024-08-15 03:34:42.482808] [info] [17738] [config.cpp:304] env:ATB_RUNNER_POOL_SIZE value:64
[2024-08-15 03:34:42.482818] [info] [17738] [config.cpp:55] AtbHomePath: /usr/local/Ascend/mindie/latest/mindie-rt/mindie-atb/atb, IsStreamSyncEveryRunnerEnable: 0, IsStreamSyncEveryKernelEnable: 1, IsStreamSyncEveryOperationEnable: 0
[2024-08-15 03:34:42.482825] [info] [17738] [config.cpp:59] IsOpsRunnerSetupCacheEnable: 1, IsCompareTilingByHashValue: 0, KernelCacheType: 3, LocalKernelCacheCount: 1, GlobalKernelCacheCount: 5
[2024-08-15 03:34:42.482830] [info] [17738] [config.cpp:63] IsUsingProfiling: 0, KernelCacheTilingSize: 10240, IsCompareTilingEveryKernelEnable: 0
[2024-08-15 03:34:42.482834] [info] [17738] [config.cpp:65] WorkspaceMemAllocAlgType: 1, IsworkspaceMemAllocGlobal: 0, HostTilingBufferBlockNum:128, DeviceTilingBufferBlockNum:32, ShareMemoryNameSuffix:, IsLaunchKernelWithTiling:1, IsMatmulShuffleKEnable:1, RunnerPoolSize:64
[2024-08-15 03:34:42.482848] [info] [17738] [tiling_buffer_pool.cpp:36] TilingBufferPool malloc buffer, blockNum:128, blockSize:3145728, totalSize:402653184
[2024-08-15 03:34:42.482854] [info] [17738] [host_tiling_buffer_pool.cpp:28] malloc bufferSize:402653184
[2024-08-15 03:34:42.482877] [info] [17738] [tiling_buffer_pool.cpp:36] TilingBufferPool malloc buffer, blockNum:32, blockSize:3145728, totalSize:100663296
[2024-08-15 03:34:42.482881] [info] [17738] [device_tiling_buffer_pool.cpp:29] aclrtMalloc bufferSize:100663296
[2024-08-15 03:34:42.483080] [info] [17738] [context_base.cpp:82] ContextBase init success
[2024-08-15 03:35:04.439191] [info] [17738] [atb_operation_ir_cfg.cpp:41] Load atb_ops_info.ini success!
[2024-08-15 03:36:48.740384] [info] [17817] [operation_base.cpp:486] LinearOperation_0 setup start, variantPack:
inTensors[0]: dtype: float16, format: nd, shape:[9, 4096], dataSize:73728
inTensors[1]: dtype: float16, format: nd, shape:[4096, 12288], dataSize:100663296
outTensors[0]: dtype: float16, format: nd, shape:[50331648], dataSize:100663296

[2024-08-15 03:36:48.740459] [error] [17817] [operation_base.cpp:291] LinearOperation_0 variantPack.inTensors.size: 2is not equal GetInputNum: 3
[2024-08-15 03:36:48.740469] [error] [17817] [operation_base.cpp:493] LinearOperation_0 invalid param, setup check fail, error code: 5
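
For readers triaging a similar log, here is a minimal Python sketch that pulls out the error-level entries and cross-checks each logged `dataSize` against its shape. The regular expressions are assumptions inferred only from the excerpt above, not from any ATB documentation, and the 2-bytes-per-element figure is simply the standard size of float16.

```python
import re
from math import prod

# Bytes per element for the only dtype that appears in the excerpt.
DTYPE_BYTES = {"float16": 2}

# Assumed layout: [timestamp] [level] [pid] [file:line] message
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] \[(?P<level>\w+)\] \[(?P<pid>\d+)\] "
    r"\[(?P<src>[^\]]+)\] (?P<msg>.*)$"
)
# Assumed layout of the tensor descriptor lines that follow "variantPack:"
TENSOR_LINE = re.compile(
    r"^(?P<name>(?:in|out)Tensors\[\d+\]): dtype: (?P<dtype>\w+), "
    r"format: \w+, shape:\[(?P<shape>[\d, ]+)\], dataSize:(?P<size>\d+)$"
)

# Lines copied verbatim from the log in this issue.
SAMPLE = """\
[2024-08-15 03:36:48.740384] [info] [17817] [operation_base.cpp:486] LinearOperation_0 setup start, variantPack:
inTensors[0]: dtype: float16, format: nd, shape:[9, 4096], dataSize:73728
inTensors[1]: dtype: float16, format: nd, shape:[4096, 12288], dataSize:100663296
outTensors[0]: dtype: float16, format: nd, shape:[50331648], dataSize:100663296
[2024-08-15 03:36:48.740459] [error] [17817] [operation_base.cpp:291] LinearOperation_0 variantPack.inTensors.size: 2is not equal GetInputNum: 3
[2024-08-15 03:36:48.740469] [error] [17817] [operation_base.cpp:493] LinearOperation_0 invalid param, setup check fail, error code: 5
"""


def parse(text):
    """Return (errors, tensors) extracted from an ATB-style log excerpt."""
    errors, tensors = [], []
    for line in text.splitlines():
        m = LOG_LINE.match(line)
        if m:
            if m["level"] == "error":
                errors.append((m["src"], m["msg"]))
            continue
        t = TENSOR_LINE.match(line)
        if t:
            shape = [int(d) for d in t["shape"].split(",")]
            expected = prod(shape) * DTYPE_BYTES[t["dtype"]]
            tensors.append((t["name"], shape, int(t["size"]), expected))
    return errors, tensors


errors, tensors = parse(SAMPLE)
for name, shape, logged, expected in tensors:
    print(name, shape, "logged:", logged, "computed:", expected)
for src, msg in errors:
    print(src, msg)
```

On this excerpt every logged `dataSize` matches shape times 2 bytes. The output tensor is logged with the flat shape `[50331648]`, which has the same element count as the `[4096, 12288]` weight. The first error reports that 2 input tensors were passed while the operation expects 3. The log alone does not say what the third input is or why the counts differ.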

whitelok · 2024-08-15T11:39:48Z

Roughly which model is this? For inference testing I use https://github.com/pcg-mlp/KsanaLLM/blob/main/tests/model/llama_integration_test.py, and the model can be swapped for the open-source llama2 13B.

dengyingxu · 2024-08-15T12:18:15Z

The model is Llama-2-7b-chat-hf, and the test script is serving_client.py from example/llama7b.

whitelok · 2024-08-16T02:02:09Z

Which yaml config are you using?

whitelok · 2024-08-16T03:10:55Z

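I checked the commits currently on GitHub, and it looks like our boss has not yet synced the latest internal code out. This issue has already been fixed.
I just got a result running https://huggingface.co/TheBloke/Llama-2-7B-Chat-fp16.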

dengyingxu · 2024-08-16T03:19:40Z

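> I checked the commits currently on GitHub, and it looks like our boss has not yet synced the latest internal code out. This issue has already been fixed.

When will the fixed code be available?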

dengyingxu · 2024-08-16T03:21:00Z

Which commit fixed it?

whitelok · 2024-08-16T03:32:05Z

> Which commit fixed it?

It will be synced to GitHub shortly.

dengyingxu · 2024-08-16T03:34:44Z

Could you provide a branch with the fix? Thank you very much.

whitelok · 2024-08-16T03:35:44Z

> Could you provide a branch with the fix? Thank you very much.

Yes, it will be synced out shortly.

dengyingxu · 2024-08-16T03:38:10Z

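> > Could you provide a branch with the fix? Thank you very much.
>
> Yes, it will be synced out shortly.

Roughly when will it be usable? Our boss would like to see the performance of KsanaLLM (一念) as soon as possible.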

whitelok · 2024-08-16T03:39:05Z

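> > > Could you provide a branch with the fix? Thank you very much.
> >
> > Yes, it will be synced out shortly.
>
> Roughly when will it be usable? Our boss would like to see the performance of KsanaLLM (一念) as soon as possible.

That probably depends on scheduling. It is not clear yet.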

pcg-mlp · 2024-08-16T03:48:19Z

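> > > Could you provide a branch with the fix? Thank you very much.
> >
> > Yes, it will be synced out shortly.
>
> Roughly when will it be usable? Our boss would like to see the performance of KsanaLLM (一念) as soon as possible.

We are currently working with Huawei to optimize performance. If you only want to look at performance, the numbers will be more reliable in about two weeks.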

dengyingxu · 2024-08-22T07:30:21Z

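> > > > Could you provide a branch with the fix? Thank you very much.
> > >
> > > Yes, it will be synced out shortly.
> >
> > Roughly when will it be usable? Our boss would like to see the performance of KsanaLLM (一念) as soon as possible.
>
> We are currently working with Huawei to optimize performance. If you only want to look at performance, the numbers will be more reliable in about two weeks.

What does the collaborative optimization involve? Which operators will be used in two weeks: KsanaLLM's in-house AscendCL operators, ATB operators, or aclnn operators?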

pcg-mlp · 2024-08-22T08:32:24Z

Huawei's developers are working with us to migrate the current aclnn operators to ATB operators.

dengyingxu · 2024-08-22T10:42:19Z

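> Huawei's developers are working with us to migrate the current aclnn operators to ATB operators.

Thank you very much for the answer. Why did you not use the aclnn operator set directly? aclnn also provides wrappers such as paged attention and flash attention (aclnnIncreFlashAttentionV4). Did you test both the ATB approach and the aclnn approach?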

pcg-mlp · 2024-08-22T11:49:53Z

ATB delivers higher performance.

whitelok · 2024-09-20T10:14:45Z

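> > Huawei's developers are working with us to migrate the current aclnn operators to ATB operators.
>
> Thank you very much for the answer. Why did you not use the aclnn operator set directly? aclnn also provides wrappers such as paged attention and flash attention (aclnnIncreFlashAttentionV4). Did you test both the ATB approach and the aclnn approach?

It has been released. Give it a try.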

whitelok · 2025-01-21T08:05:41Z

@dengyingxu Have you run into any other problems?

dengyingxu changed the title ~~910B2C在部署llama2-7b在推理时失败~~ 910B2C部署llama2-7b在推理时失败 (llama2-7b deployment on 910B2C fails during inference) on Aug 15, 2024
