Llama-2-7B inference fails on a 910B2C deployment #26
Comments
Roughly which model is this? For inference testing on my side I use https://github.com/pcg-mlp/KsanaLLM/blob/main/tests/model/llama_integration_test.py; you can substitute the open-source Llama-2 13B model there.
The model is Llama-2-7b-chat-hf, and the test script is serving_client.py from example/llama7b.
Which yaml config are you using?
I checked the commits currently on GitHub; it seems our boss hasn't synced the latest internal code out yet. This issue has already been fixed internally.
When will the fixed code be available?
Which commit fixes it?
It will be synced to GitHub later.
Could you provide a branch with the fix? Many thanks.
Yes, it will be synced out later.
Roughly when will it be usable? Our boss is hoping to see KsanaLLM's performance as soon as possible.
That probably depends on scheduling; it's not clear yet.
We are currently collaborating with Huawei on performance optimization. If you just want performance numbers, the data will be more reliable in about two weeks.
What does the collaborative optimization refer to? Which operators will be in use two weeks from now: KsanaLLM's own AscendCL operators, ATB operators, or aclnn operators?
Huawei's developers are working with us to migrate the current aclnn operators to ATB operators.
Thanks very much for your answer. Why didn't you use the aclnn operator set directly? aclnn also provides wrappers such as paged attention and flash attention (aclnnIncreFlashAttentionV4). Did you benchmark both the ATB approach and the aclnn approach?
ATB has higher performance.
It has been released; give it a try.
@dengyingxu Are you running into any other issues?
The error is shown below. Has compatibility with 910B2C been verified? With both the RC1 and RC2 CANN packages, the error is the same as shown below.
(base) [root@A03-R40-I18-71-0001195 dengyingxu1]# cat /root/atb/log/atb_17738_20240815033442.log
[2024-08-15 03:34:42.482189] [info] [17738] [config.cpp:34] Config:
[2024-08-15 03:34:42.482781] [info] [17738] [config.cpp:190] SocVersion:Ascend910B2C
[2024-08-15 03:34:42.482797] [info] [17738] [config.cpp:304] env:ATB_HOST_TILING_BUFFER_BLOCK_NUM value:128
[2024-08-15 03:34:42.482803] [info] [17738] [config.cpp:304] env:ATB_DEVICE_TILING_BUFFER_BLOCK_NUM value:32
[2024-08-15 03:34:42.482808] [info] [17738] [config.cpp:304] env:ATB_RUNNER_POOL_SIZE value:64
[2024-08-15 03:34:42.482818] [info] [17738] [config.cpp:55] AtbHomePath: /usr/local/Ascend/mindie/latest/mindie-rt/mindie-atb/atb, IsStreamSyncEveryRunnerEnable: 0, IsStreamSyncEveryKernelEnable: 1, IsStreamSyncEveryOperationEnable: 0
[2024-08-15 03:34:42.482825] [info] [17738] [config.cpp:59] IsOpsRunnerSetupCacheEnable: 1, IsCompareTilingByHashValue: 0, KernelCacheType: 3, LocalKernelCacheCount: 1, GlobalKernelCacheCount: 5
[2024-08-15 03:34:42.482830] [info] [17738] [config.cpp:63] IsUsingProfiling: 0, KernelCacheTilingSize: 10240, IsCompareTilingEveryKernelEnable: 0
[2024-08-15 03:34:42.482834] [info] [17738] [config.cpp:65] WorkspaceMemAllocAlgType: 1, IsworkspaceMemAllocGlobal: 0, HostTilingBufferBlockNum:128, DeviceTilingBufferBlockNum:32, ShareMemoryNameSuffix:, IsLaunchKernelWithTiling:1, IsMatmulShuffleKEnable:1, RunnerPoolSize:64
[2024-08-15 03:34:42.482848] [info] [17738] [tiling_buffer_pool.cpp:36] TilingBufferPool malloc buffer, blockNum:128, blockSize:3145728, totalSize:402653184
[2024-08-15 03:34:42.482854] [info] [17738] [host_tiling_buffer_pool.cpp:28] malloc bufferSize:402653184
[2024-08-15 03:34:42.482877] [info] [17738] [tiling_buffer_pool.cpp:36] TilingBufferPool malloc buffer, blockNum:32, blockSize:3145728, totalSize:100663296
[2024-08-15 03:34:42.482881] [info] [17738] [device_tiling_buffer_pool.cpp:29] aclrtMalloc bufferSize:100663296
[2024-08-15 03:34:42.483080] [info] [17738] [context_base.cpp:82] ContextBase init success
[2024-08-15 03:35:04.439191] [info] [17738] [atb_operation_ir_cfg.cpp:41] Load atb_ops_info.ini success!
[2024-08-15 03:36:48.740384] [info] [17817] [operation_base.cpp:486] LinearOperation_0 setup start, variantPack:
inTensors[0]: dtype: float16, format: nd, shape:[9, 4096], dataSize:73728
inTensors[1]: dtype: float16, format: nd, shape:[4096, 12288], dataSize:100663296
outTensors[0]: dtype: float16, format: nd, shape:[50331648], dataSize:100663296
[2024-08-15 03:36:48.740459] [error] [17817] [operation_base.cpp:291] LinearOperation_0 variantPack.inTensors.size: 2is not equal GetInputNum: 3
[2024-08-15 03:36:48.740469] [error] [17817] [operation_base.cpp:493] LinearOperation_0 invalid param, setup check fail, error code: 5
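The failure in the log is a setup-time input-count check: the graph passes 2 input tensors (activation and weight, with the shapes shown above) to LinearOperation_0, while the operation itself reports GetInputNum() == 3. A common cause for a three-input linear op is an expected bias (or similar extra) tensor, though the log does not say which tensor is missing. A minimal Python sketch of that check, not the actual ATB C++ API; the bias interpretation, class name, and success code 0 are assumptions, only the shapes and error code 5 come from the log:

```python
# Sketch of the setup-time consistency check that fails in the log above.
# NOT the ATB API: it only mirrors the reported behavior, where the
# operation expects 3 inputs but the caller supplied 2, so setup
# aborts with error code 5 ("invalid param").

ERROR_INVALID_PARAM = 5  # error code seen in the log


class LinearOperationSketch:
    def __init__(self, has_bias: bool):
        # Assumption: the third expected input is a bias tensor.
        self.has_bias = has_bias

    def get_input_num(self) -> int:
        # activation and weight, plus the optional bias tensor
        return 3 if self.has_bias else 2

    def setup(self, in_tensors) -> int:
        if len(in_tensors) != self.get_input_num():
            print(f"variantPack.inTensors.size: {len(in_tensors)} "
                  f"is not equal GetInputNum: {self.get_input_num()}")
            return ERROR_INVALID_PARAM
        return 0  # success (assumed success code)


# Reproducing the mismatch from the log: 3 inputs expected, 2 supplied.
op = LinearOperationSketch(has_bias=True)
x = ("float16", (9, 4096))       # activation shape from the log
w = ("float16", (4096, 12288))   # weight shape from the log
print(op.setup([x, w]))                          # fails with code 5
print(op.setup([x, w, ("float16", (12288,))]))   # passes once a third tensor is supplied
```

If this reading is right, the fix on the framework side is either to supply the missing third tensor or to construct the operation so that it expects only two inputs, which matches the maintainers' statement that the issue was already fixed in the internal code.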