Commit 9233897

[GPU] GQA optimization, refactored
1 parent 9fa105e commit 9233897

5 files changed, +250 -82 lines changed


src/plugins/intel_gpu/src/graph/impls/ocl/paged_attention.cpp

+4
@@ -1009,6 +1009,10 @@ struct paged_attention_impl : multi_stage_primitive<paged_attention> {
         impl->use_micro_sdpa = true;
     }
 
+    std::cout << "use_micro=" << impl->use_micro_sdpa << " KV-cache layouts=["
+              << impl_param.get_input_layout(3).to_short_string() << ", "
+              << impl_param.get_input_layout(4).to_short_string() << "]\n";
+
     return impl;
 }
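
For reference, the added statement writes one line to stdout containing the `use_micro_sdpa` flag and the short layout strings of inputs 3 and 4. The following is a minimal standalone sketch of that message format, not code from this commit: `format_pa_debug_line` is a hypothetical helper, and its two string parameters are stand-ins for the results of `impl_param.get_input_layout(3).to_short_string()` and `impl_param.get_input_layout(4).to_short_string()`.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper (not part of the commit): builds the same message that
// the added lines stream to std::cout. layout_in3 and layout_in4 are stand-ins
// for the short layout strings of inputs 3 and 4.
std::string format_pa_debug_line(bool use_micro_sdpa,
                                 const std::string& layout_in3,
                                 const std::string& layout_in4) {
    std::ostringstream os;
    // A bool streamed without std::boolalpha is written as 1 or 0.
    os << "use_micro=" << use_micro_sdpa << " KV-cache layouts=["
       << layout_in3 << ", " << layout_in4 << "]\n";
    return os.str();
}
```

Because the flag is streamed as a plain `bool`, it appears in the output as `1` or `0` rather than `true` or `false`.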
