Add support of GGUF as an input format for LLM #1885

Status: Open. 52 commits to be merged into base: master. Showing changes from 44 commits.

Commits
ef89e2a
Added gguf-tools as submodule
AlexKoff88 Feb 3, 2025
8b86481
Merge remote-tracking branch 'origin/master' into ak/gguf_support
AlexKoff88 Mar 3, 2025
f671a44
Copied some pieces of functionality from 3rd parties. Rewriting is WIP.
AlexKoff88 Mar 5, 2025
4991ca0
Added GGUF reading and conversion to ov::Tensor. Fixed cmake
AlexKoff88 Mar 6, 2025
ecde304
Start adding model creation from GGUF
AlexKoff88 Mar 7, 2025
7a2f1ea
Added RoPE
AlexKoff88 Mar 10, 2025
84e9788
Added MHA
AlexKoff88 Mar 10, 2025
9221be7
Added RMSNorm, MVN, etc.
AlexKoff88 Mar 10, 2025
fbba6f8
Removed submodules
AlexKoff88 Mar 11, 2025
8fe17fa
Removed WWB changes
AlexKoff88 Mar 11, 2025
3cce078
Merge branch 'master' into ak/gguf_support
AlexKoff88 Mar 11, 2025
b91ce6f
Merged with master
AlexKoff88 Mar 11, 2025
64cb5d6
Added implementation of Transformer block
AlexKoff88 Mar 11, 2025
d891ec8
Added RoPE initialization
AlexKoff88 Mar 11, 2025
6ef3418
Added code for Llama based models
AlexKoff88 Mar 11, 2025
4b0c61e
Finished pipeline for models creation
AlexKoff88 Mar 12, 2025
be72ce8
Reshuffled the code. Extended configs to other data types.
AlexKoff88 Mar 12, 2025
779878f
Changed headers extension to .hpp
AlexKoff88 Mar 13, 2025
24b3475
Fixed model creation issues. Added sample to test GGUF.
AlexKoff88 Mar 13, 2025
3f0f12e
Fixed inference issues. Still have problems with generation output qu…
AlexKoff88 Mar 13, 2025
a9f95e3
Fixes in the causal mask subgraph
AlexKoff88 Mar 14, 2025
6cd777f
Changed rotate_half
AlexKoff88 Mar 14, 2025
9a5ac48
Fixed MHA. Got good results for FP16
AlexKoff88 Mar 14, 2025
4025d00
Fixed Q8_0 models
AlexKoff88 Mar 14, 2025
5a35124
Fixes for Q4_0 and Q4_1 unpacking
AlexKoff88 Mar 14, 2025
8206ad9
Added Q4_0/1 support. Result does not converge.
AlexKoff88 Mar 17, 2025
68dce03
Fixed result convergence for Q4_0 and Q4_1
AlexKoff88 Mar 17, 2025
2890d06
Made model conversion more generic. Qwen results does not converge st…
AlexKoff88 Mar 18, 2025
1538f91
Added bias Add operation after MatMuls where it is applicable. Still …
AlexKoff88 Mar 19, 2025
02b2120
Merged with master
AlexKoff88 Mar 25, 2025
89a6554
Add GGUF reading into Stateful LLMPipeline
AlexKoff88 Mar 25, 2025
c0d7fb1
Added test for GGUF reader
AlexKoff88 Mar 26, 2025
71c9d2f
Merge remote-tracking branch 'origin/master' into ak/gguf_support
AlexKoff88 Mar 26, 2025
9521fb1
Removed submodule
AlexKoff88 Mar 26, 2025
279bcd8
Updated gguf-lib download process
AlexKoff88 Mar 26, 2025
d6a0dc3
Update src/cpp/CMakeLists.txt
AlexKoff88 Mar 26, 2025
c2995c4
Update src/cpp/CMakeLists.txt
AlexKoff88 Mar 26, 2025
5535e49
Update src/cpp/src/utils.cpp
AlexKoff88 Mar 26, 2025
d2e67ab
Fixed issue with FP16 KV-cache hint for quantized models
AlexKoff88 Mar 26, 2025
e2503a8
Tried to fix error
AlexKoff88 Mar 26, 2025
700faad
Tried to fix error2
AlexKoff88 Mar 26, 2025
e2809aa
GGUF sample compilation fix
AlexKoff88 Mar 26, 2025
ed0e577
GGUF sample compilation fix2
AlexKoff88 Mar 26, 2025
707f26e
GGUF sample compilation fix3
AlexKoff88 Mar 26, 2025
2d119f1
Merge remote-tracking branch 'origin/master' into ak/gguf_support
AlexKoff88 Mar 26, 2025
ba57473
Removed unused function
AlexKoff88 Mar 26, 2025
e686064
Try to fix issue
AlexKoff88 Mar 26, 2025
43a6b16
Try to fix issue2
AlexKoff88 Mar 27, 2025
765e00f
Fixed build issues on MAC
AlexKoff88 Mar 27, 2025
f8cac09
Merge remote-tracking branch 'origin/master' into ak/gguf_support
AlexKoff88 Mar 27, 2025
bcb827a
Fixed build issues on Windows
AlexKoff88 Mar 27, 2025
f5ce8a4
Windows build
AlexKoff88 Mar 27, 2025
1 change: 1 addition & 0 deletions cmake/features.cmake
@@ -5,6 +5,7 @@
option(ENABLE_PYTHON "Enable Python API build" ON)
option(ENABLE_JS "Enable JS API build" OFF)
option(ENABLE_SAMPLES "Enable samples build" ON)
option(ENABLE_GGUF "Enable support for GGUF format" ON)

# Disable building samples for NPM package
if(CPACK_GENERATOR STREQUAL "NPM")
2 changes: 2 additions & 0 deletions samples/cpp/text_generation/CMakeLists.txt
@@ -22,6 +22,7 @@ endfunction()

set (SAMPLE_LIST
greedy_causal_lm
gguf_example
encrypted_model_causal_lm
beam_search_causal_lm
chat_sample
@@ -34,6 +35,7 @@ foreach(sample IN LISTS SAMPLE_LIST)
add_sample_executable(${sample})
endforeach()

target_include_directories(gguf_example INTERFACE "$<BUILD_INTERFACE:${OpenVINOGenAI_SOURCE_DIR}/src/cpp/src/gguf_utils>")

# benchmark_genai
include(FetchContent)
17 changes: 17 additions & 0 deletions samples/cpp/text_generation/gguf_example.cpp
Contributor commented:

Please remove all changes in the samples folder.

AlexKoff88 (Contributor, Author) commented on Mar 27, 2025:

Do you mean to remove this sample?

@@ -0,0 +1,17 @@
// Copyright (C) 2023-2025 Intel Corporation
// SPDX-License-Identifier: Apache-2.0

#include "openvino/genai/llm_pipeline.hpp"

#include "gguf_modeling.hpp"

#include "openvino/openvino.hpp"

int main(int argc, char* argv[]) {
    std::string models_path = argv[1];
    std::string output_path = argv[2];

    auto model = create_from_gguf(models_path);

    ov::save_model(model, output_path + "/openvino_model.xml", false);
}
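
Review note: the sample above reads argv[1] and argv[2] without checking argc and builds the output path by string concatenation. If the sample is kept, one possible hardening is sketched below using only the standard library. The names SampleArgs and parse_sample_args are hypothetical and do not appear in the PR; the calls to create_from_gguf and ov::save_model are left out so the sketch stays self-contained.

```cpp
#include <filesystem>
#include <optional>

// Hypothetical container for the two positional arguments of the sample.
struct SampleArgs {
    std::filesystem::path gguf_path;   // argv[1]: path to the .gguf file
    std::filesystem::path output_xml;  // argv[2] joined with "openvino_model.xml"
};

// Hypothetical helper: returns std::nullopt when fewer than two positional
// arguments are supplied, instead of dereferencing argv out of range.
inline std::optional<SampleArgs> parse_sample_args(int argc, const char* const argv[]) {
    if (argc < 3) {
        return std::nullopt;
    }
    SampleArgs args;
    args.gguf_path = argv[1];
    // operator/ inserts the platform's preferred separator when needed.
    args.output_xml = std::filesystem::path(argv[2]) / "openvino_model.xml";
    return args;
}
```

In main, a std::nullopt result would typically lead to printing a usage line and returning a non-zero exit code before any model is created.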
29 changes: 28 additions & 1 deletion src/cpp/CMakeLists.txt
@@ -76,7 +76,29 @@ target_include_directories(${TARGET_NAME_OBJ}
"$<BUILD_INTERFACE:${CMAKE_CURRENT_BINARY_DIR}>"
PRIVATE "${CMAKE_CURRENT_SOURCE_DIR}/src")

target_include_directories(${TARGET_NAME_OBJ} SYSTEM PRIVATE "${safetensors.h_SOURCE_DIR}")
if(ENABLE_GGUF)
    message(STATUS "Downloading gguflib")
    FetchContent_Declare(
        gguflib
        URL https://github.com/antirez/gguf-tools/archive/af7d88d808a7608a33723fba067036202910acb3.zip
        URL_HASH SHA256=d613559c7a398eb4a0919982e6a370055f8466497f0f866d331dc92b735927e7)
    FetchContent_MakeAvailable(gguflib)
    target_include_directories(${TARGET_NAME_OBJ}
                               PRIVATE "${gguflib_SOURCE_DIR}")
    target_include_directories(${TARGET_NAME_OBJ}
                               PRIVATE "${CMAKE_CURRENT_SOURCE_DIR}/src/gguf_utils")
    set(CMAKE_POSITION_INDEPENDENT_CODE ON)
    add_library(gguflib STATIC ${gguflib_SOURCE_DIR}/fp16.c
                               ${gguflib_SOURCE_DIR}/gguflib.c)
    #target_compile_features(gguflib PRIVATE fPIC)
    target_link_libraries(${TARGET_NAME_OBJ} PRIVATE gguflib)
    target_sources(${TARGET_NAME_OBJ} PRIVATE ${CMAKE_CURRENT_SOURCE_DIR}/src/gguf_utils/gguf.cpp
                                              ${CMAKE_CURRENT_SOURCE_DIR}/src/gguf_utils/gguf_quants.cpp
                                              ${CMAKE_CURRENT_SOURCE_DIR}/src/gguf_utils/gguf_modeling.cpp
                                              ${CMAKE_CURRENT_SOURCE_DIR}/src/gguf_utils/building_blocks.cpp)
endif()

target_include_directories(${TARGET_NAME_OBJ} SYSTEM PRIVATE "${safetensors.h_SOURCE_DIR}" "${gguflib_SOURCE_DIR}")

target_link_libraries(${TARGET_NAME_OBJ} PRIVATE openvino::runtime openvino::threading nlohmann_json::nlohmann_json jinja2cpp)

@@ -93,10 +115,15 @@ add_library(openvino::genai ALIAS ${TARGET_NAME})

target_include_directories(${TARGET_NAME} INTERFACE "$<INSTALL_INTERFACE:runtime/include>"
"$<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}/include>"
"$<BUILD_INTERFACE:${OpenVINOGenAI_SOURCE_DIR}/src/cpp/src/gguf_utils>"
"$<BUILD_INTERFACE:${CMAKE_CURRENT_BINARY_DIR}>")

target_link_libraries(${TARGET_NAME} PUBLIC openvino::runtime PRIVATE openvino::threading nlohmann_json::nlohmann_json jinja2cpp ${CMAKE_DL_LIBS})

if(ENABLE_GGUF)
target_link_libraries(${TARGET_NAME} PRIVATE gguflib ${CMAKE_DL_LIBS})
endif()

target_compile_features(${TARGET_NAME} INTERFACE cxx_std_17)

if(TARGET openvino_tokenizers)
11 changes: 1 addition & 10 deletions src/cpp/src/continuous_batching_pipeline.cpp
@@ -58,17 +58,8 @@ ContinuousBatchingPipeline::ContinuousBatchingPipeline( const std::filesystem::p

std::filesystem::path model_path = models_path;
std::filesystem::path directory = models_path;
if (std::filesystem::exists(model_path / "openvino_model.xml")) {
model_path = model_path / "openvino_model.xml";
}
else if (std::filesystem::exists(model_path / "openvino_language_model.xml")) {
model_path = model_path / "openvino_language_model.xml";
}
else {
OPENVINO_THROW("Could not find a model in the directory.");
}

auto model = utils::singleton_core().read_model(model_path, {}, properties);
auto model = utils::read_model(model_path, properties);
auto tokenizer = ov::genai::Tokenizer(directory, tokenizer_properties);
auto generation_config = utils::from_config_json_if_exists(directory);
