[DRAFT]: Add support of GGUF as an input format for LLM #1885

Open. Wants to merge 29 commits into base: master.

Commits (29):
ef89e2a
Added gguf-tools as submodule
AlexKoff88 Feb 3, 2025
8b86481
Merge remote-tracking branch 'origin/master' into ak/gguf_support
AlexKoff88 Mar 3, 2025
f671a44
Copied some pieces of functionality from 3rd parties. Rewriting is WIP.
AlexKoff88 Mar 5, 2025
4991ca0
Added GGUF reading and conversion to ov::Tensor. Fixed cmake
AlexKoff88 Mar 6, 2025
ecde304
Start adding model creation from GGUF
AlexKoff88 Mar 7, 2025
7a2f1ea
Added RoPE
AlexKoff88 Mar 10, 2025
84e9788
Added MHA
AlexKoff88 Mar 10, 2025
9221be7
Added RMSNorm, MVN, etc.
AlexKoff88 Mar 10, 2025
fbba6f8
Removed submodules
AlexKoff88 Mar 11, 2025
8fe17fa
Removed WWB changes
AlexKoff88 Mar 11, 2025
3cce078
Merge branch 'master' into ak/gguf_support
AlexKoff88 Mar 11, 2025
b91ce6f
Merged with master
AlexKoff88 Mar 11, 2025
64cb5d6
Added implementation of Transformer block
AlexKoff88 Mar 11, 2025
d891ec8
Added RoPE initialization
AlexKoff88 Mar 11, 2025
6ef3418
Added code for Llama based models
AlexKoff88 Mar 11, 2025
4b0c61e
Finished pipeline for models creation
AlexKoff88 Mar 12, 2025
be72ce8
Reshuffled the code. Extended configs to other data types.
AlexKoff88 Mar 12, 2025
779878f
Changed headers extension to .hpp
AlexKoff88 Mar 13, 2025
24b3475
Fixed model creation issues. Added sample to test GGUF.
AlexKoff88 Mar 13, 2025
3f0f12e
Fixed inference issues. Still have problems with generation output qu…
AlexKoff88 Mar 13, 2025
a9f95e3
Fixes in the causal mask subgraph
AlexKoff88 Mar 14, 2025
6cd777f
Changed rotate_half
AlexKoff88 Mar 14, 2025
9a5ac48
Fixed MHA. Got good results for FP16
AlexKoff88 Mar 14, 2025
4025d00
Fixed Q8_0 models
AlexKoff88 Mar 14, 2025
5a35124
Fixes for Q4_0 and Q4_1 unpacking
AlexKoff88 Mar 14, 2025
8206ad9
Added Q4_0/1 support. Result does not converge.
AlexKoff88 Mar 17, 2025
68dce03
Fixed result convergence for Q4_0 and Q4_1
AlexKoff88 Mar 17, 2025
2890d06
Made model conversion more generic. Qwen results does not converge st…
AlexKoff88 Mar 18, 2025
1538f91
Added bias Add operation after MatMuls where it is applicable. Still …
AlexKoff88 Mar 19, 2025
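
Several of the commits above concern unpacking Q8_0, Q4_0 and Q4_1 tensors. For orientation, here is a minimal sketch of how these block formats are conventionally dequantized in ggml: each block holds 32 weights, and the 4-bit formats store the first 16 weights in the low nibbles and the last 16 in the high nibbles. The struct and function names are hypothetical, and the scale fields are shown as `float` for brevity (GGUF files store them as fp16). This is not the code from `gguf_quants.cpp`, which is not visible in this diff.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Number of weights per quantization block in Q8_0, Q4_0 and Q4_1.
constexpr std::size_t kBlockSize = 32;

// Hypothetical in-memory views of one block. Assumption: the fp16 scale
// (and minimum) have already been converted to float.
struct BlockQ8_0 { float d; std::array<int8_t, kBlockSize> qs; };
struct BlockQ4_0 { float d; std::array<uint8_t, kBlockSize / 2> qs; };
struct BlockQ4_1 { float d; float m; std::array<uint8_t, kBlockSize / 2> qs; };

// Q8_0: weight = scale * signed 8-bit value.
std::array<float, kBlockSize> dequantize(const BlockQ8_0& b) {
    std::array<float, kBlockSize> out{};
    for (std::size_t i = 0; i < kBlockSize; ++i) {
        out[i] = b.d * static_cast<float>(b.qs[i]);
    }
    return out;
}

// Q4_0: weight = scale * (nibble - 8); low nibbles fill the first half
// of the block, high nibbles fill the second half.
std::array<float, kBlockSize> dequantize(const BlockQ4_0& b) {
    std::array<float, kBlockSize> out{};
    constexpr std::size_t half = kBlockSize / 2;
    for (std::size_t j = 0; j < half; ++j) {
        const int lo = (b.qs[j] & 0x0F) - 8;
        const int hi = (b.qs[j] >> 4) - 8;
        out[j] = b.d * static_cast<float>(lo);
        out[j + half] = b.d * static_cast<float>(hi);
    }
    return out;
}

// Q4_1: weight = scale * nibble + minimum (unsigned nibble, no -8 offset).
std::array<float, kBlockSize> dequantize(const BlockQ4_1& b) {
    std::array<float, kBlockSize> out{};
    constexpr std::size_t half = kBlockSize / 2;
    for (std::size_t j = 0; j < half; ++j) {
        const int lo = b.qs[j] & 0x0F;
        const int hi = b.qs[j] >> 4;
        out[j] = b.d * static_cast<float>(lo) + b.m;
        out[j + half] = b.d * static_cast<float>(hi) + b.m;
    }
    return out;
}
```

A reference routine of this kind can be handy for comparing unpacked weights against the values a converted model actually uses when results do not converge.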
1 change: 1 addition & 0 deletions cmake/features.cmake
@@ -5,6 +5,7 @@
option(ENABLE_PYTHON "Enable Python API build" ON)
option(ENABLE_JS "Enable JS API build" OFF)
option(ENABLE_SAMPLES "Enable samples build" ON)
option(ENABLE_GGUF "Enable support for GGUF format" ON)

# Disable building samples for NPM package
if(CPACK_GENERATOR STREQUAL "NPM")
1 change: 1 addition & 0 deletions gguf-tools
Submodule gguf-tools added at 8fa6eb
1 change: 1 addition & 0 deletions samples/cpp/text_generation/CMakeLists.txt
@@ -23,6 +23,7 @@ endfunction()

set (SAMPLE_LIST
greedy_causal_lm
gguf_example
encrypted_model_causal_lm
beam_search_causal_lm
chat_sample
17 changes: 17 additions & 0 deletions samples/cpp/text_generation/gguf_example.cpp
@@ -0,0 +1,17 @@
// Copyright (C) 2023-2025 Intel Corporation
// SPDX-License-Identifier: Apache-2.0

#include "openvino/genai/llm_pipeline.hpp"

#include "gguf_modeling.hpp"

#include "openvino/openvino.hpp"

int main(int argc, char* argv[]) {
    std::string models_path = argv[1];
    std::string output_path = argv[2];

    auto model = create_from_gguf(models_path);

    ov::save_model(model, output_path + "/openvino_model.xml", false);
}
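
The sample reads `argv[1]` and `argv[2]` without checking `argc`, and it builds the output path by string concatenation. Below is a minimal sketch of argument handling that could precede the `create_from_gguf` call. `GgufExampleArgs` and `parse_args` are hypothetical names introduced for illustration and are not part of this PR.

```cpp
#include <filesystem>
#include <optional>
#include <string>

// Hypothetical holder for the two positional arguments of the sample.
struct GgufExampleArgs {
    std::filesystem::path gguf_path;   // taken from argv[1]
    std::filesystem::path output_xml;  // argv[2] joined with "openvino_model.xml"
};

// Returns std::nullopt unless exactly two positional arguments are given.
std::optional<GgufExampleArgs> parse_args(int argc, const char* const argv[]) {
    if (argc != 3) {
        return std::nullopt;
    }
    GgufExampleArgs args;
    args.gguf_path = argv[1];
    // operator/ inserts the platform's path separator when needed.
    args.output_xml = std::filesystem::path(argv[2]) / "openvino_model.xml";
    return args;
}
```

In `main`, a caller would print a usage message and return a non-zero exit code when `parse_args` yields no value, then pass `args->gguf_path.string()` to `create_from_gguf` and `args->output_xml.string()` to `ov::save_model`.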
29 changes: 28 additions & 1 deletion src/cpp/CMakeLists.txt
@@ -75,13 +75,35 @@ if(TARGET openvino_tokenizers)
add_dependencies(${TARGET_NAME_OBJ} openvino_tokenizers)
endif()

if(ENABLE_GGUF)
    message(STATUS "Downloading gguflib")
    FetchContent_Declare(
        gguflib
        GIT_REPOSITORY https://github.com/antirez/gguf-tools/
        GIT_TAG af7d88d808a7608a33723fba067036202910acb3)
    FetchContent_MakeAvailable(gguflib)
    target_include_directories(${TARGET_NAME_OBJ}
        PRIVATE "${gguflib_SOURCE_DIR}")
    target_include_directories(${TARGET_NAME_OBJ}
        PRIVATE "$<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}/src/gguf_utils>")
    set(CMAKE_POSITION_INDEPENDENT_CODE ON)
    add_library(gguflib STATIC ${gguflib_SOURCE_DIR}/fp16.c
                               ${gguflib_SOURCE_DIR}/gguflib.c)
    #target_compile_features(gguflib PRIVATE fPIC)
    target_link_libraries(${TARGET_NAME_OBJ} PRIVATE $<BUILD_INTERFACE:gguflib>)
    target_sources(${TARGET_NAME_OBJ} PRIVATE ${CMAKE_CURRENT_SOURCE_DIR}/src/gguf_utils/gguf.cpp
                                              ${CMAKE_CURRENT_SOURCE_DIR}/src/gguf_utils/gguf_quants.cpp
                                              ${CMAKE_CURRENT_SOURCE_DIR}/src/gguf_utils/gguf_modeling.cpp
                                              ${CMAKE_CURRENT_SOURCE_DIR}/src/gguf_utils/building_blocks.cpp)
endif()

target_include_directories(${TARGET_NAME_OBJ}
PUBLIC "$<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}/include>"
"$<BUILD_INTERFACE:${OpenVINOGenAI_SOURCE_DIR}/src/c/include>"
"$<BUILD_INTERFACE:${CMAKE_CURRENT_BINARY_DIR}>"
PRIVATE "${CMAKE_CURRENT_SOURCE_DIR}/src")

-target_include_directories(${TARGET_NAME_OBJ} SYSTEM PRIVATE "${safetensors.h_SOURCE_DIR}")
+target_include_directories(${TARGET_NAME_OBJ} SYSTEM PRIVATE "${safetensors.h_SOURCE_DIR}" "${gguflib_SOURCE_DIR}")

target_link_libraries(${TARGET_NAME_OBJ} PRIVATE openvino::runtime openvino::runtime::c openvino::threading nlohmann_json::nlohmann_json jinja2cpp)

@@ -99,10 +121,15 @@ add_library(openvino::genai ALIAS ${TARGET_NAME})
target_include_directories(${TARGET_NAME} INTERFACE "$<INSTALL_INTERFACE:runtime/include>"
"$<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}/include>"
"$<BUILD_INTERFACE:${OpenVINOGenAI_SOURCE_DIR}/src/c/include>"
"$<BUILD_INTERFACE:${OpenVINOGenAI_SOURCE_DIR}/src/cpp/src/gguf_utils>"
"$<BUILD_INTERFACE:${CMAKE_CURRENT_BINARY_DIR}>")

target_link_libraries(${TARGET_NAME} PUBLIC openvino::runtime openvino::runtime::c PRIVATE openvino::threading nlohmann_json::nlohmann_json jinja2cpp ${CMAKE_DL_LIBS})

if(ENABLE_GGUF)
    target_link_libraries(${TARGET_NAME} PRIVATE gguflib ${CMAKE_DL_LIBS})
endif()

target_compile_features(${TARGET_NAME} INTERFACE cxx_std_17)

set_target_properties(${TARGET_NAME} PROPERTIES
1,013 changes: 1,013 additions & 0 deletions src/cpp/src/gguf_utils/building_blocks.cpp

Large diffs are not rendered by default.

55 changes: 55 additions & 0 deletions src/cpp/src/gguf_utils/building_blocks.hpp
@@ -0,0 +1,55 @@
#pragma once

#include <vector>
#include <stdexcept>
#include <algorithm>
#include <unordered_map>
#include <cstdarg>

#include <openvino/openvino.hpp>

#include "gguf.hpp"

ov::Output<ov::Node> make_lm_head(
    const std::string& key,
    const ov::Output<ov::Node>& input,
    const std::unordered_map<std::string, ov::Tensor>& consts,
    const ov::Output<ov::Node>& embeddings_node,
    QType qtype);

ov::Output<ov::Node> make_rms_norm(
    const std::string& key,
    const ov::Output<ov::Node>& input,
    const std::unordered_map<std::string, ov::Tensor>& consts,
    float epsilon);

std::tuple<ov::Output<ov::Node>, ov::Output<ov::Node>> make_embedding(
    const std::string& key,
    const ov::Output<ov::Node>& input,
    const std::unordered_map<std::string, ov::Tensor>& consts,
    QType qtype);

std::tuple<ov::Output<ov::Node>,
           ov::SinkVector,
           ov::Output<ov::Node>,
           std::pair<ov::Output<ov::Node>, ov::Output<ov::Node>>,
           std::shared_ptr<ov::Node>>
layer(const std::map<std::string, GGUFMetaData>& configs,
      std::unordered_map<std::string, ov::Tensor>& consts,
      int layer_idx,
      const ov::Output<ov::Node>& hidden_states,
      const ov::Output<ov::Node>& attn_mask,
      const ov::Output<ov::Node>& causal_mask,
      const ov::Output<ov::Node>& position_ids,
      const ov::Output<ov::Node>& rope_const,
      const ov::Output<ov::Node>& beam_idx,
      const ov::Output<ov::Node>& batch_dim,
      const ov::Output<ov::Node>& hidden_dim,
      const std::pair<ov::Output<ov::Node>, ov::Output<ov::Node>>& cos_sin_cached,
      const std::shared_ptr<ov::Node>& output_shape);

ov::Output<ov::Node> init_rope(
    int64_t head_dim,
    int64_t max_position_embeddings = 2048,
    float base = 10000.0f,
    float scaling_factor = 1.0f);
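
The body of `init_rope` lives in `building_blocks.cpp`, which this page does not render. As a scalar reference for what its parameters conventionally mean, the sketch below computes the usual rotary inverse frequencies, `inv_freq[i] = 1 / base^(2i / head_dim)`, and applies the rotation in the `rotate_half` layout used by Llama-style models (element `i` is paired with element `i + head_dim/2`). The function names are hypothetical, and treating `scaling_factor` as linear position scaling is an assumption; the PR's graph-building code may differ.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Conventional RoPE inverse frequencies: one value per rotated pair.
std::vector<float> rope_inv_freq(int64_t head_dim, float base = 10000.0f) {
    std::vector<float> inv_freq(static_cast<std::size_t>(head_dim / 2));
    for (std::size_t i = 0; i < inv_freq.size(); ++i) {
        const float exponent =
            2.0f * static_cast<float>(i) / static_cast<float>(head_dim);
        inv_freq[i] = 1.0f / std::pow(base, exponent);
    }
    return inv_freq;
}

// Rotates one head vector for a given position.
// Equivalent to x * cos + rotate_half(x) * sin, where
// rotate_half(x) = concat(-x[half:], x[:half]).
// Assumption: scaling_factor divides the position (linear scaling).
std::vector<float> apply_rope(const std::vector<float>& x,
                              int64_t position,
                              float base = 10000.0f,
                              float scaling_factor = 1.0f) {
    const std::size_t half = x.size() / 2;
    const std::vector<float> inv_freq =
        rope_inv_freq(static_cast<int64_t>(x.size()), base);
    std::vector<float> out(x.size());
    for (std::size_t i = 0; i < half; ++i) {
        const float angle =
            (static_cast<float>(position) / scaling_factor) * inv_freq[i];
        const float c = std::cos(angle);
        const float s = std::sin(angle);
        out[i] = x[i] * c - x[i + half] * s;
        out[i + half] = x[i + half] * c + x[i] * s;
    }
    return out;
}
```

Because each pair undergoes a pure rotation, the vector norm is preserved and position 0 leaves the input unchanged, which gives two quick sanity checks when comparing against the output of a converted model.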