This repository serves as a final report summarizing my contributions during Google Summer of Code 2023 with OpenVINO. The project involved exposing the OpenVINO Runtime API in Java using the JNI framework and adding support for OpenVINO Runtime in John Snow Labs Spark NLP, a high-performance NLP library.
Spark NLP is an open-source NLP library, widely used in production, that offers simple, performant, and accurate NLP annotations for machine learning pipelines and scales easily in distributed environments. It provides an enterprise-grade, unified solution with thousands of pretrained models and pipelines and a broad set of NLP features for building end-to-end pipelines, and it fits seamlessly into existing data processing workflows by extending Apache Spark natively. Written in Scala, Spark NLP supports Python, R, and the JVM ecosystem (Java, Scala, and Kotlin). Currently, it offers CPU optimization via Intel-optimized TensorFlow and ONNX Runtime, and supports importing custom models in the TensorFlow SavedModel and ONNX formats.
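To make the "extends Apache Spark natively" point concrete, here is a minimal Scala sketch (not taken from this repository) of a Spark NLP pipeline; the pretrained model and column names are illustrative.

```scala
import com.johnsnowlabs.nlp.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.Tokenizer
import com.johnsnowlabs.nlp.embeddings.BertEmbeddings
import org.apache.spark.ml.Pipeline
import org.apache.spark.sql.SparkSession

object PipelineSketch extends App {
  val spark = SparkSession.builder()
    .appName("spark-nlp-pipeline-sketch")
    .master("local[*]")
    .getOrCreate()

  // Spark NLP annotators are ordinary Spark ML stages, so they compose into a
  // standard Pipeline and scale on any Spark cluster.
  val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

  val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

  // Downloads a default pretrained BERT model from the Spark NLP model hub.
  val embeddings = BertEmbeddings.pretrained()
    .setInputCols("document", "token")
    .setOutputCol("embeddings")

  val pipeline = new Pipeline()
    .setStages(Array(documentAssembler, tokenizer, embeddings))

  val data = spark
    .createDataFrame(Seq(Tuple1("Spark NLP runs natively on Apache Spark.")))
    .toDF("text")

  pipeline.fit(data).transform(data).select("embeddings.embeddings").show()
}
```

Because the annotator is just another pipeline stage, swapping the inference backend beneath it (TensorFlow, ONNX Runtime, or OpenVINO) leaves this user-facing code unchanged.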
This project aims to enhance the capabilities of Spark NLP by adding support for OpenVINO Runtime, providing significant out-of-the-box improvements for large language models such as BERT, especially on CPU and integrated GPU-based systems, and extending support to additional model formats such as ONNX, PaddlePaddle, TensorFlow, TensorFlow Lite, and OpenVINO IR. Combined with the further optimization and quantization capabilities offered by the OpenVINO Toolkit ecosystem when exporting models, OpenVINO Runtime will serve as a unified, high-performance inference engine capable of delivering accelerated inference for NLP pipelines on a variety of Intel hardware platforms. Furthermore, exposing the OpenVINO API bindings in Java will open up avenues for a large community of Java developers to benefit from OpenVINO's rich feature set as an inference and deployment solution for JVM-based projects in the future.
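As a rough illustration of what the exposed bindings enable on the JVM, the sketch below (written in Scala but calling the Java API) loads and runs a model through OpenVINO Runtime. It is an assumption-laden sketch, not code from the PRs: it assumes the `org.intel.openvino` package with snake_case methods mirroring the C++ API 2.0 (`read_model`, `compile_model`, `create_infer_request`, ...), and the model path, input shape, and tensor layout are placeholders.

```scala
import org.intel.openvino._

object OpenVinoSketch extends App {
  // Assumes the OpenVINO native libraries are discoverable at runtime
  // (e.g. via java.library.path).
  val core = new Core()

  // read_model accepts OpenVINO IR as well as other formats supported by the
  // Runtime frontends (ONNX, TensorFlow, PaddlePaddle, ...).
  val model: Model = core.read_model("model.xml")

  // Compile for a target device, e.g. "CPU" or "GPU" (integrated graphics).
  val compiled: CompiledModel = core.compile_model(model, "CPU")
  val request: InferRequest = compiled.create_infer_request()

  // Hypothetical single input of shape [1, 128]; real NLP models typically
  // take several inputs (token ids, attention mask, ...).
  val input = new Tensor(Array(1, 128), new Array[Float](128))
  request.set_input_tensor(input)
  request.infer()

  val output: Tensor = request.get_output_tensor()
  println(s"Output shape: ${output.get_shape().mkString("x")}")
}
```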
- Add the required JNI bindings to the OpenVINO Java module
- Enable Spark NLP to import and run models through the OpenVINO Runtime API (see the sketch after this list)
- Benchmark models running on the new OpenVINO backend
- Provide sample scripts demonstrating the usage of this feature
- Provide sample notebooks demonstrating how to export and prepare models
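The sketch below gives a flavour of the second goal: importing an externally exported model into Spark NLP so that it runs on the OpenVINO backend. It is illustrative rather than the exact API landed in the PRs; it assumes the integration reuses Spark NLP's existing `loadSavedModel` import flow, and the export directory path is hypothetical.

```scala
import com.johnsnowlabs.nlp.embeddings.BertEmbeddings
import org.apache.spark.sql.SparkSession

object ImportOpenVinoModel extends App {
  val spark = SparkSession.builder()
    .appName("spark-nlp-openvino-import")
    .master("local[*]")
    .getOrCreate()

  // Hypothetical directory containing a BERT model exported to OpenVINO IR
  // (e.g. by the export notebooks listed below).
  val exportPath = "/tmp/bert-base-cased-openvino"

  // loadSavedModel is Spark NLP's existing entry point for importing external
  // models; with the OpenVINO backend, the exported model is executed through
  // OpenVINO Runtime instead of TensorFlow or ONNX Runtime.
  val embeddings = BertEmbeddings
    .loadSavedModel(exportPath, spark)
    .setInputCols("document", "token")
    .setOutputCol("embeddings")

  // The imported annotator can be saved and reused like any other Spark NLP model.
  embeddings.write.overwrite().save("/tmp/bert_openvino_spark_nlp")
}
```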
PR Link | Description |
---|---|
PR #668 | Reorganize project structure and improve documentation |
PR #709 | Add Java API bindings |
PR #13947 | Integrate OpenVINO Runtime in Spark NLP |
Spark NLP Annotator | Notebook | Sample |
---|---|---|
BertEmbeddings | Export BERT HuggingFace | |
RoBertaEmbeddings | Export RoBerta HuggingFace | |
XlmRoBertaEmbeddings | Export XLM RoBerta HuggingFace | |
- The Need for Speed: Accelerating NLP Inferencing in Spark NLP with OpenVINO™ Runtime
- Deep Learning Inference in Java with OpenVINO™ Runtime
- Spark NLP-OpenVINO Integration Architecture
- OpenVINO Java Setup (Linux)
- OpenVINO Java Setup (Windows)
- Spark NLP Dev Setup
- Build Spark NLP Jar
- Spark NLP-OpenVINO Dockerfile
- Benchmarks