rajatkrishna/google-summer-of-code

Google Summer of Code

This repository serves as a final report summarizing my contributions during Google Summer of Code 2023 with OpenVINO. The project involved exposing the OpenVINO Runtime API in Java using the JNI framework and adding support for OpenVINO Runtime in John Snow Labs Spark NLP, a high-performance NLP library.

Project Abstract

Spark NLP is an open-source NLP library, widely used in production, that offers simple, performant and accurate NLP annotations for machine learning pipelines that scale easily in distributed environments. It provides an enterprise-grade, unified solution with thousands of pretrained models and pipelines and a range of NLP features for building end-to-end NLP pipelines, and it fits into existing data processing pipelines by extending Apache Spark natively. Written in Scala, Spark NLP supports Python, R and the JVM ecosystem (Java, Scala and Kotlin). Currently, it offers CPU optimization via Intel-optimized TensorFlow and ONNX Runtime, and supports importing custom models in the TensorFlow SavedModel and ONNX formats.

This project aims to enhance Spark NLP by adding support for OpenVINO Runtime. This provides significant out-of-the-box performance improvements for language models such as BERT, especially on CPU and integrated GPU-based systems, and extends support to model formats including ONNX, PaddlePaddle, TensorFlow, TensorFlow Lite and OpenVINO IR. Combined with the further optimization and quantization capabilities that the OpenVINO Toolkit ecosystem offers when exporting models, OpenVINO Runtime will serve as a unified, high-performance inference engine that accelerates inference in NLP pipelines on a variety of Intel hardware platforms. Furthermore, exposing the OpenVINO API bindings in Java will allow a large community of Java developers to use OpenVINO's feature set as an inference and deployment solution for JVM-based projects in the future.
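To make the idea of a single engine covering several model formats more concrete, the sketch below maps a model file to one of the formats listed above by its file extension. It is only an illustration: the helper name `detect_model_format` and the extension table are assumptions made for this example, not part of Spark NLP or OpenVINO, and the real runtime also accepts inputs this table ignores (for instance TensorFlow SavedModel directories).

```python
from pathlib import Path

# Assumption: a simplified extension-to-format table for illustration only.
# The actual runtime selects a frontend itself and handles more cases.
FORMAT_BY_EXTENSION = {
    ".xml": "OpenVINO IR",
    ".onnx": "ONNX",
    ".pdmodel": "PaddlePaddle",
    ".pb": "TensorFlow",
    ".tflite": "TensorFlow Lite",
}


def detect_model_format(model_path: str) -> str:
    """Return the assumed model format for a file path, based on its suffix."""
    suffix = Path(model_path).suffix.lower()
    try:
        return FORMAT_BY_EXTENSION[suffix]
    except KeyError:
        # Unknown suffix: fail loudly instead of guessing a format.
        raise ValueError(f"Unsupported model file: {model_path!r}") from None


# Hypothetical path, used only to show the call.
print(detect_model_format("bert_base_cased/openvino_model.xml"))  # OpenVINO IR
```

A lookup like this would sit in front of the actual model-loading call, so that an unsupported file is rejected with a clear message before any inference engine is created.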

Key Deliverables

  • Add required JNI Bindings to the OpenVINO Java module
  • Enable the OpenVINO Runtime API to import and run models in Spark NLP
  • Benchmark models run with the new OpenVINO backend
  • Sample scripts demonstrating the usage of this feature
  • Sample notebooks demonstrating how to export and prepare models
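
The benchmarking deliverable can be approached with a small timing harness. The sketch below is one possible shape for it, not the method used in this project: the function name `benchmark`, its parameters and the reported statistics are all assumptions, and `infer` stands in for any zero-argument callable that runs one inference with a given backend.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(infer: Callable[[], object], warmup: int = 3, runs: int = 10) -> Dict[str, float]:
    """Time repeated calls to `infer` and summarize the latency in seconds."""
    # Warm-up calls are not timed, so one-off costs (model loading, caches)
    # do not distort the measurements.
    for _ in range(warmup):
        infer()

    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        infer()
        timings.append(time.perf_counter() - start)

    return {
        "mean_s": statistics.mean(timings),
        "median_s": statistics.median(timings),
        "min_s": min(timings),
    }


# Placeholder workload standing in for a real inference call.
stats = benchmark(lambda: sum(range(10_000)))
print(sorted(stats))  # ['mean_s', 'median_s', 'min_s']
```

Running the same harness once per backend, on identical inputs and hardware, gives numbers that can be compared directly.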

Contributions

| PR Link | Description |
| --- | --- |
| PR #668 | Reorganize project structure and improve documentation |
| PR #709 | Add Java API bindings |
| PR #13947 | Integrating OpenVINO Runtime in Spark NLP |

Models Covered

Spark NLP Notebook Sample
BertEmbeddings Export BERT HuggingFace
RoBertaEmbeddings Export RoBerta HuggingFace
XlmRoBertaEmbeddings Export XLM RoBerta HuggingFace

Blogs and other resources
