# Sentence-Transformers Information Retrieval example on Chinese
Sentence Transformers is a multilingual sentence embedding framework (built on HuggingFace Transformers) that provides an easy way to compute dense vector representations for sentences and paragraphs.

This repository targets an MS MARCO-like retrieval task on a Chinese dataset: it trains a bi_encoder and a cross_encoder, and uses an edited Elasticsearch interface for pandas to build serializable results.
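Below is a minimal sketch of the two model types involved, using the sentence-transformers API: a bi_encoder encodes query and passage independently into dense vectors, while a cross_encoder scores the pair jointly. The checkpoint names are public placeholder models, not the ones trained in this repository.

```python
from sentence_transformers import SentenceTransformer, CrossEncoder, util

# Placeholder checkpoints -- substitute the Chinese models trained by this repository.
bi_encoder = SentenceTransformer("distiluse-base-multilingual-cased-v1")
cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "苹果公司的创始人是谁"
passage = "苹果公司由史蒂夫·乔布斯等人创立"

# Bi-encoder: independent dense vectors, compared with cosine similarity.
q_emb = bi_encoder.encode(query, convert_to_tensor=True)
p_emb = bi_encoder.encode(passage, convert_to_tensor=True)
print(float(util.cos_sim(q_emb, p_emb)))

# Cross-encoder: one joint forward pass over the (query, passage) pair.
print(cross_encoder.predict([(query, passage)]))
```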
- pip: `pip install -r requirements.txt`
- Install Elasticsearch and start the service (a quick connectivity check is sketched below)
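Before indexing anything, it can help to verify that the Elasticsearch service is actually reachable. The host and port below are assumptions for a default local installation.

```python
from elasticsearch import Elasticsearch

# Assumes a default local installation; adjust the URL to your setup.
es = Elasticsearch("http://localhost:9200")
print(es.ping())   # True if the service is up and reachable
print(es.info())   # cluster name, Elasticsearch version, etc.
```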
1. Download the data from Google Drive
4. Prepare cross_encoder training data (see the training-data sketch after this list)
5. Prepare cross_encoder validation data
7. Show bi_encoder and cross_encoder inference (see the retrieve-and-rerank sketch after this list)
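For steps 4 and 5, cross_encoder data boils down to (question, passage, label) triples. The sketch below shows the generic sentence-transformers pattern for feeding such triples to a CrossEncoder; the base model, example triples, and hyperparameters are placeholders rather than this repository's actual configuration.

```python
from torch.utils.data import DataLoader
from sentence_transformers import InputExample
from sentence_transformers.cross_encoder import CrossEncoder

# Hypothetical (question, passage, label) triples: 1 = relevant, 0 = not relevant.
train_samples = [
    InputExample(texts=["苹果公司的创始人是谁", "苹果公司由史蒂夫·乔布斯等人创立"], label=1),
    InputExample(texts=["苹果公司的创始人是谁", "北京是中国的首都"], label=0),
]
train_dataloader = DataLoader(train_samples, shuffle=True, batch_size=2)

# Placeholder Chinese base model; num_labels=1 yields a single relevance score.
model = CrossEncoder("bert-base-chinese", num_labels=1)
model.fit(train_dataloader=train_dataloader, epochs=1)
```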
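For step 7, inference typically follows the retrieve-and-rerank pattern: the bi_encoder retrieves top-k candidates by embedding similarity and the cross_encoder rescores them. The sketch below uses a toy in-memory corpus and placeholder checkpoint paths instead of this repository's scripts.

```python
from sentence_transformers import SentenceTransformer, CrossEncoder, util

# Placeholder paths -- point these at the trained bi_encoder / cross_encoder.
bi_encoder = SentenceTransformer("output/bi_encoder")
cross_encoder = CrossEncoder("output/cross_encoder")

corpus = ["苹果公司由史蒂夫·乔布斯等人创立", "北京是中国的首都", "长城是著名的古代建筑"]
corpus_emb = bi_encoder.encode(corpus, convert_to_tensor=True)

query = "苹果公司的创始人是谁"
query_emb = bi_encoder.encode(query, convert_to_tensor=True)

# Stage 1: bi_encoder retrieval of top-k candidates by cosine similarity.
hits = util.semantic_search(query_emb, corpus_emb, top_k=2)[0]

# Stage 2: cross_encoder reranking of the retrieved candidates.
pairs = [(query, corpus[hit["corpus_id"]]) for hit in hits]
scores = cross_encoder.predict(pairs)
for hit, score in sorted(zip(hits, scores), key=lambda x: x[1], reverse=True):
    print(corpus[hit["corpus_id"]], float(score))
```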
* 1 This repository uses an edited es-pandas interface (with support for serialized vectors) to manipulate Elasticsearch simply through pandas (see the first sketch after this list).
* 2 try_sbert_neg_sampler.py samples hard negatives using classes derived from https://guzpenha.github.io/transformer_rankers/; Elasticsearch can also be used to generate hard negatives, and the related functions are defined in valid_cross_encoder_on_bi_encoder.py (see the second sketch after this list).
* 3 Before training the cross_encoder on your dataset, take a look at the semantic similarity between different questions; combining samples with similar semantics may help (see the third sketch after this list).
* 4 Adds a toolkit to SBERT to support multi-class evaluation (results returned as a dictionary).
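For note 1, the sketch below shows the general shape of writing a pandas DataFrame that carries embedding vectors to Elasticsearch and reading it back. It uses the upstream es_pandas package and JSON-serializes the vector column by hand; the repository's edited interface handles serialization internally, and the host, index name, and exact es_pandas call signatures may differ across versions.

```python
import json
import numpy as np
import pandas as pd
from es_pandas import es_pandas

# Assumed local Elasticsearch; adjust host/port as needed.
ep = es_pandas("http://localhost:9200")

df = pd.DataFrame({
    "question": ["问题一", "问题二"],
    "embedding": [np.random.rand(8).tolist(), np.random.rand(8).tolist()],
})

# Serialize the vector column to a JSON string so it round-trips through Elasticsearch.
df["embedding"] = df["embedding"].apply(json.dumps)

ep.to_es(df, "demo_questions")         # write the DataFrame to an index
back = ep.to_pandas("demo_questions")  # read it back as a DataFrame
back["embedding"] = back["embedding"].apply(json.loads)
print(back.head())
```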
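For note 2, a common way to mine hard negatives with Elasticsearch is to run a BM25 query for each question and keep highly ranked passages that are not the labelled positive. This is a generic illustration, not the code in valid_cross_encoder_on_bi_encoder.py; the index name, field name, and host are assumptions, and the call style targets elasticsearch-py 8.x.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

def mine_hard_negatives(question, positive_passage, index="passages", k=10):
    """Return BM25-ranked passages that are lexically close to the question
    but are not the labelled positive: candidates for hard negatives."""
    resp = es.search(index=index, query={"match": {"text": question}}, size=k)
    return [
        hit["_source"]["text"]
        for hit in resp["hits"]["hits"]
        if hit["_source"]["text"] != positive_passage
    ]

print(mine_hard_negatives("苹果公司的创始人是谁", "苹果公司由史蒂夫·乔布斯等人创立"))
```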
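For note 3, one quick way to look at the semantic similarity between questions is to embed them and inspect the pairwise cosine-similarity matrix; pairs above some threshold are candidates for merging. The model name and threshold below are placeholders.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("distiluse-base-multilingual-cased-v1")  # placeholder checkpoint

questions = ["苹果公司是谁创立的", "谁创办了苹果公司", "北京的人口有多少"]
emb = model.encode(questions, convert_to_tensor=True)

# Pairwise cosine similarity between all questions.
sim = util.cos_sim(emb, emb)

# Report question pairs above an (assumed) merge threshold.
threshold = 0.8
for i in range(len(questions)):
    for j in range(i + 1, len(questions)):
        if sim[i][j] > threshold:
            print(questions[i], "<->", questions[j], float(sim[i][j]))
```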
## Contributing

## License

Distributed under the MIT License. See `LICENSE` for more information.
## Contact

svjack - svjackbt@gmail.com

Project Link: https://github.com/svjack/Sbert-ChineseExample