GitHub - MarkLee131/CR_score: CR_score: Semantic-based retriever within PatchFinder (ISSTA 2024)

# Semantic-based Retriever for Security Patch Tracing

PatchFinder's Semantic-based Retriever links CVE descriptions with their corresponding patches in code repositories. It is a key component of the PatchFinder tool, improving the accuracy and efficiency of security patch tracing in open-source software.

Core Functionality

Semantic Analysis: Utilizes context embedding to understand and match the semantics between CVE descriptions and code changes.
Integration with CodeReviewer: Employs a pretrained CodeReviewer model to deeply analyze code semantics, ensuring precise patch identification.

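For illustration, BERTScore recall is computed by matching each reference token to its most similar candidate token and averaging the resulting similarities.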
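The recall computation can be sketched in a few lines. With unit-normalized token embeddings, the standard BERTScore recall for a reference $x$ and a candidate $\hat{x}$ is $R = \frac{1}{|x|} \sum_{x_i \in x} \max_{\hat{x}_j \in \hat{x}} x_i^\top \hat{x}_j$. The sketch below uses toy 2-d vectors in place of real CodeReviewer embeddings; the vectors and the helper names `normalize` and `greedy_recall` are hypothetical and are not taken from this repository's code.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def greedy_recall(ref_vecs, cand_vecs):
    """Average, over reference tokens, of the best cosine similarity to any candidate token."""
    refs = [normalize(v) for v in ref_vecs]
    cands = [normalize(v) for v in cand_vecs]
    best = [max(sum(a * b for a, b in zip(r, c)) for c in cands) for r in refs]
    return sum(best) / len(best)


# Toy "embeddings" (hypothetical values, for illustration only).
reference = [[1.0, 0.0], [0.0, 1.0]]
candidate = [[1.0, 0.0], [1.0, 1.0]]

recall = greedy_recall(reference, candidate)
print(round(recall, 4))  # 0.8536: first token matches exactly, second only partially
```

Precision is obtained by swapping the roles of reference and candidate, and F1 is the harmonic mean of the two.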

Usage

Installation

Python version >= 3.6
PyTorch version >= 1.0.0

There are two ways to use the Semantic-based Retriever:

Install it from this source:

pip install .

You can then use it as a Python module.

Using Python Function:

At a high level, we provide a Python function cr_score.score and a Python object cr_score.CRScorer. The function provides all the supported features, while the scorer object caches the model to facilitate multiple evaluations. Check our demo to see how to use these two interfaces. Please refer to cr_score/score.py for implementation details.
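A minimal sketch of calling the function interface follows. It assumes that cr_score.score keeps the upstream BERTScore calling convention, that is, `score(cands, refs, ...)` returning precision, recall, and F1 tensors; that convention is an assumption and is not confirmed by this README. The keyword names `model_type`, `idf`, and `batch_size` are the ones mentioned under Practical Tips. The wrapper name `f1_scores` is hypothetical.

```python
def f1_scores(cands, refs, model_type="microsoft/codereviewer", idf=False, batch_size=16):
    """Return one F1 value per candidate/reference pair.

    Assumption: cr_score.score follows the upstream BERTScore convention,
    score(cands, refs, ...) -> (P, R, F1) tensors. Check cr_score/score.py
    before relying on this.
    """
    if len(cands) != len(refs):
        raise ValueError("cands and refs must have the same length")
    # Imported lazily so this helper can be defined without the package installed.
    from cr_score import score

    _, _, f1 = score(cands, refs, model_type=model_type, idf=idf, batch_size=batch_size)
    return f1.tolist()


# Example call (requires the package and the model weights to be available):
# f1_scores(["set up strict file path validation"],
#           ["Server exposed to file path traversal"])
```

When scoring many batches in one process, the cr_score.CRScorer object described above is likely the better fit, since it avoids reloading the model on every call.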

Command Line Interface (CLI)

In addition to the Python module, we provide a command line interface (CLI), which can be used as follows:

To evaluate the semantic similarity between two text files:

cr-score -r example/refs.txt -c example/hyps.txt -m microsoft/codereviewer

See more options by cr-score -h.

To visualize matching scores:

cr-score-show -m microsoft/codereviewer -r "Server exposed to file path traversal" -c "set up strict file path validation" -f out.png

The figure will be saved to out.png.

Practical Tips

Using inverse document frequency (idf) on the reference sentences to weigh word importance may correlate better with human judgment. However, when the set of reference sentences becomes too small, the idf scores become inaccurate or invalid, so idf is now optional. To use idf, set --idf when using the CLI tool or idf=True when calling the cr_score.score function.
When you are low on GPU memory, consider setting a smaller batch_size when calling the cr_score.score function.
To use a particular model, set -m MODEL_TYPE when using the CLI tool or model_type=MODEL_TYPE when calling the cr_score.score function.
We tune the layer to use based on the WMT16 metric evaluation dataset. You may use a different layer by setting -l LAYER or num_layers=LAYER. To tune the best layer for your custom model, please follow the instructions in the tune_layers folder.
Limitation: Because CodeReviewer uses learned positional embeddings and is pre-trained on sequences with a maximum length of 512, the score is undefined for sentences longer than 510 tokens (512 after adding the [CLS] and [SEP] tokens). Longer sentences are truncated.
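To see why idf weighting degrades on a small reference set, consider the following sketch. It assumes the plus-one smoothed form log((N + 1) / (df + 1)) used by upstream BERTScore and splits on whitespace rather than using the model's subword tokenizer; both choices, and the helper name `idf_weights`, are simplifying assumptions rather than this repository's implementation.

```python
import math
from collections import Counter


def idf_weights(reference_sentences):
    """Smoothed idf per token: log((N + 1) / (df + 1)), with N reference sentences."""
    n = len(reference_sentences)
    doc_freq = Counter()
    for sentence in reference_sentences:
        # Count each token at most once per sentence (document frequency).
        doc_freq.update(set(sentence.lower().split()))
    return {tok: math.log((n + 1) / (df + 1)) for tok, df in doc_freq.items()}


# Hypothetical reference sentences, for illustration only.
refs = [
    "server exposed to file path traversal",
    "buffer overflow in file parser",
]
weights = idf_weights(refs)
print(weights["file"], round(weights["traversal"], 3))  # 0.0 0.405
```

With only two references, every token receives one of just two possible weights, so the weighting carries little information about which words actually matter.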

Acknowledgements

This repo wouldn't be possible without the awesome BertScore, CodeReviewer, fairseq, and transformers.

