github-actions committed on
Commit 41db68b · 1 Parent(s): b037f19

chore(auto): update changelog and version [0.4.0]

Files changed (3)
  1. README.md +18 -18
  2. codebleu.py +0 -1
  3. tests.py +4 -16
README.md CHANGED
@@ -5,7 +5,7 @@ tags:
 - metric
 - code
 - codebleu
-description: "Unofficial `CodeBLEU` implementation that supports Linux and MacOS."
+description: "Unofficial `CodeBLEU` implementation that supports Linux, MacOS and Windows."
 sdk: gradio
 sdk_version: 3.19.1
 app_file: app.py
@@ -14,30 +14,33 @@ pinned: false
 
 # Metric Card for codebleu
 
-This repository contains an unofficial `CodeBLEU` implementation that supports Linux and MacOS. It is available through `PyPI` and the `evaluate` library.
+This repository contains an unofficial `CodeBLEU` implementation that supports `Linux`, `MacOS` and `Windows`. It is available through `PyPI` and the `evaluate` library.
 
-The code is based on the original [CodeXGLUE/CodeBLEU](https://github.com/microsoft/CodeXGLUE/tree/main/Code-Code/code-to-code-trans/evaluator/CodeBLEU) and updated version by [XLCoST/CodeBLEU](https://github.com/reddy-lab-code-research/XLCoST/tree/main/code/translation/evaluator/CodeBLEU). It has been refactored, tested, built for macOS, and multiple improvements have been made to enhance usability
+Available for: `Python`, `C`, `C#`, `C++`, `Java`, `JavaScript`, `PHP`, `Go`, `Ruby`.
 
-Available for: `Python`, `C`, `C#`, `C++`, `Java`, `JavaScript`, `PHP`.
+---
+
+The code is based on the original [CodeXGLUE/CodeBLEU](https://github.com/microsoft/CodeXGLUE/tree/main/Code-Code/code-to-code-trans/evaluator/CodeBLEU) and updated version by [XLCoST/CodeBLEU](https://github.com/reddy-lab-code-research/XLCoST/tree/main/code/translation/evaluator/CodeBLEU). It has been refactored, tested, built for macOS and Windows, and multiple improvements have been made to enhance usability.
 
 ## Metric Description
 
 > An ideal evaluation metric should consider the grammatical correctness and the logic correctness.
 > We propose weighted n-gram match and syntactic AST match to measure grammatical correctness, and introduce semantic data-flow match to calculate logic correctness.
 > ![CodeBLEU](CodeBLEU.jpg)
-(from [CodeXGLUE](https://github.com/microsoft/CodeXGLUE/tree/main/Code-Code/code-to-code-trans/evaluator/CodeBLEU) repo)
+[from [CodeXGLUE](https://github.com/microsoft/CodeXGLUE/tree/main/Code-Code/code-to-code-trans/evaluator/CodeBLEU) repo]
 
 In a nutshell, `CodeBLEU` is a weighted combination of `n-gram match (BLEU)`, `weighted n-gram match (BLEU-weighted)`, `AST match` and `data-flow match` scores.
 
 The metric has shown higher correlation with human evaluation than `BLEU` and `accuracy` metrics.
 
+
 ## How to Use
 
 ### Inputs
 
 - `refarences` (`list[str]` or `list[list[str]]`): reference code
 - `predictions` (`list[str]`) predicted code
-- `lang` (`str`): code language, see `codebleu.AVAILABLE_LANGS` for available languages (python, c_sharp c, cpp, javascript, java, php at the moment)
+- `lang` (`str`): code language, see `codebleu.AVAILABLE_LANGS` for available languages (python, c_sharp c, cpp, javascript, java, php, go and ruby at the moment)
 - `weights` (`tuple[float,float,float,float]`): weights of the `ngram_match`, `weighted_ngram_match`, `syntax_match`, and `dataflow_match` respectively, defaults to `(0.25, 0.25, 0.25, 0.25)`
 - `tokenizer` (`callable`): to split code string to tokens, defaults to `s.split()`
 
@@ -71,13 +74,13 @@ reference = "def sum ( first , second ) :\n return second + first"
 
 result = calc_codebleu([reference], [prediction], lang="python", weights=(0.25, 0.25, 0.25, 0.25), tokenizer=None)
 print(result)
-# {
-# 'codebleu': 0.5537,
-# 'ngram_match_score': 0.1041,
-# 'weighted_ngram_match_score': 0.1109,
-# 'syntax_match_score': 1.0,
-# 'dataflow_match_score': 1.0
-# }
+{
+'codebleu': 0.5537,
+'ngram_match_score': 0.1041,
+'weighted_ngram_match_score': 0.1109,
+'syntax_match_score': 1.0,
+'dataflow_match_score': 1.0
+}
 ```
 
 Or using `evaluate` library (`codebleu` package required):
@@ -98,9 +101,8 @@ Note: `lang` is required;
 
 [//]: # (*Note any known limitations or biases that the metric has, with links and references if possible.*)
 
-As this library require `so` file compilation it is platform dependent.
-
-Currently available for Linux (manylinux) and MacOS on Python 3.8+.
+This library requires `so` file compilation with tree-sitter, so it is platform dependent.
+Currently available for `Linux` (manylinux), `MacOS` and `Windows` with Python 3.8+.
 
 
 ## Citation
@@ -117,6 +119,4 @@ Currently available for Linux (manylinux) and MacOS on Python 3.8+.
 
 ## Further References
 
-This implementation is Based on original [CodeXGLUE/CodeBLEU](https://github.com/microsoft/CodeXGLUE/tree/main/Code-Code/code-to-code-trans/evaluator/CodeBLEU) code -- refactored, build for macos, tested and fixed multiple crutches to make it more usable.
-
 The source code is available at GitHub [k4black/codebleu](https://github.com/k4black/codebleu) repository.
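The README describes `CodeBLEU` as a weighted combination of four component scores, with `weights` defaulting to `(0.25, 0.25, 0.25, 0.25)`. As a quick consistency check on the example output above, the minimal sketch below recombines the printed component scores: 0.25 × (0.1041 + 0.1109 + 1.0 + 1.0) = 0.55375, which agrees with the reported `codebleu` value of 0.5537 given that the components are rounded to four decimals. The `combine` helper and `COMPONENTS` tuple are hypothetical illustrations, not part of the `codebleu` package API.

```python
# Hypothetical helper (not part of the `codebleu` package): recombines the
# four component scores from the README example into the overall score.
COMPONENTS = (
    "ngram_match_score",
    "weighted_ngram_match_score",
    "syntax_match_score",
    "dataflow_match_score",
)


def combine(scores, weights=(0.25, 0.25, 0.25, 0.25)):
    """Weighted sum of the four component scores, in the README's weight order."""
    if len(weights) != len(COMPONENTS):
        raise ValueError("expected one weight per component")
    return sum(w * scores[name] for w, name in zip(weights, COMPONENTS))


# Component scores copied from the README example output (rounded to 4 decimals).
result = {
    "ngram_match_score": 0.1041,
    "weighted_ngram_match_score": 0.1109,
    "syntax_match_score": 1.0,
    "dataflow_match_score": 1.0,
}

print(round(combine(result), 3))  # prints 0.554, consistent with the reported 0.5537
```

Passing different `weights` (for example `(1.0, 0.0, 0.0, 0.0)`) shows how each weight slot selects one component, in the same order the README lists them.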
codebleu.py CHANGED
@@ -17,7 +17,6 @@ import importlib
 import datasets
 import evaluate
 
-
 _CITATION = """\
 @misc{ren2020codebleu,
 title={CodeBLEU: a Method for Automatic Evaluation of Code Synthesis},
tests.py CHANGED
@@ -1,17 +1,5 @@
 test_cases = [
-{
-"predictions": [0, 0],
-"references": [1, 1],
-"result": {"metric_score": 0}
-},
-{
-"predictions": [1, 1],
-"references": [1, 1],
-"result": {"metric_score": 1}
-},
-{
-"predictions": [1, 0],
-"references": [1, 1],
-"result": {"metric_score": 0.5}
-}
-]
+{"predictions": [0, 0], "references": [1, 1], "result": {"metric_score": 0}},
+{"predictions": [1, 1], "references": [1, 1], "result": {"metric_score": 1}},
+{"predictions": [1, 0], "references": [1, 1], "result": {"metric_score": 0.5}},
+]
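The `tests.py` change only reflows the `test_cases` list onto one line per case; the values are unchanged. These cases appear to be generic placeholders (an assumption: they resemble the template tests of a Hugging Face `evaluate` metric) and their expected `metric_score` values match a simple element-wise match fraction rather than CodeBLEU. The minimal sketch below checks that reading; `match_fraction` is a hypothetical helper, not a function from this repository.

```python
# Assumption: the expected scores are the fraction of positions where the
# prediction equals the reference. This does NOT compute CodeBLEU.
test_cases = [
    {"predictions": [0, 0], "references": [1, 1], "result": {"metric_score": 0}},
    {"predictions": [1, 1], "references": [1, 1], "result": {"metric_score": 1}},
    {"predictions": [1, 0], "references": [1, 1], "result": {"metric_score": 0.5}},
]


def match_fraction(predictions, references):
    """Hypothetical helper: share of positions where prediction == reference."""
    if len(predictions) != len(references) or not references:
        raise ValueError("predictions and references must be non-empty and equal length")
    hits = sum(p == r for p, r in zip(predictions, references))
    return hits / len(references)


for case in test_cases:
    got = match_fraction(case["predictions"], case["references"])
    assert got == case["result"]["metric_score"], (case, got)

print("all", len(test_cases), "cases pass")  # prints: all 3 cases pass
```

If that reading is right, the cases exercise the test harness rather than the CodeBLEU scoring itself.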