Add InfiniteBenchSum scenario and run spec #3409
Conversation
@run_spec_function("infinite_bench_sum")
def get_infinite_bench_sum_spec(word_lower_bound: float = 0.0, word_upper_bound: float = 100e6) -> RunSpec:
1e8 instead of 100e6 (also, isn't 1e7 enough?)
Yes, 1e7 is enough. Re 1e8 vs 100e6: 100e6 was meant to be easier to read at a glance (100e6 = 100M, 30e3 = 30k, etc.), but I am fine with either.
Currently: changed to 1e7 in the latest change.
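As a quick sanity check of the notation discussion (plain Python, no HELM dependencies):

```python
# 100e6, 1e8, and 100_000_000 all denote the same float value; the
# review settles on 1e7 (ten million words) as a sufficient upper bound.
print(100e6 == 1e8 == 100_000_000.0)  # True
print(int(1e7))                       # 10000000
```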
scenario_spec = ScenarioSpec(
    class_name="helm.benchmark.scenarios.infinite_bench_sum_scenario.InfiniteBenchSumScenario",
    args={
        "word_lower_bound": word_lower_bound,
num_words_lower_bound
or min_num_words
Changed to min_num_words
    class_name="helm.benchmark.scenarios.infinite_bench_sum_scenario.InfiniteBenchSumScenario",
    args={
        "word_lower_bound": word_lower_bound,
        "word_upper_bound": word_upper_bound,
num_words_upper_bound
or max_num_words
Changed to max_num_words
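Putting both review threads together, the revised run spec might look like the sketch below. The `ScenarioSpec` dataclass here is a simplified stand-in for HELM's real one, used only to make the snippet self-contained; the renames (`min_num_words`/`max_num_words`) and the `1e7` default are the ones agreed in the review.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class ScenarioSpec:
    # Simplified stand-in for helm.benchmark's ScenarioSpec.
    class_name: str
    args: Dict[str, float] = field(default_factory=dict)

def get_infinite_bench_sum_spec(min_num_words: float = 0.0, max_num_words: float = 1e7) -> ScenarioSpec:
    # Bounds renamed per the review; default upper bound tightened to 1e7.
    return ScenarioSpec(
        class_name="helm.benchmark.scenarios.infinite_bench_sum_scenario.InfiniteBenchSumScenario",
        args={"min_num_words": min_num_words, "max_num_words": max_num_words},
    )
```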
class InfiniteBenchSumScenario(Scenario):
    """InfiniteBenchSum

    InfiniteBenchbenchmark tailored for evaluating the capabilities of language models to process,
InfiniteBench benchmark
Corrected
class InfiniteBenchSumScenario(Scenario):
    """InfiniteBenchSum
Is there a space between InfiniteBench and Sum?
Yes, corrected
dataset = dataset.map(lambda example: {"prompt": example["context"] + "\n\n" + example["input"]})
dataset = dataset.map(lambda example: {"prompt_wc": len(example["prompt"].split())})
dataset = dataset.filter(lambda example: self.word_lower_bound <= example["prompt_wc"] <= self.word_upper_bound)
can you just do this chained:
(
    dataset.map()
    .map()
    .filter()
)
Yes, changed to this
assert isinstance(dataset, Dataset)

dataset = dataset.map(lambda example: {"prompt": example["context"] + "\n\n" + example["input"]})
Don't need this
Changed
assert isinstance(dataset, Dataset)

dataset = dataset.map(lambda example: {"prompt": example["context"] + "\n\n" + example["input"]})
dataset = dataset.map(lambda example: {"prompt_wc": len(example["prompt"].split())})
def count_words(text: str) -> int:
    return len(re.split(r"\s+", text.strip()))
then do
{"prompt_wc": count_words(example["context"]) + count_words(example["input"])}
Changed
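A self-contained sketch of the suggested helper (standard library only). Note that `re.split` on an empty string returns `['']`, so a guard is added here for the empty-text edge case; that guard is an assumption beyond the review's snippet.

```python
import re

def count_words(text: str) -> int:
    # Split on any run of whitespace; guard the empty/whitespace-only
    # case, where re.split would return [''] and miscount as 1 word.
    stripped = text.strip()
    if not stripped:
        return 0
    return len(re.split(r"\s+", stripped))

# Applied per field, as suggested in the review:
example = {"context": "one  two\nthree", "input": "summarize this"}
prompt_wc = count_words(example["context"]) + count_words(example["input"])
```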
)

# Load the dataset with the specified features
dataset = load_dataset("xinrongzhang2022/InfiniteBench", split="longbook_sum_eng", features=ft)
pin the revision: revision="90f0394333616266d9fe85824ceaf505093cbaa5"
Revision pinned
No description provided.