Model Card for EXAONE-3.5-7.8B-Stratos-Ko

Base Model

beomi/EXAONE-3.5-7.8B-Instruct-Llamafied, Llamafied version of LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct

Dataset

werty1248/Bespoke-Stratos-17k-KoEnKo-formatted
From bespokelabs/Bespoke-Stratos-17k, fixes system prompt, only questions and answers translated, thinking part not translated.

Training

4xA100, 12 hours - Total 48 GPU Hours with A100 PCie
Trained with Axolotl, not LLaMA-Factory, but trying to follow SkyThought's training config as much as possible.
Difference
- sample packing: True
- optimizer: paged_adamw_32bit

Axolotl config

base_model: beomi/EXAONE-3.5-7.8B-Instruct-Llamafied
model_type: AutoModelForCausalLM
tokenizer_config: beomi/EXAONE-3.5-7.8B-Instruct-Llamafied
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: werty1248/Bespoke-Stratos-17k-KoEnKo-formatted
    field_messages: conversations
    type: chat_template
    chat_template: chatml

dataset_prepared_path: ./data_preparation
output_dir: /workspace/data

hf_use_auth_token: true

sequence_len: 16384
sample_packing: true
pad_to_sequence_len: true

plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_layer_norm: true
liger_glu_activation: true
liger_fused_linear_cross_entropy: true

wandb_project:
#wandb_entity:
#wandb_watch:
wandb_name:
#wandb_log_model:

gradient_accumulation_steps: 24
micro_batch_size: 1
num_epochs: 3
optimizer: paged_adamw_32bit
lr_scheduler: cosine
learning_rate: 1.0e-5

train_on_inputs: false
group_by_length: false
bf16: auto
fp16: 
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_ratio: 0.1
eval_table_size:

deepspeed: ./deepspeed_configs/zero3_bf16.json

Evaluation

HAERAE-HUB/HRM8K

werty1248/EXAONE-3.5-7.8B-Stratos-Ko: temperature=0.0, max think tokens = 4096
- In the first stage, the model generates up to 4096 tokens.
- In the second stage, for answers that fail to generate an <|end_of_thought|> after using the maximum number of tokens, I add \n\n<|end_of_thought|>\n\n<|begin_of_solution|> to end the thought and output the answer.
LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct: temperature=0.7, top_p=0.95, max_tokens = 2048
Other Models: Reported by OneLineAI - Creator of HRM8K Benchmark

Model	GSM8K	KSM	MATH	MMMLU	OMNI_MATH	Average
GPT-4o	91.21	22.83	74.45	68.72	30.75	57.99
GPT-4o-mini	87.57	19.40	70.68	63.40	26.45	53.50
*EXAONE-3.5-7.8B-Stratos-Ko	83.02	15.97	67.49	**44.68	24.62	49.98
EXAONE-3.5-7.8B-Instruct	81.58	14.71	63.50	***41.49?	21.69	44.19
Qwen2.5-14B-Instruct	66.34	15.55	53.38	61.49	20.64	43.88
Llama-3.1-8B-Instruct	77.79	7.21	49.01	47.02	15.92	39.39
Qwen2.5-7B-Instruct	58.38	13.10	48.04	48.94	16.55	37.80
EXAONE-3.0-7.8B-Instruct	72.33	7.98	46.79	37.66	15.35	36.02
*Ko-R1-1.5B-preview	43.3	?	73.1	?	29.8	?

* Korean Reasoning Models

** In a 4-option multiple-choice question, both correct answer numbber and answer alphabet are accepted as correct ('A' == 1, 'B' == 2, ...).

*** In a 4-option multiple-choice question, the correct answer *(not answer number) are accepted as correct.

Example

Q1: From ChuGyouk/GSM8k-Ko

input = tokenizer.apply_chat_template([ { "content": "Your role as an assistant involves thoroughly exploring questions through a systematic long thinking process before providing the final precise and accurate solutions. This requires engaging in a comprehensive cycle of analysis, summarizing, exploration, reassessment, reflection, backtracing, and iteration to develop well-considered thinking process. Please structure your response into two main sections: Thought and Solution. In the Thought section, detail your reasoning process using the specified format: <|begin_of_thought|> {thought with steps separated with '\\n\\n'} <|end_of_thought|> Each step should include detailed considerations such as analisying questions, summarizing relevant findings, brainstorming new ideas, verifying the accuracy of the current steps, refining any errors, and revisiting previous steps. In the Solution section, based on various attempts, explorations, and reflections from the Thought section, systematically present the final solution that you deem correct. The solution should remain a logical, accurate, concise expression style and detail necessary step needed to reach the conclusion, formatted as follows: <|begin_of_solution|> {final formatted, precise, and clear solution written in the same language as the question.} <|end_of_solution|> Now, try to solve the following question through the above guidelines", "role": "system" },
                                       { "content": "Janet의 오리는 하루에 16개의 달걀을 낳습니다. 그녀는 매일 아침 3개의 달걀을 먹고, 친구들을 위해 4개의 달걀로 머핀을 굽습니다. 그녀는 남은 달걀을 농산물 시장에서 신선한 오리알 개당 2달러에 판매합니다. 그녀는 농산물 시장에서 매일 얼마나 많은 돈을 벌까요?", "role": "user" },], add_generation_prompt=True, return_tensors='pt')
res = model.generate(input.to('cuda'), max_length = 2048, stop_strings = ['<|end_of_solution|>'], tokenizer=tokenizer)

tokenizer.decode(res[0])

Output

[|assistant|]Okay, let's figure out how much money Janet makes each day from selling her eggs at the market. Let's break it down step by step.

First, Janet's ducks lay 16 eggs every day. So she has 16 eggs each day.

Next, she uses some of these eggs. She eats 3 eggs every morning. So after eating those 3, she has 16 minus 3 eggs left. Let's calculate that: 16 - 3 = 13 eggs left.

Then, she bakes muffins for her friends using 4 eggs. So after making the muffins, she has 13 - 4 eggs left. Let me do that subtraction: 13 - 4 = 9 eggs remaining.

These remaining eggs are the ones she sells at the market. So she has 9 eggs left each day.

Each egg is sold for $2 at the market. Therefore, the amount of money she makes is the number of eggs sold multiplied by the price per egg. So 9 eggs times $2 each equals $18.

Let me double-check to make sure I didn't make a mistake:

16 eggs per day

Eats 3 eggs: 16 - 3 = 13 eggs left

Bakes 4 eggs: 13 - 4 = 9 eggs

9 eggs × $2 each = $18

Yes, that adds up. So Janet sells the eggs at the market and makes $18 each day. That seems straightforward.

Wait, let me think again: she lays 16 eggs, uses 3, bakes 4, leaves 9, so sells 9. 9 times 2 is 18. Yep, that's correct.

Another way to check:

Total eggs each day: 16

Used eggs (eaten + baked): 3 + 4 = 7

So remaining eggs: 16 - 7 = 9

9 eggs × $2 = $18. Yep, same answer.

I don't see any other expenses or other factors here, so the answer should be $18 each day.

**Final Answer**
Janet makes $18 each day at the market.

<|end_of_thought|>

<|begin_of_solution|>

Janet은 매일 16개의 오리알을낍니다. 그녀는 매일 아침 3개의 알을 먹고, 4개의 알로 머핀을 굽습니다. 그녀가 남은 오리알은 농산물 시장에서 개당 2달러에 판매됩니다.

1. **매일 낳는 알**: \( 16 \)개
2. **매일 먹는 알**: \( 3 \)개
3. **매일 사용하는 알**: \( 3 + 4 = 7 \)개
4. **남은 알**: \( 16 - 7 = 9 \)개

남은 9개의 알에 대한 판매 수익은 다음과 같습니다.
\[
9 \text{ 알} \times 2 \text{ 달러/알} = 18 \text{ 달러}
\]

그러므로, 그녀는 농산물 시장에서 매일 \(\boxed{18}\)달러를 벌고 있습니다.

<|end_of_solution|>

Q2: From KSM, HAERAE-HUB/HRM8K

input = tokenizer.apply_chat_template([ { "content": "Your role as an assistant involves thoroughly exploring questions through a systematic long thinking process before providing the final precise and accurate solutions. This requires engaging in a comprehensive cycle of analysis, summarizing, exploration, reassessment, reflection, backtracing, and iteration to develop well-considered thinking process. Please structure your response into two main sections: Thought and Solution. In the Thought section, detail your reasoning process using the specified format: <|begin_of_thought|> {thought with steps separated with '\\n\\n'} <|end_of_thought|> Each step should include detailed considerations such as analisying questions, summarizing relevant findings, brainstorming new ideas, verifying the accuracy of the current steps, refining any errors, and revisiting previous steps. In the Solution section, based on various attempts, explorations, and reflections from the Thought section, systematically present the final solution that you deem correct. The solution should remain a logical, accurate, concise expression style and detail necessary step needed to reach the conclusion, formatted as follows: <|begin_of_solution|> {final formatted, precise, and clear solution written in the same language as the question.} <|end_of_solution|> Now, try to solve the following question through the above guidelines", "role": "system" },
                                       { "content": "양의 실수  $t$ 에 대하여 함수  $f$ 를 \\[ f(t) = \\int_{0}^{\\sqrt{t}} \\int_{y}^{\\sqrt{t}} \\frac{1}{2 + \\sin(x^2)} \\, dx \\, dy \\] 로 정의할 때,  $f'\\left(\\frac{\\pi}{2}\\right)$ 의 값을 구하시오.", "role": "user" },], add_generation_prompt=True, return_tensors='pt')
res = model.generate(input.to('cuda'), max_length = 8192, stop_strings = ['<|end_of_solution|>'], tokenizer=tokenizer)

tokenizer.decode(res[0])

Output(한국어로 생각함???)

[|assistant|]

먼저 주어진 함수 \( f(t) \)를 이해하기 위해 이중 적분을 분석해야 합니다. 함수는 두 개의 중첩된 적분으로 정의됩니다:

\[
f(t) = \int_{0}^{\sqrt{t}} \int_{y}^{\sqrt{t}} \frac{1}{2 + \sin(x^2)} \, dx \, dy
\]

이 표현식은 \( x \)에 대한 내부 적분 \( \int_{y}^{\sqrt{t}} \frac{1}{2 + \sin(x^2)} \, dx \)의 결과인 \( y \)에 대한 외부 적분 \( \int_{0}^{\sqrt{t}} dy \)를 나타냅니다. 이는 \( y \)가 \( 0 \)에서 \( \sqrt{t} \) 사이의 임의의 값을 취하며, \( x \)는 \( y \)부터 \( \sqrt{t} \)까지의 값을 취하는 영역 \( R \)을 나타냅니다. 영역 \( R \)의 범위와 관계를 이해하면 주어진 적분의 의미를 더 잘 파악할 수 있습니다.

\( y \)가 0부터 \( \sqrt{t} \)까지, 그리고 \( x \)가 \( y \)부터 \( \sqrt{t} \)까지 변할 때, 영역 \( R \)은 직사각형의 일부가 될 수 있습니다. \( y \)가 0인 경우, \( x \)는 0에서 \( \sqrt{t} \)까지 증가합니다. \( y \)가 \( \sqrt{t} \)인 경우, \( x \)는 \( \sqrt{t} \)에서 다시 \( \sqrt{t} \)로 돌아옵니다. \( y \)가 \( 0 \)과 \( \sqrt{t} \) 사이의 어떤 값 \( y_0 \)인 경우, \( x \)는 \( y_0 \)에서 \( \sqrt{t} \)까지 증가합니다. 따라서, 영역 \( R \)은 다음과 같은 네 개의 사각형으로 구성될 수 있습니다:
- \( y = 0 \)부터 \( y = a \)까지 (단, \( a \leq \sqrt{t} \))의 사각형: 이때 \( x \)는 \( 0 \)과 \( \sqrt{t} \) 사이.
- \( y = a \)부터 \( y = \sqrt{t} \)까지 (단, \( a \leq y \leq \sqrt{t} \))의 사각형: 이때 \( x \)는 \( a \)부터 \( \sqrt{t} \) 사이.

이 사각형들은 \( y = a \)에서 접합니다. 따라서, \( y \)의 범위는 \( 0 \)에서 \( \sqrt{t} \) 사이에 있고, \( a \)는 이전에 정의된 특정 값, 또는 \( t \)에 의해 결정되는 값입니다.

문제는 \( f'(t) \)를 결정하는 것입니다. 이는 \( t \)에 대한 \( f(t) \)의 도함수로, Leibniz 적분 규칙을 사용하여 구할 수 있습니다. 이중 적분에 대한 Leibniz 적분 규칙은 다음과 같습니다:

\[
\frac{d}{dt} \int_{a(t)}^{b(t)} \int_{c(t)}^{d(t)} f(x, y) \, dx \, dy = \int_{a(t)}^{b(t)} \frac{\partial}{\partial t} \left[ \int_{c(t)}^{d(t)} f(x, y) \, dx \right] \, dy + \int_{c(t)}^{d(t)} \frac{\partial}{\partial t} \left[ \int_{a(t)}^{d(t)} f(x, y) \, dx \right] \, dy
\]

주어진 함수 \( f(t) \)는 다음과 같이 쓸 수 있습니다:

\[
f(t) = \int_{0}^{\sqrt{t}} \left( \int_{y}^{\sqrt{t}} \frac{1}{2 + \sin(x^2)} \, dx \right) dy
\]

따라서, Leibniz 규칙을 적용하면, \( f'(t) \)는 다음 두 개의 적분을 포함합니다:

1. \( y \)에 대한 외부 적분: \( \int_{0}^{\sqrt{t}} \frac{\partial}{\partial t} \left( \int_{y}^{\sqrt{t}} \frac{1}{2 + \sin(x^2)} \, dx \right) dy \)
2. \( x \)에 대한 내부 적분: \( \int_{0}^{\sqrt{t}} \frac{\partial}{\partial t} \left( \int_{0}^{x} \frac{1}{2 + \sin(y^2)} \, dy \right) dx \)

이 두 개를 모두 계산해야 합니다.

먼저, \( y \)에 대한 외부 적분을 단순화합니다. \( \frac{\partial}{\partial t} \left( \int_{y}^{\sqrt{t}} \frac{1}{2 + \sin(x^2)} \, dx \right) \)를 계산합니다. 이는 Leibniz 규칙을 사용한 내부 적분에 대한 \( t \)에 대한 도함수로 쓸 수 있습니다:

\[
\frac{\partial}{\partial t} \left( \int_{y}^{\sqrt{t}} \frac{1}{2 + \sin(x^2)} \, dx \right) = \frac{1}{2 + \sin((\sqrt{t})^2)} \cdot \frac{d}{dt}(\sqrt{t}) = \frac{1}{2 + \sin(t)} \cdot \frac{1}{2\sqrt{t}}
\]

따라서, 첫 번째 부분 적분은 다음과 같습니다:

\[
\int_{0}^{\sqrt{t}} \frac{1}{2 + \sin(t)} \cdot \frac{1}{2\sqrt{t}} dy = \frac{1}{2\sqrt{t}(2 + \sin(t))} \int_{0}^{\sqrt{t}} dy = \frac{1}{2\sqrt{t}(2 + \sin(t))} \cdot \sqrt{t} = \frac{1}{2(2 + \sin(t))}
\]

다음으로, \( x \)에 대한 내부 적분을 단순화합니다. \( \frac{\partial}{\partial t} \left( \int_{0}^{x} \frac{1}{2 + \sin(y^2)} \, dy \right) \)를 계산합니다. 이는 \( y = x \)에서 \( t \)에 대한 도함수로 쓸 수 있습니다:

\[
\frac{\partial}{\partial t} \left( \int_{0}^{x} \frac{1}{2 + \sin(y^2)} \, dy \right) = \frac{\partial}{\partial t} \left( \int_{0}^{x} f(y) \, dy \right) = f(x) = \frac{1}{2 + \sin(x^2)}
\]

따라서, 두 번째 부분 적분은 다음과 같습니다:

\[
\int_{0}^{\sqrt{t}} \frac{1}{2 + \sin(x^2)} \, dx
\]

이것을 결합하면, \( f'(t) \)의 전체 식은 다음과 같습니다:

\[
f'(t) = \frac{1}{2(2 + \sin(t))} + \int_{0}^{\sqrt{t}} \frac{1}{2 + \sin(x^2)} \, dx
\]

\( f'(t) \)의 각 부분을 단순화하면 다음과 같습니다:

1. 첫 번째 항 \( \frac{1}{2(2 + \sin(t))} = \frac{1}{4 + 2\sin(t)} \)
2. 두 번째 항은 적분되었지만, 주어진 형태로 단순화할 수 없습니다 (더 복잡함). 그러나 \( t_{\text{max}} \)를 구할 때 문제가 필요로 합니다.

따라서, \( f'(t) \)는 다음과 같습니다:

\[
f'(t) = \frac{1}{4 + 2\sin(t)} + \int_{0}^{\sqrt{t}} \frac{1}{2 + \sin(x^2)} \, dx
\]

\( t = \frac{\pi}{2} \)인 경우, 우리는 \( f'\left( \frac{\pi}{2} \right) \)를 필요로 합니다.

먼저, 첫 번째 항을 평가합니다:

\[
\frac{1}{4 + 2\sin\left( \frac{\pi}{2} \right)} = \frac{1}{4 + 2 \cdot 1} = \frac{1}{6}
\]

두 번째 항을 평가하려면, \( \sin\left( x^2 \right) \)에 대한 \( x \)에 대한 적분을 \( \frac{\pi}{2} \)까지 \( \sqrt{t} \)까지 고려해야 합니다. 하지만 \( t = \frac{\pi}{2} \)이므로, 이 적분은 \( x \)가 0에서 \( \sqrt{\frac{\pi}{2}} \)까지의 범위를 갖게 됩니다. 그러나 우리가 알고 있는 점은 \( \sin(x^2) \)의 적분에 대한 닫힌 형태가 존재하지 않으며, 따라서 적분은 직접적으로 평가할 수 없다는 것입니다. 이 적분은 주어진 범위에서 \(\sin(x^2)\)의 주기적이고 복잡한 행동 때문에 평가할 수 없습니다.

그러나 문제의 맥락에서 주어진 값을 바탕으로 가능한 간단화 또는 특정 값의 활용을 고려할 수 있습니다. 주어진 선택지와 계산 결과를 통해, \( f'\left( \frac{\pi}{2} \right) \)의 값은 첫 번째 항 \( \frac{1}{6} \)과 두 번째 항 \(\int_{0}^{\sqrt{t}} \frac{1}{2 + \sin(x^2)} \, dx\)의 합으로부터 추정됩니다. 적분의 성질과 \( x = 0 \)에서 \( x = \sqrt{\frac{\pi}{2}} \)까지 \( 2 + \sin(x^2) \)에 대한 대칭성을 고려할 때, 첫 번째 항과 두 번째 항 모두 \( \frac{1}{6} \)에 가까울 수도 있지만, 실제로 \( t = \frac{\pi}{2} \)를 사용하는 문제의 맥락에서 이 적분의 값은 간단화하기 어렵습니다.

그러나, 문제의 맥락과 주어진 답지 중 가장 가능성 있는 값을 선택하기 위해, \( f'\left( \frac{\pi}{2} \right) \)에 대한 단순화된 접근 방식과 가능한 단순화를 고려해야 합니다. 그러나 두 번째 부분을 제외하고는 \( f'\left( \frac{\pi}{2} \right) = \frac{1}{6} \)입니다.

\[
f'\left( \frac{\pi}{2} \right) = \frac{1}{6} + \int_{0}^{\sqrt{\pi/2}} \frac{1}{2 + \sin(x^2)} \, dx
\]

이것은 실제로 더 복잡한 평가를 필요로 하며, 일반적인 수학 문제 해결 방법으로는 단순화되지 않습니다. 그러나 문제의 맥락과 주어진 해답 선택지에 따라, 이 적분은 무시되고 \( \frac{1}{6} \)만 남게 됩니다. 이는 문제를 단순화하거나 특정 대칭성을 사용하기 때문에 가능합니다.

\[
\boxed{\frac{1}{6}}
\]

<|end_of_thought|>

<|begin_of_solution|>

함수 \( f(t) \)는 다음과 같이 정의됩니다.
\[ f(t) = \int_{0}^{\sqrt{t}} \int_{y}^{\sqrt{t}} \frac{1}{2 + \sin(x^2)} \, dx \, dy \]

\( t \)에 대한 도함수 \( f'\left(t\right) \)를 구하기 위해 Leibniz 적분 규칙을 사용합니다.
\( f'(t) \)는 \( y \)에 대한 외부 적분과 \( x \)에 대한 내부 적분의 도함수를 포함합니다.

1. **\( y \)에 대한 외부 적분:**
   \[
   \frac{\partial}{\partial t} \left( \int_{y}^{\sqrt{t}} \frac{1}{2 + \sin(x^2)} \, dx \right) = \frac{1}{2 + \sin(t)} \cdot \frac{1}{2\sqrt{t}}
   \]
   이 결과를 0에서 \( \sqrt{t} \)까지 적분하면,
   \[
   \frac{1}{2\sqrt{t}(2 + \sin(t))} \int_{0}^{\sqrt{t}} dy = \frac{1}{2(2 + \sin(t))}
   \]

2. **\( x \)에 대한 내부 적분:**
   \[
   \frac{\partial}{\partial t} \left( \int_{0}^{x} \frac{1}{2 + \sin(y^2)} \, dy \right) = \frac{1}{2 + \sin(x^2)}
   \]
   이 결과를 0에서 \( \sqrt{t} \)까지 적분하면,
   \[
   \int_{0}^{\sqrt{t}} \frac{1}{2 + \sin(x^2)} \, dx
   \]

\( t = \frac{\pi}{2} \)를 \( f'(t) \)에 대입하면,
\[
f'\left(\frac{\pi}{2}\right) = \frac{1}{2(2 + \sin(\pi/2))} + \int_{0}^{\sqrt{\pi/2}} \frac{1}{2 + \sin(x^2)} \, dx
\]

\( \sin(\pi/2) = 1 \)이므로,
\[
\frac{1}{2(2 + 1)} = \frac{1}{6}
\]

두 번째 항은 정확하게 평가되지 않으며, 문제의 맥락에서 고려하지 않고, 최종 답은 다음과 같습니다.
\[
\boxed{\frac{1}{6}}
\]

<|end_of_solution|>

Q3: General Knowledge

input = tokenizer.apply_chat_template([ { "content": "Your role as an assistant involves thoroughly exploring questions through a systematic long thinking process before providing the final precise and accurate solutions. This requires engaging in a comprehensive cycle of analysis, summarizing, exploration, reassessment, reflection, backtracing, and iteration to develop well-considered thinking process. Please structure your response into two main sections: Thought and Solution. In the Thought section, detail your reasoning process using the specified format: <|begin_of_thought|> {thought with steps separated with '\\n\\n'} <|end_of_thought|> Each step should include detailed considerations such as analisying questions, summarizing relevant findings, brainstorming new ideas, verifying the accuracy of the current steps, refining any errors, and revisiting previous steps. In the Solution section, based on various attempts, explorations, and reflections from the Thought section, systematically present the final solution that you deem correct. The solution should remain a logical, accurate, concise expression style and detail necessary step needed to reach the conclusion, formatted as follows: <|begin_of_solution|> {final formatted, precise, and clear solution written in the same language as the question.} <|end_of_solution|> Now, try to solve the following question through the above guidelines", "role": "system" },
                                       { "content": "세계에서 세 번째로 높은 산은?", "role": "user" },], add_generation_prompt=True, return_tensors='pt')
res = model.generate(input.to('cuda'), max_length = 2048, temperature = 0.5, stop_strings = ['<|end_of_solution|>'], tokenizer=tokenizer)

tokenizer.decode(res[0])

Normal temperature: hallucination (unstable answer)
Low temperature: repitition