Update README.md
README.md

# Leroy Dyer (1972-Present)

<img src="https://cdn-avatars.huggingface.co/v1/production/uploads/65d883893a52cd9bcd8ab7cf/tRsCJlHNZo1D02kBTmfy9.jpeg" width="300"/>

# The Human AI - Current Model

This model, I can say: wow! GRPO works. I think the difference is not in the raw output; as we have seen with the SpydazWeb AI models, I have played with various output styles and prompting styles, trained on various tasks and functions.

BUT:

The model did not seem to be advancing, although when used as an agent it performs its tasks well without needing to call other models, as it has been trained for each agent role:

## Agent Roles

- **Planning**
- **Coding**
- **ToolUse**
- **WebSearch**
- **Reflection**
- **Explanation and reasoning**
- **TASKS**
- **Output Formatting**
- **Content Recall**
- **Deep Research**
- **Deep Calculation**
- **repl**

Etc. The list goes on! (A rough sketch of how these roles can be wired into role-specific system prompts follows below.)
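
As a rough illustration only (not the actual training configuration), one way to read the role list above is as a dispatch table from agent role to system prompt, so a single model covers every role. All names here (`AGENT_PROMPTS`, `build_messages`) and the prompt wording are hypothetical.

```python
# Hypothetical sketch: one model serving every agent role via role-specific
# system prompts, instead of calling a separate specialist model per role.
AGENT_PROMPTS = {
    "planning": "You are the planner. Break the task into ordered steps inside <plan> tags.",
    "coding": "You are the coder. Return runnable code plus a short explanation.",
    "tool_use": "You are the tool caller. Emit a tool name and JSON arguments.",
    "web_search": "You are the researcher. Propose queries and summarise findings.",
    "reflection": "You are the reviewer. Critique the previous answer inside <reflect> tags.",
}

def build_messages(role: str, task: str) -> list[dict]:
    """Assemble a chat request for the chosen agent role."""
    system = AGENT_PROMPTS.get(role, "You are a helpful assistant.")
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": task},
    ]

# The same model handles planning and coding; only the system prompt changes.
messages = build_messages("planning", "Build a CSV-to-JSON converter.")
```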

We also trained various scripts and personas, and merged a few roleplay models, while still targeting benchmarks. Our model improved locally, but not on the benchmarks!

**GRPO**

This has enabled the model to generate internal chains of thought for previously trained data: the data which had already been trained on was re-trained, and the network created chains of thought for problems which previously had no explanation for their results.

So now we can train on the benchmark datasets and find these issues.

The model has also become its own intelligence, learning things it was never explicitly trained on.

The takeaway is also formatting and the use of structured outputs in your responses: think tags, reasoning tags, planning tags, and explanation tags are a way to normalise the outputs so they always appear in this form!
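
As a minimal sketch of what this normalisation can look like (the tag names `plan`, `think`, `answer` are assumptions for illustration, not necessarily the exact tags used in training), a completion can be held to a fixed tag layout and pulled apart with a simple parser:

```python
import re

# Assumed tag layout used to normalise every response.
EXPECTED_TAGS = ["plan", "think", "answer"]

def parse_sections(completion: str) -> dict[str, str]:
    """Extract each tagged section; a missing tag comes back as an empty string."""
    sections = {}
    for tag in EXPECTED_TAGS:
        match = re.search(rf"<{tag}>(.*?)</{tag}>", completion, re.DOTALL)
        sections[tag] = match.group(1).strip() if match else ""
    return sections

example = (
    "<plan>1. Parse the list. 2. Sum the values.</plan>"
    "<think>Watch out for an empty list.</think>"
    "<answer>42</answer>"
)
print(parse_sections(example))  # {'plan': '...', 'think': '...', 'answer': '42'}
```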

This also gives us something to penalise or reward. Some tasks do not require reasoning or explanation but may still require planning, etc., so these sections can be generated as well as trained. They can then be re-trained with rewards, enabling the task to be both fully formatted and fully performed in the agentic technique: plan, think, observe, act, observe, reflect, and so on.
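
For example, a format reward for GRPO can simply score how much of the expected tag structure a completion contains. This is a minimal sketch, assuming completions arrive as plain strings and using the same hypothetical tag names as above; a trainer that accepts reward functions over completions (such as TRL's GRPOTrainer) could call it alongside a separate correctness reward.

```python
import re

REQUIRED_TAGS = ["plan", "think", "answer"]  # assumed layout, see the parser above
OPTIONAL_TAGS = ["reflect"]                  # some tasks need it, some do not

def format_reward(completions: list[str], **kwargs) -> list[float]:
    """Score each completion for emitting the expected tag structure.

    Every required tag that is present and non-empty earns a share of the
    reward; an optional tag adds a small bonus. GRPO then compares these
    scores across the group of completions sampled for the same prompt.
    """
    rewards = []
    for text in completions:
        score = 0.0
        for tag in REQUIRED_TAGS:
            match = re.search(rf"<{tag}>(.+?)</{tag}>", text, re.DOTALL)
            if match and match.group(1).strip():
                score += 1.0 / len(REQUIRED_TAGS)
        for tag in OPTIONAL_TAGS:
            if re.search(rf"<{tag}>.+?</{tag}>", text, re.DOTALL):
                score += 0.1
        rewards.append(score)
    return rewards
```

In practice a format reward like this would be combined with a task or correctness reward, so the model is pushed to be both well-structured and right.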
# Deep Reasoner Model