---
license: other
license_name: deepnight-responsible-ai
license_link: LICENSE
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- 600B
- Python
- Code
- Logical Understanding
- Relation Establishment
- Translation
- ai1
- DEEPNIGHT
---
|
<div style="display: flex; justify-content: center; align-items: center;"> |
|
<img src="./cover.jpg" style="width: 100%; max-width: 350px; height: auto;"/></div> |
|
|
|
# DEEPNIGHT ai1 |
|
The 600 Billion+ Parameter Model. |
|
Yes! We did this! |
|
|
|
The second-largest model in the world, right after GPT-4.
|
|
|
--- |
|
|
|
We at [DEEPNIGHT](https://deepnight.tech) have been working on this for quite some time. |
|
We have successfully built ai1, the second-largest model in the world, with 600 billion+ parameters.
|
|
|
`ai1` performs as well as GPT-4 and has a context window of 8k tokens.

ai1 was trained with a new approach: we first trained the model on a corpus of text from various sources, including but not limited to:

- RefinedWeb
- open-source code from GitHub
- Common Crawl

and then fine-tuned it on a huge dataset (generated manually and with automation) for logical understanding and reasoning.
We also trained the model for function-calling capabilities.
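Since ai1 is not yet publicly accessible, its exact function-calling interface is unknown. As a purely hypothetical illustration, a function schema and a minimal dispatcher in the style commonly used for this capability might look like the following (every name here — `get_weather`, `dispatch`, the schema shape — is an assumption, not ai1's real API):

```python
# Hypothetical sketch: no ai1 API is public, so the schema shape and the
# dispatch helper below are illustrative assumptions, not the real interface.
import json

# A function schema describing a tool the model may call.
get_weather_schema = {
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name"},
        },
        "required": ["city"],
    },
}

def dispatch(model_output: str, functions: dict) -> str:
    """If the model emitted a JSON function call, run the matching function."""
    call = json.loads(model_output)
    fn = functions[call["name"]]
    return fn(**call["arguments"])

# Toy implementation standing in for a real weather lookup.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

result = dispatch(
    '{"name": "get_weather", "arguments": {"city": "Paris"}}',
    {"get_weather": get_weather},
)
print(result)  # Sunny in Paris
```

The memory units mentioned below would presumably hold schemas like `get_weather_schema` so they survive outside the context window.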
|
|
|
--- |
|
|
|
## What is special about ai1? |
|
ai1 works on a built-in chaining methodology. When it receives input from the user, it first tries to understand that input before generating anything: it composes an instruction-based prompt internally and only then generates the response from that prompt.
The benefit? <b>We'll just say the jobs of Prompt Engineering are over.</b>

Unlike ChatGPT, GPT-4, Llama, and other models, ai1 doesn't require heavy prompt engineering to provide good answers;
the understanding phase in the model takes care of that.
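The chaining flow described above can be sketched as a two-phase pipeline. This is a minimal conceptual sketch under our own assumptions — the phase names and the `generate` stand-in are illustrative, since ai1's internals are not public:

```python
# Hypothetical sketch of the described two-phase chaining flow. The function
# names and the generate() stand-in are assumptions; ai1's internals are not public.

def understand(user_input: str) -> str:
    """Phase 1: turn the raw user input into an internal instruction-style prompt."""
    return (
        "Instruction: respond helpfully to the request below.\n"
        f"Request: {user_input.strip()}"
    )

def generate(instruction_prompt: str) -> str:
    """Phase 2: stand-in for the actual model generation step."""
    return f"[model response to: {instruction_prompt!r}]"

def chained_reply(user_input: str) -> str:
    """Chain the two phases: understand first, then generate."""
    return generate(understand(user_input))

print(chained_reply("  summarize this article  "))
```

The point of the design is that the user's raw text never reaches the generator directly; only the internally derived instruction does.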
|
|
|
What else?

- performs as well as GPT-4
- excels in automation tasks
- can infer the user's emotions from the conversation (while understanding the input in Phase-1), resulting in better, more curated generations
- understands human emotions, which helps the model curate content accordingly
- excels in roleplay
- excels in writing code
- has a few global memory units used to store data outside the context window; these mostly hold function schemas, but ultimately the model decides for itself what to store in them
- costs, on average, about $0.005 per 1,000 tokens
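At the quoted average rate, estimating usage cost is simple arithmetic. A minimal sketch — the rate comes from the list above; the helper function and the 8k example are our own illustration:

```python
# Cost estimate at the quoted average rate of $0.005 per 1,000 tokens.
RATE_PER_1K_TOKENS = 0.005  # USD, the average figure quoted above

def estimate_cost(tokens: int) -> float:
    """Return the estimated USD cost for a given token count."""
    return tokens / 1000 * RATE_PER_1K_TOKENS

# Filling the full 8k-token context window once costs about 4 cents.
print(f"${estimate_cost(8000):.3f}")  # $0.040
```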
|
|
|
--- |
|
|
|
## Future goals |
|
We don't discuss that. Especially after seeing how SOME AI COMPANY, ON THEIR DEV DAY, just used open-source research and publications
to profit themselves... Hah.
|
|
|
--- |
|
|
|
## Are we going to allow access? |
|
Not for some time. We are still running evaluations and have a lot to learn about how this model can be made better. |
|
|
|
--- |
|
|
|
Feel free to reach out to us at [email protected] |
|
|
|
- Team [DEEPNIGHT](https://deepnight.tech) |