	---
	license: mit
	license_name: deepnight-responsible-ai
	license_link: LICENSE
	---

	# SaiLy 100B (deepnight-research/saily_100B)
	<img src="https://i.ibb.co/TvZQjZM/Leonardo-Diffusion-XL-Furious-and-strong-Elephant-and-anchor-l-1.jpg" alt="Saily: Experimental AI Models by DEEPNIGHT">

	---
	### SaiLy is a series of highly experimental, uncensored AI models by DEEPNIGHT-RESEARCH. Please use them responsibly.
	---
	<br>
	Prompt Template: Alpaca

	```
	Below is an instruction that describes a task. Write a response that appropriately completes the request.
	### Instruction:
	{prompt}
	### Response:
	```
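
A minimal sketch of filling the template above from Python. The names `ALPACA_TEMPLATE` and `build_prompt` are hypothetical helpers, not part of this repository, and the exact line breaks are assumed to match the template as printed here.

```python
# Hypothetical helper: reproduces the prompt template shown above, line for line.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n"
    "### Instruction:\n"
    "{prompt}\n"
    "### Response:\n"
)


def build_prompt(instruction: str) -> str:
    """Insert a single user instruction into the template."""
    # Surrounding whitespace is stripped so the section markers stay on their own lines.
    return ALPACA_TEMPLATE.format(prompt=instruction.strip())


if __name__ == "__main__":
    print(build_prompt("Name three uses for an anchor."))
```

The returned string is what you would pass to the tokenizer; the model's reply is expected to follow the `### Response:` line.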

	### Description:
	This is the first stable model of the series. The model is based on Llama2-chat.

	---

	### Did someone say CODE?
	Here you go!
	```python
	import transformers

	# Load the model with default settings (weights are downloaded on first use)
	model = transformers.AutoModelForCausalLM.from_pretrained(
	    'deepnight-research/saily_100B'
	)
	```

	To use the optimized Triton implementation of FlashAttention, load the model on GPU (`cuda:0`) with `attn_impl='triton'` and `bfloat16` precision:
	```python
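	import torch
	import transformers

	name = 'deepnight-research/saily_100B'

	# Assumption: `attn_config` and `init_device` are custom config fields that exist only
	# if the repository's remote code defines them, so trust_remote_code=True is passed here too.
	config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
	config.attn_config['attn_impl'] = 'triton'  # Use the Triton FlashAttention kernel
	config.init_device = 'cuda:0'  # For fast initialization directly on GPU!

	model = transformers.AutoModelForCausalLM.from_pretrained(
	    name,
	    config=config,
	    torch_dtype=torch.bfloat16,  # Load model weights in bfloat16
	    trust_remote_code=True
	)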
	```
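
Before loading in `bfloat16`, it can help to estimate how much memory the weights alone need. The sketch below is back-of-the-envelope arithmetic, not a measurement: it assumes the "100B" in the model name means roughly 100 billion parameters, and `estimate_weight_memory_gb` is a hypothetical helper.

```python
def estimate_weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Approximate memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Assumption: "100B" in the model name means about 100 billion parameters.
ASSUMED_PARAMS = 100e9

bf16_gb = estimate_weight_memory_gb(ASSUMED_PARAMS, 2)  # bfloat16 stores 2 bytes per parameter
fp32_gb = estimate_weight_memory_gb(ASSUMED_PARAMS, 4)  # float32 stores 4 bytes per parameter

if __name__ == "__main__":
    print(f"bfloat16 weights: ~{bf16_gb:.0f} GB")
    print(f"float32 weights:  ~{fp32_gb:.0f} GB")
```

Under that assumption the `bfloat16` weights come to about 200 GB, before activations and the KV cache, so a single `cuda:0` device is unlikely to hold the whole model and some form of multi-GPU sharding or offloading would probably be needed.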
	---

	If you would like to support us, please consider donating to [#aiforcause](https://github.com/deepnight-ai/aiforcause).

	Cheers✌️
	- Team [DEEPNIGHT](https://deepnight.tech)