Going multimodal: How Prezi is leveraging the Hub and the Expert Support Program to accelerate their ML roadmap
transformers.agents is a bit tedious to use, because it goes into great detail.

Here is the full thesis if you're interested: https://research.vu.nl/ws/portalfiles/portal/355675396/dissertationlaurerfinal+-+66c885c7e9d0b.pdf
Here is the collection of my most recent models: https://huggingface.co/collections/MoritzLaurer/zeroshot-classifiers-6548b4ff407bb19ff5c3ad6f
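If you want to try one of these models, a minimal sketch with the transformers zero-shot-classification pipeline could look like this; the model id below is just one example checkpoint assumed to be in the collection, so swap in any other:

from transformers import pipeline

# Example checkpoint (assumed to be part of the linked collection); replace with any other zero-shot classifier.
classifier = pipeline(
    "zero-shot-classification",
    model="MoritzLaurer/deberta-v3-large-zeroshot-v2.0",
)

# Classify a text against arbitrary candidate labels without task-specific fine-tuning.
text = "The new update makes the app much faster and easier to use."
labels = ["positive review", "negative review", "feature request"]
print(classifier(text, candidate_labels=labels))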
#!pip install "huggingface_hub>=0.25.0"
from huggingface_hub import InferenceClient

client = InferenceClient(
    base_url="https://huggingface.co/api/integrations/dgx/v1",
    api_key="MY_FINEGRAINED_ENTERPRISE_ORG_TOKEN",  # see docs: https://huggingface.co/blog/inference-dgx-cloud#create-a-fine-grained-token
)

output = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-405B-Instruct-FP8",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Count to 10"},
    ],
    max_tokens=1024,
)

print(output)
The window.ai feature is going to change the web forever! 🤯 It allows you to run Gemini Nano, a powerful 3.25B parameter LLM, 100% locally in your browser!

@HAMRONI can you share the full inference code that caused this error? You can open a discussion in the model repo.
Is top_k arbitrarily discarding high-quality continuations? Or top_p forgetting to exclude low-probability tokens, derailing your generation? Try out the new min_p flag in generate, fresh from a PR merged today! 🥬

min_p takes a base probability (the value passed in the min_p flag) and multiplies it by the probability of the most likely token in the distribution for the next token. All tokens less likely than the resulting value are filtered. What happens with this strategy? Because the threshold scales with the top token's probability, the filter is aggressive when one token clearly dominates and more permissive when the distribution is flat.

Set min_p to a low value, between 0.05 and 0.1. It behaves particularly well for creative text generation when paired up with temperature > 1.
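As a reference, here is a minimal sketch of trying the min_p flag with transformers' generate; the model, prompt, and exact values are placeholders chosen for illustration:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # placeholder model; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(
    **inputs,
    do_sample=True,      # min_p only takes effect when sampling
    min_p=0.05,          # base probability, scaled by the most likely token's probability
    temperature=1.2,     # min_p pairs well with temperature > 1 for creative text
    max_new_tokens=50,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))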