
dame rajee

damerajee

AI & ML interests

None yet

Recent Activity

liked a Space 3 days ago
AmitIsraeli/PopYou
liked a Space 7 days ago
multimodalart/LLaDA
liked a model 7 days ago
GSAI-ML/LLaDA-8B-Instruct

Organizations

Blog-explorers, Samanvay AI

damerajee's activity

reacted to ginipick's post with 😎πŸ€—πŸ‘€πŸš€πŸ”₯ 21 days ago
Gini's AI Spaces: Everything You Need for Visual Content Creation!

Hello! ✨ Let me introduce Gini's 5 AI Spaces that effortlessly generate various styles of visual content.

Each Space leverages Diffusers and Gradio, so you can create stunning images in just a few clicks!

1) Flowchart
Features: Hand-drawn style flowcharts for workflows or business processes
Use Cases: Software release pipelines, data pipelines, corporate workflows
Benefits: Clear stage-by-stage structure, simple icon usage

ginigen/Flowchart

2) Infographic
Features: Visually appealing infographics that communicate data or statistics
Use Cases: Global energy charts, startup growth metrics, health tips and more
Benefits: Eye-catching icons and layouts, perfect for storytelling at a glance

ginigen/Infographic

3) Mockup
Features: Sketch-style wireframes or UX mockups for apps/websites
Use Cases: Mobile login flows, dashboards, e-commerce site layouts
Benefits: Rapid prototyping of early design ideas, perfect for storyboarding

ginigen/Mockup

4) Diagram
Features: Educational diagrams (science, biology, geography, etc.)
Use Cases: Water cycle, photosynthesis, chemical reactions, human anatomy
Benefits: Vibrant, friendly illustrations, ideal for student-friendly materials

ginigen/Diagram

5) Design
Features: Product/industrial design concepts (coffee machines, smartphones, etc.)
Use Cases: Prototyping, concept car interiors, high-tech product sketches
Benefits: From 3D render-like visuals to simple sketches, unleash your creativity!

ginigen/Design

Click any link above and let AI spark your imagination. Enjoy a fun and productive creative process! πŸš€βœ¨
reacted to Tonic's post with πŸ”₯ about 1 month ago
πŸ™‹πŸ»β€β™‚οΈ hey there folks,

Goedel's Theorem Prover is now being demo'd on Hugging Face: Tonic/Math

give it a try!
reacted to lewtun's post with πŸ”₯πŸ€—πŸš€ about 1 month ago
We are reproducing the full DeepSeek R1 data and training pipeline so everybody can use their recipe. Instead of doing it in secret we can do it together in the open!

πŸ§ͺ Step 1: replicate the R1-Distill models by distilling a high-quality reasoning corpus from DeepSeek-R1.

🧠 Step 2: replicate the pure RL pipeline that DeepSeek used to create R1-Zero. This will involve curating new, large-scale datasets for math, reasoning, and code.

πŸ”₯ Step 3: show we can go from base model -> SFT -> RL via multi-stage training.

Follow along: https://github.com/huggingface/open-r1
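Step 1 above boils down to rejection-sampling a teacher model. Here is a minimal, hypothetical sketch: `query_teacher` is a stand-in for whatever inference API serves DeepSeek-R1, and the `<think>...</think>` trace format is an assumption for illustration, not the project's actual data format.

```python
# Hypothetical sketch of Step 1: sample reasoning traces from a teacher and
# keep only the ones whose final answer matches the reference answer,
# yielding a distillation/SFT corpus.
def query_teacher(prompt: str) -> str:
    # Placeholder: in practice, call the teacher model's inference API here.
    return "<think>2 + 2 = 4</think> 4"

def extract_answer(completion: str) -> str:
    # Assumption: the final answer follows the closing </think> tag.
    return completion.split("</think>")[-1].strip()

def build_distill_corpus(problems):
    corpus = []
    for prompt, gold in problems:
        completion = query_teacher(prompt)
        if extract_answer(completion) == gold:  # keep only verified-correct traces
            corpus.append({"prompt": prompt, "completion": completion})
    return corpus

corpus = build_distill_corpus([("What is 2+2?", "4")])
```

Filtering on answer correctness is what makes the distilled corpus "high-quality": traces where the teacher reasons its way to a wrong answer are simply discarded.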
reacted to danielhanchen's post with πŸ€—πŸ‘ 3 months ago
reacted to mervenoyan's post with πŸ”₯ 5 months ago
posted an update 5 months ago
On the 2nd of October a really cool paper was released called "Were RNNs All We Needed?": https://arxiv.org/abs/2410.01201

This paper introduces the MinGRU model, a simplified version of the traditional Gated Recurrent Unit (GRU) designed to enhance efficiency by removing hidden state dependencies from its gates. This allows for parallel training, making it significantly faster than conventional GRUs. Additionally, MinGRU eliminates non-linear activations like tanh, streamlining computations.

So I read the paper and tried training this model, and it seems to be doing quite well. You can check out the pre-trained model on Hugging Face Spaces:

- damerajee/mingru-stories
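The core change the paper makes can be shown in a few lines. A minimal NumPy sketch (not the trained model above): the update gate and candidate state depend only on the input x_t, never on h_{t-1}, and the tanh is gone, so the recurrence is a plain linear interpolation that admits a parallel scan at training time.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class MinGRU:
    """Minimal GRU: gate and candidate are functions of x_t only,
    which removes the hidden-state dependency from the gates."""
    def __init__(self, d_in, d_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.Wz = rng.normal(0, 0.1, (d_in, d_hidden))  # update-gate weights
        self.Wh = rng.normal(0, 0.1, (d_in, d_hidden))  # candidate weights

    def forward(self, x, h0):
        # x: (T, d_in), h0: (d_hidden,)
        z = sigmoid(x @ self.Wz)   # gate from input only, no h_{t-1}
        h_tilde = x @ self.Wh      # candidate state, no tanh
        h, outs = h0, []
        for t in range(x.shape[0]):
            # linear interpolation; written sequentially here for clarity,
            # but this form is exactly what a parallel scan can compute
            h = (1 - z[t]) * h + z[t] * h_tilde[t]
            outs.append(h)
        return np.stack(outs)

out = MinGRU(d_in=4, d_hidden=8).forward(
    np.random.default_rng(1).normal(size=(10, 4)), np.zeros(8))
print(out.shape)  # (10, 8)
```

A conventional GRU cannot be parallelized this way because its gates read h_{t-1}, forcing strictly sequential computation.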
  • 1 reply
Β·
reacted to onekq's post with 🧠 5 months ago
Here is my latest study on OpenAI o1.
A Case Study of Web App Coding with OpenAI Reasoning Models (2409.13773)

I wrote an easy-to-read blog post to explain the findings.
https://huggingface.co/blog/onekq/daily-software-engineering-work-reasoning-models

INSTRUCTION FOLLOWING is the key.

100% instruction following + Reasoning = new SOTA

But if the model misses or misunderstands one instruction, it can perform far worse than non-reasoning models.
replied to reach-vb's post 6 months ago
reacted to reach-vb's post with πŸ”₯🧠 6 months ago
Less than two days ago Kyutai Labs open-sourced Moshi, a ~7.6B on-device speech-to-speech foundation model, and Mimi, a SoTA streaming speech codec! πŸ”₯

The release includes:

1. Moshiko & Moshika - Moshi finetuned on synthetic data (CC-BY license) ( kyutai/moshi-v01-release-66eaeaf3302bef6bd9ad7acd)
2. Mimi - Streaming audio codec that processes 24 kHz audio down to a 12.5 Hz representation with a bandwidth of 1.1 kbps (CC-BY license) ( kyutai/mimi)
3. Model checkpoints & Inference codebase written in Rust (Candle), PyTorch & MLX (Apache license) (https://github.com/kyutai-labs/moshi)

How does Moshi work?

1. Moshi processes two audio streams: one for itself and one for the user, with the user's stream coming from audio input and Moshi's stream generated by the model.

2. Along with these audio streams, Moshi predicts text tokens for its speech, enhancing its generation quality.

3. The model uses a small Depth Transformer for codebook dependencies and a large 7B parameter Temporal Transformer for temporal dependencies.

4. The theoretical latency is 160ms, with a practical latency of around 200ms on an L4 GPU.

Model size & inference:

Moshiko/ka are 7.69B param models

bf16 ~16GB VRAM
8-bit ~8GB VRAM
4-bit ~4GB VRAM
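Those VRAM figures are essentially parameter count times bytes per parameter. A quick back-of-envelope check (weights only; activations and KV cache are ignored, so real usage runs somewhat higher):

```python
def vram_estimate_gb(n_params: float, bits_per_param: int) -> float:
    """Weight memory in GB: parameters * (bits / 8) bytes each."""
    return n_params * bits_per_param / 8 / 1e9

# Moshiko/ka at 7.69B params:
for bits in (16, 8, 4):
    print(f"{bits}-bit: ~{vram_estimate_gb(7.69e9, bits):.1f} GB")
```

This reproduces the ~16/~8/~4 GB figures above (15.4, 7.7, and 3.8 GB before runtime overhead).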

You can run inference via Candle πŸ¦€, PyTorch and MLX - based on your hardware.

The Kyutai team, @adefossez @lmz and colleagues, are cracked AF; they're bringing some serious firepower to the open source/science AI scene. Looking forward to what's next! 🐐
  • 1 reply
Β·
reacted to MohamedRashad's post with ❀️ 6 months ago
For all the Muslims out there who are interested in Quran and its tafsir (explanations). This humble dataset consists of 84 different books of tafsir for nearly all the ayat in the Quran:
MohamedRashad/Quran-Tafseer

I hope it helps someone to build something nice and useful with it ^_^
reacted to merve's post with πŸš€πŸ‘ 6 months ago
NVIDIA just dropped NVEagle πŸ¦…

Super impressive vision language model that comes in 7B, 13B and 13B fine-tuned on chat πŸ’¬
Model repositories: merve/nveagle-66d0705108582d73bb235c26
Try it: NVEagle/Eagle-X5-13B-Chat πŸ’¬ (works very well! 🀯)

This model essentially explores having different experts (MoE) for the image encoder part of a vision language model.
How? 🧐
The authors concatenate the vision encoder output tokens together and apply "pre-alignment": essentially fine-tuning the experts with a frozen text encoder.

Then they freeze both the experts and the decoder and train just the projection layer; finally, they unfreeze everything for supervised fine-tuning ✨

In the paper, they explore different fusion strategies and vision encoders, extending the basic CLIP encoder, and find that simply concatenating visual tokens works well.
The rest of the architecture is quite similar to LLaVA.
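The winning fusion strategy is simple enough to sketch in a few lines. This is a toy NumPy illustration with made-up shapes (token counts and dimensions are hypothetical, not NVEagle's actual configuration): two vision experts emit per-token features for the same image, the features are concatenated channel-wise, and a single linear projection maps the fused tokens into the language model's embedding space.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical vision experts, each producing 256 tokens for one image
tokens_a = rng.normal(size=(256, 1024))  # expert A features, dim 1024
tokens_b = rng.normal(size=(256, 768))   # expert B features, dim 768

# Channel-wise concatenation of the per-token features
fused = np.concatenate([tokens_a, tokens_b], axis=-1)  # (256, 1792)

# Linear projection into a hypothetical 4096-dim LLM embedding space;
# in the staged training described above, this projection is the only
# trainable piece while the experts and decoder stay frozen.
W_proj = rng.normal(scale=0.02, size=(fused.shape[-1], 4096))
llm_tokens = fused @ W_proj  # (256, 4096)
```

Because the fusion is just concatenation plus a projection, adding another expert only widens `fused`; nothing else in the pipeline changes.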