John6666's activity

reacted to Elizezen's post with 👀 about 3 hours ago
It turns out that the following simple method is actually effective when you want to increase the appearance probability of just one token, or a very limited set of tokens.

one_token = "♡"  # token whose appearance probability you want to increase
value = 1000000  # number of repetitions to write

# Build one long run of the repeated token and save it as a training file
token = one_token * value

with open("one-token.txt", "w", encoding="utf-8") as f:
    f.write(token)


By training a LoRA with Unsloth on the .txt file generated by the code above, you can increase the appearance probability of specific tokens while maintaining the model's performance to a great extent. However, it's better to stop training before the train loss reaches 0.0, as the model will otherwise start spamming the token as soon as it appears even once. In general, you can stop training at a very early stage and it will still work.

It is also possible to reduce the appearance probability of specific tokens: create an over-trained LoRA on the tokens you want to suppress, merge it into the model, extract only the difference from the base model using the chat vector method, and then subtract that vector from an arbitrary model.

In this case, it is better to scale the chat vector by a factor of about five. Apart from the specific tokens, this has very little effect on overall performance.

new_v = v - (5.0 * chat_vector[i].to(v.device))
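For illustration, here is a minimal sketch of that whole procedure, with plain Python lists standing in for real model tensors (all names and numbers below are illustrative, not taken from the original post):

```python
# Sketch of the chat-vector method described above. Plain Python lists stand
# in for model tensors; names and numbers are illustrative only.

def build_chat_vector(merged_weights, base_weights):
    # The chat vector is the difference between the model with the
    # over-trained LoRA merged in and the original base model.
    return [m - b for m, b in zip(merged_weights, base_weights)]

def apply_chat_vector(weights, chat_vector, ratio=5.0):
    # new_v = v - ratio * chat_vector, element-wise; the post suggests
    # a ratio of about five.
    return [v - ratio * cv for v, cv in zip(weights, chat_vector)]

base   = [0.10, -0.20, 0.30]   # base model (one flattened parameter group)
merged = [0.15, -0.15, 0.30]   # base + over-trained LoRA merged in

cv = build_chat_vector(merged, base)            # approximately [0.05, 0.05, 0.0]
target = apply_chat_vector([0.50, 0.40, -0.10], cv)
print(target)
```

In a real model you would loop this over every matching key in the two state dicts, which is what the single `new_v` line above is doing for one parameter `i`.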
reacted to AdinaY's post with 🔥 about 7 hours ago
QvQ-72B-Preview 🎄 an open-weight model for visual reasoning, just released by the Alibaba Qwen team
Qwen/qvq-676448c820912236342b9888
✨ Combines visual understanding & language reasoning.
✨ Scores 70.3 on MMMU
✨ Outperforms Qwen2-VL-72B-Instruct in complex problem-solving
reacted to merve's post with 👍 about 7 hours ago
reacted to as-cle-bert's post with ❤️ about 7 hours ago
Hi HuggingFacers! 🤶🏼

As my last 2024 project, I've dropped a Discord bot that knows a lot about Pokémon 🦋

GitHub 👉 https://github.com/AstraBert/Pokemon-Bot
Demo Space 👉 as-cle-bert/pokemon-bot

The bot integrates:
- Chat features (Cohere's Command-R) with RAG functionalities (hybrid search and reranking with Qdrant) and chat memory (managed through PostgreSQL) to produce information about Pokémon
- Image-based search to identify Pokémon from their images (via Qdrant)
- Card package random extraction and description

HuggingFace 🤗, as usual, plays the most important role in the application stack, with the following models:

- sentence-transformers/LaBSE
- prithivida/Splade_PP_en_v1
- facebook/dinov2-large

And datasets:

- Karbo31881/Pokemon_images
- wanghaofan/pokemon-wiki-captions
- TheFusion21/PokemonCards

Have fun! 🍕
reacted to MonsterMMORPG's post with 👍 about 7 hours ago
CogVideoX1.5-5B-I2V, the best open-source image-to-video model, is pretty decent and optimized for low-VRAM machines at high resolution. The native resolution is 1360px, it generates up to 10 seconds (161 frames), and the audio was generated with a new open-source audio model.

Full YouTube tutorial for CogVideoX1.5-5B-I2V : https://youtu.be/5UCkMzP2VLE

1-Click Windows, RunPod and Massed Compute installers (installs into a Python 3.11 VENV): https://www.patreon.com/posts/112848192

Official Hugging Face repo of CogVideoX1.5-5B-I2V : THUDM/CogVideoX1.5-5B-I2V

Official github repo : https://github.com/THUDM/CogVideo

Used prompts to generate videos txt file : https://gist.github.com/FurkanGozukara/471db7b987ab8d9877790358c126ac05

Demo images shared in : https://www.patreon.com/posts/112848192

I used 1360×768px images at 16 FPS and 81 frames = 5 seconds (16 × 5 = 80 generated frames, plus 1 frame coming from the initial image).

Also, I have enabled all the optimizations shared on Hugging Face:

pipe.enable_sequential_cpu_offload()  # offload idle model components to CPU
pipe.vae.enable_slicing()             # decode the video batch in slices to save VRAM
pipe.vae.enable_tiling()              # decode frames in spatial tiles to save VRAM

quantization = int8_weight_only (you need TorchAO and DeepSpeed; it works great on Windows with a Python 3.11 VENV)

Used audio model : https://github.com/hkchengrex/MMAudio

1-Click Windows, RunPod and Massed Compute installers for MMAudio (installs into a Python 3.10 VENV): https://www.patreon.com/posts/117990364

I used very simple prompts. It fails when there is a human in the input video, so use text-to-audio in such cases.

I also tested some VRAM usage for CogVideoX1.5-5B-I2V.

Resolutions and their VRAM requirements (may work on lower-VRAM GPUs too, but slower):

512×288, 41 frames: 7700 MB
576×320, 41 frames: 7900 MB
576×320, 81 frames: 8850 MB
704×384, 81 frames: 8950 MB
768×432, 81 frames: 10600 MB
896×496, 81 frames: 12050 MB
960×528, 81 frames: 12850 MB




  • 1 reply
ยท
reacted to AlexBodner's post with 👀 about 16 hours ago

🚀🤖 Do Androids Dream of Electric Marios?

Discover how we replaced the classic game engine with DIAMOND, a neural network that predicts every frame from actions, noise, and past states. From training on human and RL gameplay to generating surreal hallucinations, this project shows the potential of diffusion models for building amazing simulations. 🎮

🧵 Dive into the full story in our Twitter thread:
👉 https://x.com/AlexBodner_/status/1871566560512643567
🌟 Don't forget to follow and leave a star for more groundbreaking AI projects!
reacted to hba123's post with 🚀 about 16 hours ago
Blindly applying algorithms without understanding the math behind them is not a good idea, in my view. So, I am on a quest to fix this!

I wrote my first Hugging Face article on how to derive closed-form solutions for KL-regularised reinforcement learning problems, which is the machinery behind DPO.


Check it out: https://huggingface.co/blog/hba123/derivingdpo
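For context, the closed-form solution in question is the standard KL-regularised RL result that DPO builds on (standard notation from the literature, not quoted from the article): maximizing expected reward with a KL penalty toward a reference policy yields

```latex
\pi^{*}(y \mid x)
  = \frac{1}{Z(x)}\,\pi_{\mathrm{ref}}(y \mid x)\,
    \exp\!\left(\frac{r(x, y)}{\beta}\right),
\qquad
Z(x) = \sum_{y} \pi_{\mathrm{ref}}(y \mid x)\,
       \exp\!\left(\frac{r(x, y)}{\beta}\right)
```

where β sets the strength of the KL regularisation; DPO's trick is that the intractable partition function Z(x) cancels when comparing preference pairs.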
reacted to DawnC's post with ❤️ about 20 hours ago
🌟 PawMatchAI: Making Breed Selection More Intuitive! 🐕
Excited to share the latest update to this AI-powered companion for finding your perfect furry friend! The breed recommendation system just got a visual upgrade to help you make better decisions.

✨ What's New?
Enhanced breed recognition accuracy through strategic model improvements:
- Upgraded to a fine-tuned ConvNeXt architecture for superior feature extraction
- Implemented progressive layer unfreezing during training
- Optimized data augmentation pipeline for better generalization
- Achieved 8% improvement in breed classification accuracy

🎯 Key Features:
- Smart breed recognition powered by AI
- Visual matching scores with intuitive color indicators
- Detailed breed comparisons with interactive tooltips
- Lifestyle-based recommendations tailored to your needs

💭 Project Vision
Combining my passion for AI and pets, this project represents another step toward my goal of creating meaningful AI applications. Each update aims to make the breed selection process more accessible while improving the underlying technology.

👉 Try it now: DawnC/PawMatchAI

Your likes ❤️ on this space fuel this project's growth!

#AI #MachineLearning #DeepLearning #Pytorch #ComputerVision
reacted to ginipick's post with 🚀🔥 about 20 hours ago
🎨 GiniGen Canvas-o3: Intelligent AI-Powered Image Editing Platform
Transform your images with precision using our next-generation tool that lets you extract anything from text to objects with simple natural language commands! 🚀
📌 Key Differentiators:

Intelligent Object Recognition & Extraction
• Freedom to select any target (text, logos, objects)
• Simple extraction via natural language commands ("dog", "signboard", "text")
• Ultra-precise segmentation powered by GroundingDINO + SAM
Advanced Background Processing
• AI-generated custom backgrounds for extracted objects
• Intuitive object size/position adjustment
• Multiple aspect ratio support (1:1, 16:9, 9:16, 4:3)
Progressive Text Integration
• Dual text placement: over or behind images
• Multi-language font support
• Real-time font style/size/color/opacity adjustment

🎯 Use Cases:

Extract logos from product images
Isolate text from signboards
Select specific objects from scenes
Combine extracted objects with new backgrounds
Layer text in front of or behind images

💫 Technical Features:

Natural language-based object detection
Real-time image processing
GPU acceleration & memory optimization
User-friendly interface

🎉 Key Benefits:

User Simplicity: Natural language commands for object extraction
High Precision: AI-powered accurate object recognition
Versatility: From basic editing to advanced content creation
Real-Time Processing: Instant result visualization

Experience the new paradigm of image editing with GiniGen Canvas-o3:

Seamless integration of multiple editing functions
Professional-grade results with consumer-grade ease
Perfect for social media, e-commerce, and design professionals

Whether you're extracting text from complex backgrounds or creating sophisticated visual content, GiniGen Canvas-o3 provides the precision and flexibility you need for modern image editing!

GO! ginigen/CANVAS-o3
  • 2 replies
ยท
reacted to sayakpaul's post with 🚀🔥 1 day ago
Commits speak louder than words 🤪

* 4 new video models
* Multiple image models, including SANA & Flux Control
* New quantizers -> GGUF & TorchAO
* New training scripts

Enjoy this holiday-special Diffusers release 🤗
Notes: https://github.com/huggingface/diffusers/releases/tag/v0.32.0
reacted to singhsidhukuldeep's post with 🔥 1 day ago
Exciting News in AI: JinaAI Releases JINA-CLIP-v2!

The team at Jina AI has just released a groundbreaking multilingual multimodal embedding model that's pushing the boundaries of text-image understanding. Here's why this is a big deal:

🚀 Technical Highlights:
- Dual encoder architecture combining a 561M parameter Jina XLM-RoBERTa text encoder and a 304M parameter EVA02-L14 vision encoder
- Supports 89 languages with 8,192 token context length
- Processes images up to 512×512 pixels with a 14×14 patch size
- Implements FlashAttention2 for text and xFormers for vision processing
- Uses Matryoshka Representation Learning for efficient vector storage

โšก๏ธ Under The Hood:
- Multi-stage training process with progressive resolution scaling (224โ†’384โ†’512)
- Contrastive learning using InfoNCE loss in both directions
- Trained on massive multilingual dataset including 400M English and 400M multilingual image-caption pairs
- Incorporates specialized datasets for document understanding, scientific graphs, and infographics
- Uses hard negative mining with 7 negatives per positive sample
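The "InfoNCE loss in both directions" mentioned above is, in its standard symmetric form (textbook formulation in my notation, not quoted from the release):

```latex
\mathcal{L}
= -\frac{1}{2N}\sum_{i=1}^{N}\left[
    \log\frac{\exp(\mathrm{sim}(t_i, v_i)/\tau)}
             {\sum_{j=1}^{N}\exp(\mathrm{sim}(t_i, v_j)/\tau)}
  + \log\frac{\exp(\mathrm{sim}(v_i, t_i)/\tau)}
             {\sum_{j=1}^{N}\exp(\mathrm{sim}(v_i, t_j)/\tau)}
  \right]
```

where t_i and v_i are matched text and image embeddings, sim is cosine similarity, and τ is a temperature: each caption must pick out its image among N candidates, and vice versa.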

📊 Performance:
- Outperforms previous models on visual document retrieval (52.65% nDCG@5)
- Achieves 89.73% image-to-text and 79.09% text-to-image retrieval on CLIP benchmark
- Strong multilingual performance across 30 languages
- Maintains performance even with 75% dimension reduction (256D vs 1024D)
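That 75% dimension reduction works because Matryoshka-trained embeddings concentrate the most useful information in the leading dimensions, so shrinking a vector is just slicing and renormalizing. A minimal sketch in plain Python (the vector values are made up for illustration):

```python
import math

def truncate_and_renormalize(embedding, dims):
    # Keep the first `dims` components of a Matryoshka embedding and
    # rescale the result back to unit length for cosine similarity.
    head = embedding[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

full_vector = [0.6, 0.8, 0.01, -0.02]  # pretend embedding, most mass up front
small_vector = truncate_and_renormalize(full_vector, 2)
print(small_vector)  # still unit-length; 1024D -> 256D in the real model
```

Because the truncated vector is renormalized, downstream cosine-similarity search code needs no other changes.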

🎯 Key Innovation:
The model solves the long-standing challenge of unifying text-only and multi-modal retrieval systems while adding robust multilingual support. Perfect for building cross-lingual visual search systems!

Kudos to the research team at Jina AI for this impressive advancement in multimodal AI!
reacted to Kseniase's post with 👍 1 day ago
**15 Agentic Systems and Frameworks of 2024**

This year, we started our "AI Agents and Agentic Workflows" series (https://www.turingpost.com/t/AI-Agents) to explore everything about AI agents step by step: all the vocabulary, how they work, and how to build them.
The huge interest in this series and the large number of studies conducted on agents showed that it was one of the most popular and important themes of the year. In 2025, agents will most likely reach new heights; we will be covering that for you. Now, let's review the agentic systems that have emerged this year.

Here is a list of 15 agentic systems and frameworks of 2024:

1. GUI Agents: A Survey (2412.13501)

2. Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level (2411.03562)

3. The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery (2408.06292)

4. MALT: Improving Reasoning with Multi-Agent LLM Training (2412.01928)

5. Agent S: An Open Agentic Framework that Uses Computers Like a Human (2410.08164)

6. Automated Design of Agentic Systems (2408.08435)

7. AgentInstruct: Toward Generative Teaching with Agentic Flows (2407.03502)

8. AgentStore: Scalable Integration of Heterogeneous Agents As Specialized Generalist Computer Assistant (2410.18603)

9. WALL-E: World Alignment by Rule Learning Improves World Model-based LLM Agents (2410.07484)

10. Generative Agent Simulations of 1,000 People (2411.10109)

11. DynaSaur: Large Language Agents Beyond Predefined Actions (2411.01747)

12. PRefLexOR: Preference-based Recursive Language Modeling for Exploratory Optimization of Reasoning and Agentic Thinking (2410.12375)

13. Generative World Explorer (2411.11844)

14. Bel Esprit: Multi-Agent Framework for Building AI Model Pipelines (2412.14684)

15. AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions (2410.20424)

Thanks for reading Turing Post!
Subscribe to receive new posts straight into your inbox -> https://www.turingpost.com/subscribe
reacted to DualityAI-RebekahBogdanoff's post with 👀 1 day ago
Hi again! 👋 Duality.ai just launched a lesson on how to bring your own twin into our free FalconEditor software 🖥️, how to create a synthetic dataset using your twin 📸, and how to test your model with your own images 🎯!
https://falcon.duality.ai/secure/documentation/ex2-adv-find-twin?sidebarMode=learn

This is crucial for anyone wanting to use FalconEditor for their own projects. We will also be hosting a free photogrammetry course that uses a free workflow, independent of OS specifications, to create robust digital twins. These 2 lessons complement each other incredibly well. Sign up for the course here!
https://docs.google.com/forms/d/e/1FAIpQLSd2WsKaa1CjRM89uv3LNkZXj1TUNWrNxDrtyWny2w1OQDHn8g/viewform
reacted to nyuuzyou's post with 👍 1 day ago
🎮 GoodGame.ru Clips Dataset - nyuuzyou/goodgame

A collection of 39,280 video clips metadata from GoodGame.ru streaming platform featuring:

- Complete clip information including direct video URLs and thumbnails
- Streamer details like usernames and avatars
- Engagement metrics such as view counts
- Game categories and content classifications
- Released under Creative Commons Zero (CC0) license

This extensive clips collection provides a valuable resource for developing and evaluating video-based AI applications, especially in Russian gaming and streaming contexts.
replied to nroggendorff's post 1 day ago

Maybe you're the type who's always working your brain, so I think it would be good to rest it by watching something moderately interesting. If you're not doing anything, you'll end up thinking about things, and that will tire you out even more. 🥶

replied to nroggendorff's post 1 day ago
reacted to nroggendorff's post with 😔 1 day ago
im so tired
  • 3 replies
ยท
reacted to randomhex10101's post with 👀 2 days ago
Does anyone have any suggestions when it comes to multi-user management for dedicated custom endpoints? Would it be possible to use webhooks with HF to automate and delegate custom dedicated endpoint creation for each user who signs up for my service? Or would this be implausible / too complex compared to simply allocating containers? Or is there a better way to go about this that I just haven't found in the docs yet? Any help will mean a lot, thank you!