Sorokin Evgeny
DeathGodlike
AI & ML interests
None yet
Recent Activity
reacted to alibabasglab's post (5 days ago)
Organizations
None yet
DeathGodlike's activity
reacted to onekq's post (5 days ago)
reacted to alibabasglab's post (5 days ago)
ClearerVoice-Studio: your one-step speech processing platform for speech enhancement, speech separation, speech super-resolution, and audio-visual target speaker extraction. Say goodbye to noise and hello to clarity!
Online demo: https://huggingface.co/spaces/alibabasglab/ClearVoice
GitHub repo: https://github.com/modelscope/ClearerVoice-Studio
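For anyone who wants to try it locally, here is a minimal usage sketch based on the examples in the ClearerVoice-Studio README; the task name ('speech_enhancement') and model name ('MossFormer2_SE_48K') are taken from the repo's docs and may change between versions, so verify them against the current README.

```python
from clearvoice import ClearVoice  # installed from the ClearerVoice-Studio repo

# Task and model names follow the repo's README examples (assumption:
# they may differ in newer versions of the library).
cv = ClearVoice(task='speech_enhancement', model_names=['MossFormer2_SE_48K'])

# Enhance a noisy recording and write the cleaned audio to disk.
output_wav = cv(input_path='noisy_sample.wav', online_write=False)
cv.write(output_wav, output_path='enhanced_sample.wav')
```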
reacted to tomaarsen's post (11 days ago)
Today I'm introducing a method to train static embedding models that run 100x to 400x faster on CPU than common embedding models, while retaining 85%+ of the quality! This includes 2 fully open models: training scripts, datasets, and metrics.
We apply our recipe to train 2 Static Embedding models that we release today! The release includes:
- 2 models: an English retrieval model and a general-purpose multilingual similarity model (for classification, clustering, etc.), both Apache 2.0
- my modern training strategy: ideation -> dataset choice -> implementation -> evaluation
- my training scripts, using the Sentence Transformers library
- my Weights & Biases reports with losses & metrics
- my list of 30 training and 13 evaluation datasets
The 2 Static Embedding models have the following properties:
- Extremely fast, e.g. 107,500 sentences per second on a consumer CPU, compared to 270 for 'all-mpnet-base-v2' and 56 for 'gte-large-en-v1.5'
- Zero active parameters: no Transformer blocks, no attention, not even a matrix multiplication. Super speed!
- No maximum sequence length! Embed texts of any length (note: longer texts may embed worse)
- Linear instead of quadratic complexity: 2x longer text takes 2x longer to embed, instead of 2.5x or more
- Matryoshka support: allows you to truncate embeddings with minimal performance loss (e.g. 4x smaller with a 0.56% performance decrease on English similarity tasks)
Check out the full blog post if you'd like to 1) use these lightning-fast models or 2) learn how to train them with consumer-level hardware: https://huggingface.co/blog/static-embeddings
The blog post contains a lengthy list of possible advancements; I'm very confident that our 2 models are only the tip of the iceberg, and we may be able to get even better performance.
Alternatively, check out the models:
* sentence-transformers/static-retrieval-mrl-en-v1
* sentence-transformers/static-similarity-mrl-multilingual-v1
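As a quick illustration of the release, here is a minimal sketch that loads the English retrieval model with Sentence Transformers and truncates its Matryoshka embeddings by plain slicing; the 256-dimension cut is an arbitrary example, not a recommendation from the post.

```python
from sentence_transformers import SentenceTransformer

# Static embedding models run comfortably on CPU: no Transformer blocks,
# just a token-embedding lookup followed by mean pooling.
model = SentenceTransformer("sentence-transformers/static-retrieval-mrl-en-v1")

sentences = [
    "What is the capital of France?",
    "Paris is the capital and largest city of France.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)  # (2, full embedding dimension)

# Matryoshka support: the leading dimensions carry most of the signal,
# so a plain slice yields smaller vectors for cheaper vector indexes.
truncated = embeddings[:, :256]
print(truncated.shape)  # (2, 256)
```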
reacted to prithivMLmods's post (about 2 months ago)
Milestone for Flux.1 Dev
The Flux.1 Dev model has crossed 10,000 creative public adapters!
https://huggingface.co/models?other=base_model:adapter:black-forest-labs/FLUX.1-dev
This includes:
- 266 Finetunes
- 19 Quants
- 4 Merges
Here's the 10,000th public adapter:
+ strangerzonehf/Flux-3DXL-Partfile-0006
Page:
+ https://huggingface.co/strangerzonehf
Collection:
+ prithivMLmods/flux-lora-collections-66dd5908be2206cfaa8519be
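As a sketch of how such adapters are used, the hedged diffusers example below loads FLUX.1-dev (a gated model: accept its license on the Hub first) and attaches the milestone adapter named above. The prompt is purely illustrative; check the adapter's model card for any trigger words.

```python
import torch
from diffusers import FluxPipeline

# Load the FLUX.1-dev base model; this needs a GPU with ample VRAM.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Attach one of the 10,000+ public adapters, e.g. the milestone one above.
pipe.load_lora_weights("strangerzonehf/Flux-3DXL-Partfile-0006")

# Illustrative prompt only; see the adapter card for recommended usage.
image = pipe(
    "a detailed product render of a vintage camera",
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("flux_adapter_sample.png")
```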
reacted to openfree's post (3 months ago)
MixGen3 is an innovative image generation service that utilizes LoRA (Low-Rank Adaptation) models. Its key features include:
Integration of various LoRA models: Users can explore and select multiple LoRA models through a gallery.
Combination of LoRA models: Up to three LoRA models can be combined to express unique styles and content.
User-friendly interface: An intuitive interface allows for easy model selection, prompt input, and image generation.
Advanced settings: Various options are provided, including image size adjustment, random seed, and advanced configurations.
Main applications of MixGen3:
Content creation
Design and illustration
Marketing and advertising
Education and learning
Value of MixGen3:
Enhancing creativity
Time-saving
Collaboration possibilities
Continuous development
Expected effects:
Increased content diversity
Lowered entry barrier for creation
Improved creativity
Enhanced productivity
MixGen3 is bringing a new wave to the field of image generation by leveraging the advantages of LoRA models. Users can experience the service for free at
https://openfree-mixgen3.hf.space
contacts: [email protected]
reacted to singhsidhukuldeep's post (3 months ago)
While Google's Transformer might have introduced "Attention is all you need," Microsoft and Tsinghua University are here with the DIFF Transformer, stating, "Sparse-Attention is all you need."
The DIFF Transformer outperforms traditional Transformers in scaling properties, requiring only about 65% of the model size or training tokens to achieve comparable performance.
The secret sauce? A differential attention mechanism that amplifies focus on relevant context while canceling out noise, leading to sparser and more effective attention patterns.
How?
- It uses two separate softmax attention maps and subtracts them.
- It employs a learnable scalar λ for balancing the attention maps.
- It implements GroupNorm for each attention head independently.
- It is compatible with FlashAttention for efficient computation.
What do you get?
- Superior long-context modeling (up to 64K tokens).
- Enhanced key information retrieval.
- Reduced hallucination in question-answering and summarization tasks.
- More robust in-context learning, less affected by prompt order.
- Mitigation of activation outliers, opening doors for efficient quantization.
Extensive experiments show DIFF Transformer's advantages across various tasks and model sizes, from 830M to 13.1B parameters.
This innovative architecture could be a game-changer for the next generation of LLMs. What are your thoughts on DIFF Transformer's potential impact?
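To make the subtraction concrete, here is a simplified single-head PyTorch sketch of differential attention as the post describes it: two softmax attention maps, a learnable scalar λ, and per-head GroupNorm. It is an illustration of the idea, not the paper's reference implementation (which re-parameterizes λ, uses a multi-head layout, and integrates with FlashAttention).

```python
import torch
import torch.nn.functional as F

class DiffAttentionHead(torch.nn.Module):
    """Single-head sketch: two attention maps are computed and subtracted,
    balanced by a learnable scalar lambda, then normalized per head."""

    def __init__(self, dim: int, head_dim: int):
        super().__init__()
        # Two sets of query/key projections produce the two attention maps.
        self.q_proj = torch.nn.Linear(dim, 2 * head_dim, bias=False)
        self.k_proj = torch.nn.Linear(dim, 2 * head_dim, bias=False)
        self.v_proj = torch.nn.Linear(dim, head_dim, bias=False)
        self.lam = torch.nn.Parameter(torch.tensor(0.5))  # learnable balance
        self.norm = torch.nn.GroupNorm(1, head_dim)       # per-head normalization
        self.head_dim = head_dim

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, seq, dim)
        q1, q2 = self.q_proj(x).chunk(2, dim=-1)
        k1, k2 = self.k_proj(x).chunk(2, dim=-1)
        v = self.v_proj(x)
        scale = self.head_dim ** -0.5
        a1 = F.softmax(q1 @ k1.transpose(-2, -1) * scale, dim=-1)
        a2 = F.softmax(q2 @ k2.transpose(-2, -1) * scale, dim=-1)
        out = (a1 - self.lam * a2) @ v  # common-mode noise cancels out
        # GroupNorm expects (batch, channels, length), so transpose around it.
        return self.norm(out.transpose(1, 2)).transpose(1, 2)

x = torch.randn(2, 16, 64)
print(DiffAttentionHead(64, 32)(x).shape)  # torch.Size([2, 16, 32])
```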
reacted to Felladrin's post (3 months ago)
MiniSearch is celebrating its 1st birthday!
Exactly one year ago, I shared the initial version of this side-project on Hugging Face. Since then, there have been numerous changes under the hood. Nowadays it uses [Web-LLM](https://github.com/mlc-ai/web-llm), [Wllama](https://github.com/ngxson/wllama) and [SearXNG](https://github.com/searxng/searxng). I use it daily as my default search engine and have done my best to make it useful. I hope it's interesting for you too!
HF Space: Felladrin/MiniSearch
Embeddable URL: https://felladrin-minisearch.hf.space
reacted to nyuuzyou's post (4 months ago)
Introducing the Doc4web.ru Documents Dataset: nyuuzyou/doc4web
Dataset highlights:
- 223,739 documents from doc4web.ru, a document hosting platform for students and teachers
- Primarily in Russian, with some English and potentially other languages
- Each entry includes: URL, title, download link, file path, and content (where available)
- Contains original document files in addition to metadata
- Data reflects a wide range of educational topics and materials
- Licensed under Creative Commons Zero (CC0) for unrestricted use
The dataset can be used for analyzing educational content in Russian, text classification tasks, and information retrieval systems. It's also valuable for examining trends in educational materials and document sharing practices in the Russian-speaking academic community. The inclusion of original files allows for in-depth analysis of various document formats and structures.
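A minimal loading sketch with the datasets library, streaming so the full corpus isn't downloaded up front; the field names ('url', 'title') follow the highlights above but should be verified against the dataset card.

```python
from itertools import islice
from datasets import load_dataset

# Stream the corpus rather than downloading all 223,739 documents.
ds = load_dataset("nyuuzyou/doc4web", split="train", streaming=True)

# Field names are assumptions based on the highlights; check the card.
for row in islice(ds, 3):
    print(row.get("url"), "|", (row.get("title") or "")[:60])
```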
reacted to m-ric's post (4 months ago)
出海 ("sailing abroad"): Chinese AI is expanding globally
Fact: Chinese LLMs are heavily underrated; see for instance the recent and excellent DeepSeek-V2.5, or the Qwen models.
Luckily for us, @AdinaY just wrote an excellent blog post explaining the Chinese AI ecosystem!
My key takeaways:
Since Google, OpenAI, and Anthropic models are not available in China, local companies are fighting for the market. And it's a really good market: AI has much higher penetration there than in the rest of the world, among both companies and individual users!
But since DeepSeek heavily cut prices in May 2024, this spiraled into a price war that created a cut-throat environment with unsustainably low prices.
On top of this, local regulation is stringent: models must undergo licensing from a local censor (the Cyberspace Administration of China) that, for instance, requires models to refuse to answer certain questions about the CCP, although this is arguably simpler to implement than certain conditions of the European AI Act.
If this wasn't enough, VC investment in AI is drying up: by mid-2024, Chinese AI startups had raised approximately $4.4 billion, versus $55B for US startups in Q2 2024 alone.
To reach profitability, companies have shifted from foundational models to model + application, for instance PopAI from [01.AI](http://01.ai/), with millions of users and high profitability.
They also try to drill down into specific industries, but these niches are getting crowded too.
Since their home market is becoming both too crowded and inhospitable, Chinese companies are now going for the international market, following the expression "sailing abroad" consecrated by Zheng He's legendary voyages in the early 15th century.
There, they'll have to adapt to different infrastructures and regulations, but they have bright prospects for growth!
Read her post: https://huggingface.co/blog/AdinaY/chinese-ai-global-expansion
reacted to TuringsSolutions's post (4 months ago)
I solved the biggest math problem associated with the attention mechanism. It works better than I ever expected. Test it all yourself; everything you need is linked from this video: https://youtu.be/41dF0yoz0qo
Sorry the audio quality sucks; I will buy a new microphone today. Why does some moron like me solve these things and not you? I know more about how computers work than you do, that's it. Swarm algorithms were big in the '90s and early 2000s. Computers were absolute dog doo-doo then in one specific way, compared to now. That one way, which everyone overlooks, is the entire secret behind why swarm algorithms are so good.
reacted to bartowski's post (5 months ago)
As some of you know, I try to convert models to either FP32 or BF16, depending on their size, before doing imatrix and quantization.
Today I decided to see if that matters, and the results have me, for lack of a better word, perplexed.
My setup:
Mistral Nemo Instruct 2407
- convert to FP32, calculate imatrix, quantize to Q8_0 and Q4_K_M
- convert to FP16, calculate imatrix, quantize to Q8_0 and Q4_K_M
I calculated the KLD base from the FP32 model:
./llama-perplexity -m /models/Mistral-Nemo-Instruct-2407-f32.gguf -f /training_data/wikitext-2-raw/wiki.test.raw --kl-divergence-base /training_data/mistral-nemo-f32.kld -ngl 35 -fa -sm row
then calculated the divergence itself for each like so:
./llama-perplexity -m /models/Mistral-Nemo-Instruct-2407-Q8_0.gguf -f /training_data/wikitext-2-raw/wiki.test.raw --kl-divergence-base /training_data/mistral-nemo-f32.kld --kl-divergence -ngl 50 -fa -sm row
Q4_K_M from FP16 and FP32 were similar, trading blows across statistics; odd, since I expected FP32 to be strictly better, but it's not.
Q8_0 is where things get weird. Despite each file being slightly different in size, and the sha256sums of course being different, they each get *completely identical* scores, down to 6 decimal places of precision on the statistics.
How is this possible? Is there something I don't understand about llama.cpp that makes it always convert to fp16 before it does quantization? Am I wasting time using FP32/BF16??
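One way to probe this directly is to compare the quantized tensor payloads instead of whole-file hashes, since GGUF metadata alone can change the checksum. Below is a hedged sketch using the gguf Python package that ships with llama.cpp (gguf-py); the file paths are placeholders, and it assumes both files list tensors in the same order.

```python
import numpy as np
from gguf import GGUFReader  # pip install gguf (llama.cpp's gguf-py package)

# Placeholder paths: the same model quantized to Q8_0 from FP32 vs. FP16.
a = GGUFReader("Mistral-Nemo-Instruct-2407-Q8_0-from-f32.gguf")
b = GGUFReader("Mistral-Nemo-Instruct-2407-Q8_0-from-f16.gguf")

# If every tensor's raw bytes match, the differing file hashes come from
# metadata alone, which would explain the identical KLD statistics.
for ta, tb in zip(a.tensors, b.tensors):
    if not np.array_equal(ta.data, tb.data):
        print("differs:", ta.name)
print("tensor comparison done")
```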
reacted to nyuuzyou's post (7 months ago)
Just released the GitVerse Code Dataset: nyuuzyou/gitverse-code
Dataset highlights:
- 30 GB of unique code extracted from over 400 GB of analyzed data
- 9,014 repositories
- 2,804,216 unique code files
- 419 different file types
- Multilingual: various programming languages
Sourced from GitVerse, a Russian GitHub alternative launched in 2024.
Let me know your thoughts.
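With a 30 GB corpus, streaming is the practical way to sample it; the 'path' column used for filtering below is an assumption to check against the dataset card.

```python
from itertools import islice
from datasets import load_dataset

# Stream rather than download 30 GB up front.
ds = load_dataset("nyuuzyou/gitverse-code", split="train", streaming=True)

# Filter for one of the 419 file types, e.g. Python sources; the 'path'
# column name is an assumption, so verify it on the dataset card.
py_files = (row for row in ds if str(row.get("path", "")).endswith(".py"))
for row in islice(py_files, 3):
    print(row["path"])
```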
reacted to mitkox's post (8 months ago)
Me: I want on-device AI: fast, low-latency, genuinely private, and convenient for use and development.
Microsoft: The best I can do is Copilot+. You need a special Qualcomm chip and Windows 11 24H2. Today I can give you only Recall, taking screenshots and running a visual model to write context about what you are doing in the unencrypted Semantic Index database for embeddings. I'm giving you SLMs Phi Silica, accessible only via API and SDK. In the autumn I can give you the developer tools for C#/C++ and you can use them.
Apple: The best I can do is Apple Intelligence. You need a special Apple chip and macOS 15. Today I can give you only marketing. In the autumn I can give you on-device 3B quantized to 3.5bit mysterious SLMs and diffusion models with LoRA adapters. We will have an encrypted Semantic Index database for embeddings and agentic flows with function calling. We will call all of them with different names. In the autumn I will give you the developer tools in Swift and you can use them.
Open Source: The best I can do is llama.cpp. You can run it on any chip and OS. Today you can run AI inference on device and add other open-source components for your solution. I can give you local AI models, SLMs/LLMs, from Qwen2-0.5B to Llama3-70B. You can have an encrypted local embeddings database with PostgreSQL/pgvector or SQLite-Vec. I can give you a wide choice of integrations and open-source components for your solution, from UIs to agentic workflows with function calling. Today I can give you the developer tools in Python/C/C++/Rust/Go/Node.js/JS/C#/Scala/Java and you can use them.
Make sure you own your AI. AI in the cloud is not aligned with you; it's aligned with the company that owns it.
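To ground the open-source option, here is a minimal on-device inference sketch using llama-cpp-python, the Python bindings for llama.cpp; the GGUF filename is a placeholder for any locally downloaded chat model, e.g. a quantized Qwen2-0.5B.

```python
from llama_cpp import Llama  # pip install llama-cpp-python

# Point at any local GGUF chat model; this filename is a placeholder.
llm = Llama(model_path="qwen2-0_5b-instruct-q4_k_m.gguf", n_ctx=2048)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "In two sentences: why does on-device AI help privacy?"}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```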