✨ Apache 2.0
✨ 8.19GB VRAM, runs on most GPUs
✨ Multi-Tasking: T2V, I2V, Video Editing, T2I, V2A
✨ Text Generation: Supports Chinese & English
✨ Powerful Video VAE: Encode/decode 1080P w/ temporal precision
reacted to fdaudens's post with ❤️ about 5 hours ago
🚀 Just launched: A toolkit of 20 powerful AI tools that journalists can use right now - transcribe, analyze, create. 100% free & open-source.
Been testing all these tools myself and created a searchable collection of the most practical ones - from audio transcription to image generation to document analysis. No coding needed, no expensive subscriptions.
Some highlights I've tested personally:
- Private, on-device transcription with speaker ID in 100+ languages using Whisper
- Website scraping that just works - paste a URL, get structured data
- Local image editing with tools like Finegrain (impressive results)
- Document chat using Qwen 2.5 72B (handles technical papers well)
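The "paste a URL, get structured data" idea boils down to parsing HTML into records. A minimal stdlib sketch of that core step (this is not any of the listed tools — the HTML snippet and class name here are purely illustrative):

```python
# Minimal sketch of "HTML in, structured data out" using only the
# standard library: collect (text, href) pairs for every <a> tag.
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects {"text": ..., "href": ...} for each link in the page."""
    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:   # only capture text inside an <a>
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append({"text": "".join(self._text).strip(),
                               "href": self._href})
            self._href = None

html = ('<p>See <a href="https://example.com/a">first</a> '
        'and <a href="https://example.com/b">second</a>.</p>')
parser = LinkExtractor()
parser.feed(html)
print(parser.links)
```

The real tools go much further (pagination, tables, JavaScript pages), but the structured-records output is the same shape.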
Sharing this early because the best tools come from the community. Drop your favorite tools in the comments or join the discussion on what to add next!
Huge disappointment with Claude Sonnet 3.7 😞 Big performance regression: worse than the June 2024 version. 👎 onekq-ai/WebApp1K-models-leaderboard
I'm sure this version improves on something, just not the thing my leaderboard measures. This proves the point that no model can be the best at everything.
reacted to sequelbox's post with 🚀 about 5 hours ago
SNEAK PREVIEW: Tachibana 2! A new high-difficulty code-reasoning dataset to use and challenge deepseek-ai/DeepSeek-R1 - harder prompts, complex requirements, deeper technical skill.
I was just playing around with Python's MIDI library and Colab's code generation, accidentally cooked up a quick n' dirty audio synthesis template. Have fun!
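In the same quick-n'-dirty spirit, here's a stdlib-only synthesis sketch — no MIDI library involved, just a sine tone rendered straight to a WAV file (filename, pitch, and duration are arbitrary):

```python
# Quick-and-dirty audio synthesis: render half a second of A4 (440 Hz)
# as 16-bit mono PCM and write it to a WAV file.
import math
import struct
import wave

SAMPLE_RATE = 44100

def sine_tone(freq_hz, duration_s, amplitude=0.5):
    """Return 16-bit PCM samples for a sine wave at freq_hz."""
    n = int(SAMPLE_RATE * duration_s)
    return [int(amplitude * 32767 *
                math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE))
            for i in range(n)]

samples = sine_tone(440.0, 0.5)
with wave.open("tone.wav", "wb") as wav:
    wav.setnchannels(1)        # mono
    wav.setsampwidth(2)        # 16-bit samples
    wav.setframerate(SAMPLE_RATE)
    wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))
```

Swap the single `sine_tone` call for a list of (frequency, duration) pairs and you have a melody — which is roughly where a MIDI note list would plug in.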
📢 Old Research Alert: Making Computer Vision Models Smaller & Smarter!
Years ago, I coded an optimization for the first layers of a convolutional neural network (computer vision) and ended up never posting it here. The optimization decreases the number of parameters while increasing accuracy. It relies on separating (branching) chromatic and achromatic information through the layers of the network.
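As a toy illustration of the chromatic/achromatic separation at the input (the actual optimization branches this through the network's layers, and may use different weights — the Rec. 601 luminance coefficients below are just a common choice):

```python
# Split an RGB pixel into an achromatic (luminance) component and a
# chromatic residual. A branched CNN would route these to separate paths.
def split_achromatic_chromatic(rgb):
    r, g, b = rgb
    y = 0.299 * r + 0.587 * g + 0.114 * b   # achromatic: Rec. 601 luma
    chroma = (r - y, g - y, b - y)          # chromatic: per-channel residual
    return y, chroma

# A pure gray pixel carries no chromatic information at all:
y, chroma = split_achromatic_chromatic((0.5, 0.5, 0.5))
print(y, chroma)
```

The intuition for the parameter savings: the chromatic branch carries much less structural detail, so it can be processed with far fewer filters than a full three-channel input would need.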
Following the 1.0 collection, I release the 1.1 version with an updated dataset for sentence similarity, as well as a raw dataset of central bankers' speeches.
The newest model, econo-sentence-v2, is a new version of a sentence-transformers model based on EconoBert! It gets better results, with a nuance on similarity.
If you're an economist looking for useful tools, don't hesitate to check it out!
reacted to Jiaqi-hkust's post with 🚀 about 5 hours ago
We have open-sourced Hawk (NeurIPS 2024) 🎉, one of the pioneering frameworks for open-world video anomaly understanding.
In the field of video anomaly detection, despite continuous technological advancements, existing systems still face limitations in semantic understanding of scenes and user interaction, making it challenging to effectively identify complex anomalous scenes. Additionally, the scarcity of datasets restricts the applicability of these systems in open-world scenarios.
To tackle these challenges, we developed Hawk, an open-world video understanding and anomaly detection framework. Hawk significantly enhances anomaly recognition by identifying motion information differences between anomalous and normal videos. We introduce an auxiliary consistency loss to strengthen the focus on motion modalities and establish a supervisory relationship between motion and language representations. Furthermore, we have annotated over 8,000 anomalous videos and their language descriptions and created 8,000 question-answer pairs to support effective training in diverse open-world scenarios.
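The post doesn't spell out the auxiliary consistency loss; as a toy stand-in (illustrative only, not Hawk's implementation), a cosine-based term that pulls a motion embedding toward its paired language embedding could look like this:

```python
# Toy consistency loss between a motion embedding and a language
# embedding: loss = 1 - cosine_similarity(motion, language).
# 0 when the embeddings point the same way, up to 2 when opposite.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def consistency_loss(motion_emb, lang_emb):
    return 1.0 - cosine_similarity(motion_emb, lang_emb)

aligned = consistency_loss([1.0, 0.0], [2.0, 0.0])     # same direction
misaligned = consistency_loss([1.0, 0.0], [0.0, 1.0])  # orthogonal
print(aligned, misaligned)
```

Minimizing such a term is one simple way to "establish a supervisory relationship between motion and language representations", as described above.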
Experimental results demonstrate that Hawk surpasses existing video understanding frameworks in video description generation and question-answering tasks.
Transform Images into Professional Vector Graphics
Convert your raster images (JPG, PNG, WEBP) into high-quality vector graphics (SVG) with our easy-to-use tool! Perfect for designers, artists, and anyone needing vector conversions.
🎯 Key Features:
- Convert to scalable SVG vector graphics
- Real-time preview of your SVG output
- Advanced customization options
- Clean, user-friendly interface
- Batch processing ready
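To see the raster-to-vector idea at its most naive (the tool above uses a proper tracing engine that fits smooth paths; this sketch just emits one SVG rectangle per dark pixel of a toy binary grid):

```python
# Naive "vectorization": turn a 2D grid of 0/1 pixels into an SVG string
# with one <rect> per 1. Real tracers fit curves instead of rectangles.
def pixels_to_svg(grid, cell=10):
    """grid: 2D list of 0/1; returns an SVG document as a string."""
    h, w = len(grid), len(grid[0])
    rects = [
        f'<rect x="{x * cell}" y="{y * cell}" '
        f'width="{cell}" height="{cell}"/>'
        for y, row in enumerate(grid)
        for x, v in enumerate(row) if v
    ]
    return (f'<svg xmlns="http://www.w3.org/2000/svg" '
            f'width="{w * cell}" height="{h * cell}">'
            + "".join(rects) + "</svg>")

svg = pixels_to_svg([[1, 0],
                     [0, 1]])
print(svg)
```

The output already scales without pixelation — the point of SVG — but a real converter merges regions into paths, which is what keeps file sizes small.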
📢 If you're interested in a quick application of target sentiment analysis to your data, you might be interested in the fine-tuned FlanT5-xl version. The reason is sheer speed: I've added batching support for a series of sentiment analysis models in this card: nicolay-r/sentiment-analysis-advances-665ba391e0eba729021ea101
Why use it? Experimenting out-of-domain, I noticed the xl version performs similarly to LLaMA-3-3b-instruct.
🔑 Key takeaways of the adaptation:
- padding and truncation strategies for batching mode: https://huggingface.co/docs/transformers/en/pad_truncation
- add_special_tokens=False causes drastic changes in result behaviour (FlanT5 models)
- 💥 crashes on pad_token_id=50256 during the generation process
- 🔻 use_bf16 mode performs 3 times slower on CPU
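What the padding strategy amounts to, sketched without transformers (the token ids below are made up; T5-family models like FlanT5 use pad id 0, which is why GPT-2's eos id 50256 makes a poor pad token for them):

```python
# Manual batching: pad every sequence to the batch max length with a
# pad id, and build the matching attention mask (1 = real, 0 = padding).
PAD_TOKEN_ID = 0  # T5-family pad id; 50256 is GPT-2's eos, not a T5 pad

def pad_batch(sequences, pad_id=PAD_TOKEN_ID, max_length=None):
    limit = max_length or max(len(s) for s in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        seq = seq[:limit]                      # truncation: keep the head
        pad = [pad_id] * (limit - len(seq))
        input_ids.append(seq + pad)
        attention_mask.append([1] * len(seq) + [0] * len(pad))
    return input_ids, attention_mask

ids, mask = pad_batch([[5, 6, 7], [8]])
print(ids, mask)
```

This is exactly the shape a tokenizer produces with `padding=True, truncation=True`; doing it by hand makes it obvious why a wrong pad id corrupts every short sequence in the batch.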
🚀 Performance for the BASE-sized model nicolay-r/flan-t5-tsa-thor-base: 17.2 it/s (prompt) and 5.22 it/s (3-step CoT) on CPU (Core i5-1140G7)
I'd like to draw your attention to a Lamarck-based experiment which uses Arcee AI's newly published arcee_fusion merge method for three out of its four merges. Yes, just four. This is a simple one, and its recipe is fully open:
A fusion merge - of a fusion merge and a SLERP of a fusion and older merge - should demonstrate the new merge method's behavior in interesting ways, especially in the first 1/4th of the model where the SLERP has less impact.
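For reference, the SLERP half of the recipe is simple to state; here's a minimal sketch on plain vectors (real merge tooling operates on full weight tensors with per-layer interpolation schedules, which is how "less impact in the first 1/4th" is achieved):

```python
# Spherical linear interpolation between two weight vectors: walk along
# the great-circle arc between their directions instead of a straight
# line, which better preserves weight magnitudes when merging models.
import math

def slerp(a, b, t):
    """Interpolate between vectors a and b at fraction t in [0, 1]."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    a = [x / na for x in a]
    b = [x / nb for x in b]
    dot = max(-1.0, min(1.0, sum(x * y for x, y in zip(a, b))))
    theta = math.acos(dot)
    if theta < 1e-6:  # nearly parallel: plain lerp avoids division by ~0
        return [(1 - t) * x + t * y for x, y in zip(a, b)]
    s = math.sin(theta)
    wa = math.sin((1 - t) * theta) / s
    wb = math.sin(t * theta) / s
    return [wa * x + wb * y for x, y in zip(a, b)]

mid = slerp([1.0, 0.0], [0.0, 1.0], 0.5)
print(mid)
```

At t=0.5 between orthogonal directions, both components come out equal and the result stays on the unit sphere — a straight average would have shrunk it.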
I welcome you to kick the tires and learn from it. It has prose quality near Qwenvergence v12's - as you'd expect.
Welcome to Datasets Convertor, the cutting-edge solution engineered for seamless and efficient data format conversion. Designed with both data professionals and enthusiasts in mind, our tool simplifies the transformation process between the CSV, Parquet, JSONL, and XLS file formats, ensuring that your data is always in the right shape for your next analytical or development challenge. 💻✨
Why Choose Datasets Convertor? In today’s data-driven world, managing and converting large datasets can be a daunting task. Our converter is built on top of robust technologies like Pandas and Gradio, delivering reliable performance with a modern, intuitive interface. Whether you’re a data scientist, analyst, or developer, Datasets Convertor empowers you to effortlessly switch between formats while maintaining data integrity and optimizing storage.
Key Features and Capabilities: CSV ⇆ Parquet Conversion: Easily transform your CSV files into the highly efficient Parquet format and vice versa. Parquet’s columnar storage not only reduces file size but also accelerates query performance—a critical advantage for big data analytics. 🔄📂
CSV to JSONL Conversion: Convert CSV files to JSONL (newline-delimited JSON) to facilitate efficient, line-by-line data processing. This format is particularly useful for streaming data applications, logging systems, and scenarios where incremental data processing is required. Each CSV row is meticulously converted into an individual JSON record, preserving all the metadata and ensuring compatibility with modern data pipelines. 📄➡️📝
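The CSV-to-JSONL step really is "one dict per row, one JSON object per line". A stdlib sketch of that core transformation (the Space itself is built on Pandas and Gradio; the sample data here is made up):

```python
# CSV -> JSONL: read each CSV row as a dict, serialize each dict as one
# JSON object per line (newline-delimited JSON).
import csv
import io
import json

def csv_to_jsonl(csv_text):
    reader = csv.DictReader(io.StringIO(csv_text))
    return "\n".join(json.dumps(row) for row in reader)

jsonl = csv_to_jsonl("name,score\nada,1\ngrace,2\n")
print(jsonl)
```

Note that CSV has no types, so every value arrives as a string — one reason a Pandas-based converter, which infers dtypes, gives cleaner JSONL for numeric columns.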
Parquet to JSONL Conversion: For those working with Parquet files, our tool offers a streamlined conversion to JSONL.
I've got my hands on an AMD Instinct MI100. Used, it's about the same price as a V100, but on paper it has more TOPS (V100: 14 TOPS vs MI100: 23 TOPS), and its HBM runs at a faster clock, so the memory bandwidth is 1.2 TB/s. For quantized inference it's a beast (the MI50 was also surprisingly fast).
For LoRA training, in this quick test I could not make the bnb config work, so I'm running the fine-tune on the full-size model.
Will share everything I've learned about the install, setup, and settings in a blog post, together with the cooling-shroud 3D design.
✨ TODAY: DeepSeek unveiled FlashMLA: an efficient MLA decoding kernel for NVIDIA Hopper GPUs, optimized for variable-length sequences. https://github.com/deepseek-ai/FlashMLA
Moonshot AI introduces Moonlight: a 3B/16B MoE trained on 5.7T tokens using Muon, pushing the Pareto frontier with fewer FLOPs. moonshotai/Moonlight-16B-A3B
We now have a Deep Research for academia: SurveyX automatically writes academic surveys nearly indistinguishable from human-written ones 🔥
Researchers from Beijing and Shanghai just published the first application of a deep research system to academia: their algorithm, given a question, can give you a survey of all papers on the subject.
To make a research survey, you generally follow two steps: preparation (collect and organize papers) and writing (outline creation, drafting, polishing). The researchers followed the same two steps and automated them.
🎯 For the preparation part, a key task is finding all the important references on the given subject. The researchers first cast a wide net over all relevant papers; but then finding the really important ones is like distilling knowledge from a haystack of information. To solve this challenge, they built an "AttributeTree" object that structures key information from citations. Ablating these AttributeTrees significantly decreased structure and synthesis scores, so they were really useful!
📝 For the writing part, the key was to get a synthesis that's both short and true. This is not easy to get from LLMs! So they used methods like LLM-based deduplication to shorten the overly verbose listings LLMs produce, and RAG to grab original quotes instead of made-up ones.
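As a toy stand-in for the deduplication idea (SurveyX uses an LLM for this; a word-overlap filter just gives the flavor of "drop near-repeats, keep distinct claims"):

```python
# Greedy near-duplicate filter: keep a sentence only if its word-set
# Jaccard similarity to every already-kept sentence is below a threshold.
import re

def jaccard(a, b):
    sa = set(re.findall(r"\w+", a.lower()))
    sb = set(re.findall(r"\w+", b.lower()))
    return len(sa & sb) / len(sa | sb)

def dedupe(sentences, threshold=0.8):
    kept = []
    for s in sentences:
        if all(jaccard(s, k) < threshold for k in kept):
            kept.append(s)
    return kept

out = dedupe([
    "Transformers dominate NLP benchmarks.",
    "Transformers dominate NLP benchmarks today.",  # near-repeat: dropped
    "Diffusion models excel at image synthesis.",
])
print(out)
```

An LLM-based deduper can additionally merge paraphrases that share no surface words — exactly where word-overlap heuristics like this one fail.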
As a result, their system outperforms previous approaches by far!
As assessed by LLM judges, the quality score of SurveyX even approaches that of human experts: 4.59/5 vs 4.75/5 🏆