Alibaba_Speech_Lab_SG PRO

alibabasglab

AI & ML interests

speech enhancement, separation, and codec

alibabasglab's activity

liked a Space 3 days ago

FunAudioLLM/InspireMusic

🎶

Music Generation - text to music, music continuation.

posted an update about 1 month ago

Post

2285

Do you need to improve your speech audio to premium quality? If so, please try out our latest free, open-source speech processing toolkit: [ClearerVoice-Studio](https://github.com/modelscope/ClearerVoice-Studio)! Check out our live demos at https://huggingface.co/spaces/alibabasglab/ClearVoice and https://modelscope.cn/studios/iic/ClearerVoice-Studio.

1 reply

posted an update about 1 month ago

Post

2061

ClearerVoice-Studio: your one-step speech processing platform for speech enhancement, speech separation, speech super-resolution, and audio-visual target speaker extraction. Say goodbye to noise and hello to clarity!

Online demo: https://huggingface.co/spaces/alibabasglab/ClearVoice
GitHub repo: https://github.com/modelscope/ClearerVoice-Studio

New activity in alibabasglab/MossFormer2_SR_48K about 1 month ago

Update README.md

#1 opened about 2 months ago by

Pur1zumu

New activity in alibabasglab/LJSpeech-1.1-48kHz about 1 month ago

Add task category, link to paper

#2 opened about 1 month ago by

nielsr

updated a model about 1 month ago

alibabasglab/MossFormer2_SR_48K

Updated Jan 21 • 3

updated a dataset about 1 month ago

alibabasglab/LJSpeech-1.1-48kHz

Viewer • Updated Jan 21 • 20 • 193 • 2

updated a Space about 1 month ago

193

ClearerVoice-Studio (Speech Enhancement, Separation and Extraction)

📈

Better AI powered platform to purify your speech signal

reacted to their post with 🤗❤️🚀🔥 about 2 months ago

posted an update about 2 months ago

Post

1703

Do you need to improve your speech audio to premium quality? If so, please try out our latest free, open-source speech processing toolkit: [ClearerVoice-Studio](https://github.com/modelscope/ClearerVoice-Studio)! Check out our live demos at https://huggingface.co/spaces/alibabasglab/ClearVoice and https://modelscope.cn/studios/iic/ClearerVoice-Studio.

reacted to their post with 🤝❤️🔥 about 2 months ago

Post

1223

Introducing the open-source ClearerVoice-Studio, a powerful AI speech processing tool that dramatically improves your speech quality. Check out the demo pages: https://huggingface.co/spaces/alibabasglab/ClearVoice and https://modelscope.cn/studios/iic/ClearerVoice-Studio. Give us a star on GitHub: https://github.com/modelscope/ClearerVoice-Studio!

reacted to prithivMLmods's post with 🔥 about 2 months ago

Post

3144

ChemQwen-vL [ Qwen for Chem Vision ] 🧑🏻‍🔬

🧪Model : prithivMLmods/ChemQwen-vL

📝ChemQwen-vL is a vision-language model fine-tuned based on the Qwen2VL-2B Instruct model. It has been trained using the International Chemical Identifier (InChI) format for chemical compounds and is optimized for chemical compound identification. The model excels at generating the InChI and providing descriptions of chemical compounds based on their images. Its architecture operates within a multi-modal framework, combining image-text-text capabilities. It has been fine-tuned using datasets from: https://iupac.org/projects/

📒Colab Demo: https://tinyurl.com/2pn8x6u7, Collection : https://tinyurl.com/2mt5bjju

Inference with the documentation is possible with the help of the ReportLab library. https://pypi.org/project/reportlab/

🤗: @prithivMLmods

1 reply

replied to prithivMLmods's post about 2 months ago

Nice work!

reacted to m-ric's post with 👀 about 2 months ago

Post

1373

MiniMax's new MoE LLM reaches Claude-Sonnet level with 4M tokens context length 💥

This work from Chinese startup @MiniMax-AI introduces a novel architecture that achieves state-of-the-art performance while handling context windows up to 4 million tokens - roughly 20x longer than current models. The key was combining lightning attention, mixture of experts (MoE), and a careful hybrid approach.

Key insights:

🏗️ MoE with novel hybrid attention:
‣ Mixture of Experts with 456B total parameters (45.9B activated per token)
‣ Combines Lightning attention (linear complexity) for most layers and traditional softmax attention every 8 layers

🏆 Outperforms leading models across benchmarks while offering vastly longer context:
‣ Competitive with GPT-4/Claude-3.5-Sonnet on most tasks
‣ Can efficiently handle 4M token contexts (vs 256K for most other LLMs)

🔬 Technical innovations enable efficient scaling:
‣ Novel expert parallel and tensor parallel strategies cut communication overhead in half
‣ Improved linear attention sequence parallelism, multi-level padding and other optimizations achieve 75% GPU utilization (that's really high, generally utilization is around 50%)

🎯 Thorough training strategy:
‣ Careful data curation and quality control by using a smaller preliminary version of their LLM as a judge!
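
The hybrid attention layout and the sparsity figures quoted above can be illustrated with a small sketch. This is not MiniMax's code: the function name `hybrid_attention_schedule`, the 16-layer depth, and the choice to place the softmax layer last in each group of 8 are assumptions made purely for illustration. Only the "every 8 layers" ratio and the 456B / 45.9B parameter counts come from the post.

```python
def hybrid_attention_schedule(num_layers: int, softmax_every: int = 8) -> list[str]:
    """Label each layer with its attention type.

    Assumption (not stated in the post): the softmax layer is the last
    layer of each group of `softmax_every` layers.
    """
    if num_layers < 0 or softmax_every < 1:
        raise ValueError("num_layers must be >= 0 and softmax_every >= 1")
    return [
        "softmax" if (i + 1) % softmax_every == 0 else "lightning"
        for i in range(num_layers)
    ]


# 16 layers is an illustrative depth, not the real model depth.
schedule = hybrid_attention_schedule(num_layers=16)

# Share of parameters activated per token, from the figures in the post.
total_params_b = 456.0
active_params_b = 45.9
activated_fraction = active_params_b / total_params_b

if __name__ == "__main__":
    for index, kind in enumerate(schedule, start=1):
        print(f"layer {index:2d}: {kind}")
    print(f"activated per token: {activated_fraction:.1%} of total parameters")
```

Under these assumptions, seven of every eight layers use linear-complexity attention, and roughly a tenth of the parameters are active for any given token.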

Overall, not only is the model impressive, but the technical paper is also really interesting! 📝
It has lots of insights including a great comparison showing how a 2B MoE (24B total) far outperforms a 7B model for the same amount of FLOPs.

Read it in full here 👉 MiniMax-01: Scaling Foundation Models with Lightning Attention (2501.08313)
Model here, allows commercial use <100M monthly users 👉 MiniMaxAI/MiniMax-Text-01

reacted to Tonic's post with 🔥 about 2 months ago

Post

1881

🙋🏻‍♂️ Hey there folks,

Facebook AI just released JASCO models that make music stems.

You can try it out here: Tonic/audiocraft

hope you like it