
Nicolay Rusnachenko

nicolay-r

AI & ML interests

Information Retrieval・Medical Multimodal NLP (🖼+📝) Research Fellow @BU_Research・software developer http://arekit.io・PhD in NLP

Recent Activity

reacted to KnutJaegersberg's post with 👀 about 4 hours ago

Organizations

None yet

nicolay-r's activity

posted an update about 3 hours ago
📢 If you wish to empower an LLM with NER for texts in English, I can recommend spaCy. Sharing the wrapper of spaCy NER models for bulk-ner, dedicated to handling CSV / JSONL content:
Script: https://github.com/nicolay-r/nlp-thirdgate/blob/master/tutorials/ner_spacy_383.sh
Code: https://raw.githubusercontent.com/nicolay-r/nlp-thirdgate/refs/heads/master/ner/spacy_383.py

What you need to know about spaCy NER models (see the sketch below):
☑️ Models are distributed as Python packages; they can be installed directly into the environment or via the Python CLI.
☑️ The library has a pipeline for optimized request handling in batches.
☑️ Architecture: DNN embedding-based models (not transformers).

🤖 List of models (or see screenshot below):
https://huggingface.co/spacy
📋 Supported NER types:
https://github.com/explosion/spaCy/discussions/9147

⚠️ NOTE: chunking seems to be non-applicable due to the specifics of the models and the use of the internal pipeline mechanism.

🚀 Performance for sentences (en):
Model: spacy/en_core_web_sm 🔥 530 sentences per second 🔥 (similar to larger solutions)

🌌 Other bulk-ner wrappers at nlp-thirdgate: https://github.com/nicolay-r/nlp-thirdgate#ner
reacted to KnutJaegersberg's post with 👀 about 4 hours ago
A Brief Survey of Associations Between Meta-Learning and General AI

The paper titled "A Brief Survey of Associations Between Meta-Learning and General AI" explores how meta-learning techniques can contribute to the development of Artificial General Intelligence (AGI). Here are the key points summarized:

1. General AI (AGI) and Meta-Learning:
- AGI aims to develop algorithms that can handle a wide variety of tasks, similar to human intelligence. Current AI systems excel at specific tasks but struggle with generalization to unseen tasks.
- Meta-learning or "learning to learn" improves model adaptation and generalization, allowing AI systems to tackle new tasks efficiently using prior experiences.

2. Neural Network Design in Meta-Learning:
- Techniques like Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks enable self-improvement and adaptability for deep models, supporting generalization across tasks.
- Highway networks and ResNet-style models use shortcuts for efficient backpropagation, allowing deeper models that can be used in meta-learning frameworks.

3. Coevolution:
- Coevolution involves the mutual evolution of multiple components, such as learners or task-solvers, to improve overall performance.
- Coevolution between learners enhances collaboration and competition within AI systems, while coevolution between tasks and solvers (e.g., POWERPLAY and AI-GA frameworks) pushes solvers to adapt to increasingly complex tasks.

4. Curiosity in Meta-Learning:
- Curiosity-based exploration encourages AI systems to discover new, diverse features of the environment, avoiding local optima.
- Curiosity-based objectives can be combined with performance-based objectives to ensure efficient exploration and adaptation in complex tasks.

5. Forgetting Mechanisms:
- Forgetting is crucial to avoid memory overload in AI systems.

https://arxiv.org/abs/2101.04283
reacted to Duskfallcrew's post with 🔥 about 4 hours ago
Just started porting over the articles from Civitai that mattered most to me.
Look, I'm not going to sit here and whine, complain, and moan - they know why I've left, and they're going to thrive without me.
I'm a mere speck compared to their future, and that's amazing.
But the journey continues: I've posted my Design 101 for AI - the first one up -- I believe it's the first one, as it delves back into how Arts and Crafts connect to AI.
I'm still looking for a future model hub for the insane 800+ models I'd published - considering that's half of what I've got sitting in my repos on HF.
reacted to Kseniase's post with 🔥 about 4 hours ago
8 New Types of RAG

RAG techniques continuously evolve to enhance LLM response accuracy by retrieving relevant external data during generation. To keep up with current AI trends, new RAG types incorporate deep step-by-step reasoning, tree search, citations, multimodality and other effective techniques.

Here's a list of 8 latest RAG advancements:

1. DeepRAG -> DeepRAG: Thinking to Retrieval Step by Step for Large Language Models (2502.01142)
Models retrieval-augmented reasoning as a Markov Decision Process, enabling strategic retrieval. It dynamically decides when to retrieve external knowledge and when to rely on parametric reasoning.

2. RealRAG -> RealRAG: Retrieval-augmented Realistic Image Generation via Self-reflective Contrastive Learning (2502.00848)
Enhances novel object generation by retrieving real-world images and using self-reflective contrastive learning to fill knowledge gaps, improve realism, and reduce distortions.

3. Chain-of-Retrieval Augmented Generation (CoRAG) -> Chain-of-Retrieval Augmented Generation (2501.14342)
Retrieves information step by step and adjusts it, also deciding how much compute power to use at test time. If needed, it reformulates queries.

4. VideoRAG -> VideoRAG: Retrieval-Augmented Generation over Video Corpus (2501.05874)
Enables unlimited-length video processing, using a dual-channel architecture that integrates graph-based textual grounding and multi-modal context encoding.

5. CFT-RAG -> CFT-RAG: An Entity Tree Based Retrieval Augmented Generation Algorithm With Cuckoo Filter (2501.15098)
A tree-RAG acceleration method that uses an improved Cuckoo Filter to optimize entity localization, enabling faster retrieval.

6. Contextualized Graph RAG (CG-RAG) -> CG-RAG: Research Question Answering by Citation Graph Retrieval-Augmented LLMs (2501.15067)
Uses Lexical-Semantic Graph Retrieval (LeSeGR) to integrate sparse and dense signals within the graph structure and capture citation relationships.

7. GFM-RAG -> GFM-RAG: Graph Foundation Model for Retrieval Augmented Generation (2502.01113)
A graph foundation model that uses a graph neural network to refine query-knowledge connections.

8. URAG -> URAG: Implementing a Unified Hybrid RAG for Precise Answers in University Admission Chatbots -- A Case Study at HCMUT (2501.16276)
A hybrid system combining rule-based and RAG methods to improve lightweight LLMs for educational chatbots.
posted an update 1 day ago
📢 If you wish to empower an LLM with IR and a named entity recognition module, I've got relevant findings.
Just tested Flair; below is how you can start adapting it to process your CSV / JSONL data via bulk-ner.
👩‍💻 Code: https://github.com/nicolay-r/nlp-thirdgate/blob/master/tutorials/ner_flair_0151.sh
🤖 Models: https://huggingface.co/flair

Provider: https://raw.githubusercontent.com/nicolay-r/nlp-thirdgate/refs/heads/master/ner/flair_0151.py
Framework: https://github.com/nicolay-r/bulk-ner
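
For reference, a minimal sketch of the Flair calls such a provider presumably wraps; the model name `flair/ner-english` and the batching parameter are illustrative assumptions:

```python
from flair.data import Sentence
from flair.models import SequenceTagger

# Load a pretrained Flair NER tagger (downloads on first use).
tagger = SequenceTagger.load("flair/ner-english")

sentences = [
    Sentence("George Washington went to Washington."),
    Sentence("Flair is developed in Berlin."),
]

# mini_batch_size mirrors the batch-size effect in the numbers below.
tagger.predict(sentences, mini_batch_size=10)

for sentence in sentences:
    for entity in sentence.get_spans("ner"):
        print(entity.text, entity.tag, round(entity.score, 3))
```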

🚀 Performance with the default NER model (ThinkPad X1 Nano):
Batch size 1: 6 it/sec
Batch size 10+: 12 it/sec

🌌 Other bulk-ner wrappers at nlp-thirdgate: https://github.com/nicolay-r/nlp-thirdgate
posted an update 2 days ago
📢 For those who would like to embed NER into an LLM pipeline: I just made an example of a pretrained multilingual BERT model via the DeepPavlov framework and bulk-ner:
📔: https://github.com/nicolay-r/nlp-thirdgate/blob/master/tutorials/ner_deeppavlov_130.ipynb

Note: Python 3.9-3.10 is expected; Accelerate on Python 3.11 may require further tweaks to launch. I might wrap other frameworks later on here ↗️: https://github.com/nicolay-r/nlp-thirdgate
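
As a rough sketch of the underlying DeepPavlov calls, assuming the `ner_ontonotes_bert_mult` config (the exact config used in the notebook may differ):

```python
from deeppavlov import build_model

# Build the multilingual BERT NER pipeline; download=True fetches
# the weights on the first run.
ner = build_model("ner_ontonotes_bert_mult", download=True)

tokens_batch, tags_batch = ner(["Bella Ciao is an Italian folk song."])
for token, tag in zip(tokens_batch[0], tags_batch[0]):
    print(token, tag)
```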

The new bulk-ner 0.25.1 release includes the following updates:
✅ Removed sentence index from output (#21)
✅ API + support function for custom entity construction
✅ Hub for providers

🌟 bulk-ner: https://github.com/nicolay-r/bulk-ner
reacted to IliaLarchenko's post with 🔥 3 days ago
I am presenting Decoder-Only Transformer (DOT) Policy, a simple Behavioral Control policy that outperforms SOTA models on two simple benchmark tasks:

✅ PushT (pushing an object to a goal) – 84% success on keypoints, 74% on images (previous best: 75% / 69%)
✅ ALOHA Insert (precise bimanual insertion) – 30% success (previous best: ~21%)

The best part? DOT is much smaller (sometimes 100 times fewer parameters) than previous SOTA models, trains faster, and avoids complexity:
🚫 No generative models (Diffusion, VAE, GANs)
🚫 No discretization/tokenization of actions
🚫 No reinforcement learning or multi-stage training
✅ Just learns from human demos, plain and simple

This is still early - more complex real-life tasks need testing, and there are no guarantees it will actually work well there, but I think it's interesting to share. Sometimes, simpler approaches can be just as effective as (or even better than) complex ones.

🔗 Open-source code and detailed description: https://github.com/IliaLarchenko/dot_policy

Trained models on Hugging Face:
IliaLarchenko/dot_pusht_keypoints
IliaLarchenko/dot_pusht_images
IliaLarchenko/dot_bimanual_insert
reacted to fdaudens's post with 🤗 3 days ago
reacted to retronic's post with 🔥 3 days ago
Colox, a reasoning AI model. I am currently working on a model smarter than GPT o1 that thinks before it speaks. It is coming tomorrow in the afternoon.
posted an update 3 days ago
🚨 Key takeaway for quickly mastering Sentiment Analysis nowadays. Through the questionnaire 📝 of the past RuOpinionNE-2024 competition, we got insights into participants' model preference choices. Our main conclusion:

✨ The top-performing submissions exploit few-shot learning with LLMs.

Takeaway note compared with the prior RuSentNE-2023 competition:
🧠 Step-by-step reasoning requires more tweaking. Most recent solutions empowered with Chain-of-Thought tend to overthink. Earlier we saw improvements for Flan-T5 (2.8B) in fine-tuned mode, but not among the zero-shot approaches.
nicolay-r/flan-t5-tsa-thor-xl

Related materials:
https://github.com/dialogue-evaluation/RuOpinionNE-2024
RuSentNE-2023: Evaluating Entity-Oriented Sentiment Analysis on Russian News Texts (2305.17679)
Large Language Models in Targeted Sentiment Analysis (2404.12342)
reacted to ggbetz's post with 👀 5 days ago
We've just released syncIALO -- a multi-purpose synthetic debate and argument mapping corpus with more than 600k arguments:

πŸ“ Blog article: https://huggingface.co/blog/ggbetz/introducing-syncialo
πŸ›’οΈ Dataset: DebateLabKIT/syncialo-raw
πŸ‘©β€πŸ’» Code: https://github.com/debatelab/syncIALO

🤗 Hugging Face has sponsored the syncIALO project through inference time / compute credits. 🙏 We gratefully acknowledge the generous support. 🫢
replied to their post 5 days ago

@claudiohgdotta, thanks - edited!
That would be too much to expect from Qwen-2.5-MAX,
especially considering how fast the Qwen demo inference is.

posted an update 6 days ago
📢 Qwen has so far released 2.5-MAX, which claims to outperform DeepSeek-V3 [Edited: not R1].
Here is how you can start applying it to handle CSV / JSONL data.
The model is compatible with the OpenAI API, so here is my wrapper for it:
🌌 https://github.com/nicolay-r/nlp-thirdgate/blob/master/llm/openai_156.py

🚀 All you have to do is set
base-url: https://dashscope-intl.aliyuncs.com/compatible-mode/v1
and the API key of the platform.
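
For instance, a minimal sketch with the official openai Python client; the model identifier `qwen-max` is an assumption, so check the platform docs for the exact name:

```python
from openai import OpenAI

# Qwen 2.5-MAX sits behind an OpenAI-compatible endpoint, so the stock
# client only needs a swapped base_url plus the platform API key.
client = OpenAI(
    api_key="YOUR_PLATFORM_API_KEY",  # placeholder
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

response = client.chat.completions.create(
    model="qwen-max",  # assumed model identifier
    messages=[{"role": "user", "content": "Extract entities from: ..."}],
)
print(response.choices[0].message.content)
```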

↗️ Below is the link to the complete example (see screenshot):
https://github.com/nicolay-r/nlp-thirdgate/blob/master/tutorials/llm_qwen_25_max_chat.sh

📰 Source: https://www.alibabacloud.com/help/en/model-studio/developer-reference/what-is-qwen-llm
📺 Official Sandbox Demo: Qwen/Qwen2.5-Max-Demo
📜 Paper: https://arxiv.org/abs/2412.15115
reacted to singhsidhukuldeep's post with 🚀 6 days ago
Exciting Research Alert: Revolutionizing Complex Information Retrieval!

A groundbreaking paper from researchers at MIT, AWS AI, and UPenn introduces ARM (Alignment-Oriented LLM-based Retrieval Method), a novel approach to tackle complex information retrieval challenges.

>> Key Innovations

Information Alignment
The method first decomposes queries into keywords and aligns them with available data using both BM25 and embedding similarity, ensuring comprehensive coverage of information needs.
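
As a rough illustration of this hybrid alignment idea (not the paper's actual ARM implementation; rank_bm25, sentence-transformers, and the mixing weight alpha are stand-in assumptions):

```python
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer

corpus = [
    "Table of quarterly revenue per region.",
    "Passage describing the company's founding history.",
    "Spreadsheet of employee headcount by office.",
]

# Sparse side: BM25 over whitespace-tokenized documents.
bm25 = BM25Okapi([doc.lower().split() for doc in corpus])

# Dense side: normalized embeddings, so a dot product is cosine similarity.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
doc_emb = encoder.encode(corpus, normalize_embeddings=True)

def hybrid_scores(query: str, alpha: float = 0.5) -> np.ndarray:
    """Blend BM25 and embedding similarity for each document."""
    sparse = bm25.get_scores(query.lower().split())
    sparse = sparse / (sparse.max() + 1e-9)  # rescale to [0, 1]
    dense = doc_emb @ encoder.encode([query], normalize_embeddings=True)[0]
    return alpha * sparse + (1 - alpha) * dense

print(np.argsort(hybrid_scores("revenue by region"))[::-1])  # best first
```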

Structure Alignment
ARM employs a sophisticated mixed-integer programming solver to identify connections between data objects, exploring relationships beyond simple semantic matching.

Self-Verification
The system includes a unique self-verification mechanism where the LLM evaluates and aggregates results from multiple retrieval paths, ensuring accuracy and completeness.

>> Performance Highlights

The results are impressive:
- Outperforms standard RAG by up to 5.2 points in execution accuracy on the Bird dataset
- Achieves 19.3 points higher F1 scores compared to existing approaches on OTT-QA
- Reduces the number of required LLM calls while maintaining superior retrieval quality

>> Technical Implementation

The system uses a three-step process:
1. N-gram indexing and embedding computation for all data objects
2. Constrained beam decoding for information alignment
3. Mixed-integer programming optimization for structure exploration

This research represents a significant step forward in making complex information retrieval more efficient and accurate. The team's work demonstrates how combining traditional optimization techniques with modern LLM capabilities can solve challenging retrieval problems.
reacted to JingzeShi's post with 🤗 6 days ago
Welcome to the Doge Face Open Source Community! 🚀
Our goal over the next two years is to explore the foundation of embodied intelligence that is indispensable: small language models. 🔬
We aim to open-source code and documentation to give everyone more time to slack off while working or studying! 🤗
👉 Repository name on GitHub: https://github.com/SmallDoges/small-doge
👉 Organization name on Hugging Face: https://huggingface.co/SmallDoge
reacted to csabakecskemeti's post with 👀 6 days ago
Check out my idea:
LLmaaS - Local LLM as a Service

With LLmaaS, I propose leveraging locally running LLMs as a service, providing a standardized way for websites to access and utilize them for LLM-powered operations directly on the user’s device.
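
To make the idea concrete, here is a minimal sketch (not the actual LLmaaS code): a local HTTP proxy a web page could call, forwarding chat requests to a locally running OpenAI-compatible LLM server. The endpoint URL and port are assumptions:

```python
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumed local OpenAI-compatible server (e.g. an Ollama-style endpoint).
LOCAL_LLM = "http://localhost:11434/v1/chat/completions"

class ProxyHandler(BaseHTTPRequestHandler):
    def _cors(self):
        # Wide-open CORS for illustration only; a real deployment needs
        # the security measures the post calls for.
        self.send_header("Access-Control-Allow-Origin", "*")
        self.send_header("Access-Control-Allow-Headers", "Content-Type")

    def do_OPTIONS(self):  # CORS preflight sent by the browser
        self.send_response(204)
        self._cors()
        self.end_headers()

    def do_POST(self):  # forward the website's request to the local LLM
        body = self.rfile.read(int(self.headers["Content-Length"]))
        req = urllib.request.Request(
            LOCAL_LLM, data=body, headers={"Content-Type": "application/json"}
        )
        with urllib.request.urlopen(req) as resp:
            answer = resp.read()
        self.send_response(200)
        self._cors()
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(answer)

HTTPServer(("localhost", 8123), ProxyHandler).serve_forever()
```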

Demo, code, and a more detailed description:
https://devquasar.com/llmaas/
https://github.com/csabakecskemeti/LLmaaS
https://youtu.be/OOWGr8jcP5Q

Call for contributors
Join me in developing the LLmaaS proxy to make it a general-purpose tool for leveraging local LLMs on the web, with built-in security measures.
I'm looking for help to make the proxy more generic and support multiple local LLM services without any change on the HTML side.
Also looking for ideas on how to make the HTML part more modular and easy to use.
reacted to fdaudens's post with 👀 6 days ago
📊 R1 just built its own download dashboard!

Some fresh stats: +6M downloads for 800+ derivative models vs 2M for originals. Watch the numbers grow here: fdaudens/deepseek-download-stats
reacted to Pendrokar's post with ❤️ 6 days ago
TTS: Added Kokoro v1, Parler Large, LlaSa 3B & MARS 6 TTS models to the Arena.
Pendrokar/TTS-Spaces-Arena

Also had added MaskGCT, GPT-SoVITS & OuteTTS a month ago. OuteTTS devs did say that it is too early for it to be added to TTS Arenas.

Mars 5 does have a space with open weights models, but inference is way too slow (2 minutes+).