
Nicolay Rusnachenko

nicolay-r

AI & ML interests

Information Retrieval・Medical Multimodal NLP (🖼+📝) Research Fellow @BU_Research・software developer http://arekit.io・PhD in NLP

Recent Activity

reacted to KnutJaegersberg's post with 👀 about 4 hours ago

Organizations

None yet

nicolay-r's activity

posted an update about 3 hours ago
📢 If you wish to empower an LLM with NER for texts in English, I can recommend spaCy. Sharing the wrapper of spaCy NER models for bulk-ner, dedicated to handling CSV / JSONL content:
Script: https://github.com/nicolay-r/nlp-thirdgate/blob/master/tutorials/ner_spacy_383.sh
Code: https://raw.githubusercontent.com/nicolay-r/nlp-thirdgate/refs/heads/master/ner/spacy_383.py

What you need to know about spaCy NER models (see the sketch below):
☑️ Models are distributed as Python packages; they can be installed directly into the environment or via the Python CLI.
☑️ The library has a pipeline for optimized request handling in batches.
☑️ Architecture: DNN embedding-based models (not transformers).

🤖 List of models (or see screenshot below):
https://huggingface.co/spacy
📋 Supported NER types:
https://github.com/explosion/spaCy/discussions/9147

⚠️ NOTE: chunking seems to be non-applicable due to the specifics of the models and the use of the internal pipeline mechanism.

🚀 Performance for sentences (en):
Model: spacy/en_core_web_sm 🔥 530 sentences per second 🔥 (similar to larger solutions)

🌌 Other bulk-ner wrappers at nlp-thirdgate: https://github.com/nicolay-r/nlp-thirdgate#ner
reacted to KnutJaegersberg's post with 👀 about 4 hours ago
A Brief Survey of Associations Between Meta-Learning and General AI

The paper titled "A Brief Survey of Associations Between Meta-Learning and General AI" explores how meta-learning techniques can contribute to the development of Artificial General Intelligence (AGI). Here are the key points summarized:

1. General AI (AGI) and Meta-Learning:
- AGI aims to develop algorithms that can handle a wide variety of tasks, similar to human intelligence. Current AI systems excel at specific tasks but struggle with generalization to unseen tasks.
- Meta-learning or "learning to learn" improves model adaptation and generalization, allowing AI systems to tackle new tasks efficiently using prior experiences.

2. Neural Network Design in Meta-Learning:
- Techniques like Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks enable self-improvement and adaptability for deep models, supporting generalization across tasks.
- Highway networks and ResNet-style models use shortcuts for efficient backpropagation, allowing deeper models that can be used in meta-learning frameworks.

3. Coevolution:
- Coevolution involves the mutual evolution of multiple components, such as learners or task-solvers, to improve overall performance.
- Coevolution between learners enhances collaboration and competition within AI systems, while coevolution between tasks and solvers (e.g., POWERPLAY and AI-GA frameworks) pushes solvers to adapt to increasingly complex tasks.

4. Curiosity in Meta-Learning:
- Curiosity-based exploration encourages AI systems to discover new, diverse features of the environment, avoiding local optima.
- Curiosity-based objectives can be combined with performance-based objectives to ensure efficient exploration and adaptation in complex tasks.

5. Forgetting Mechanisms:
- Forgetting is crucial to avoid memory overload in AI systems.

https://arxiv.org/abs/2101.04283
reacted to Duskfallcrew's post with 🔥 about 4 hours ago
Just started porting over the articles from Civitai that mattered most to me.
Look, I'm not going to sit here and whine, complain, and moan - they know why I've left, and they're going to thrive without me.
I'm a mere speck compared to their future, and that's amazing.
But the journey continues: I've posted my Design 101 for AI - the first one up -- I believe it's the first one, as it delves back into how Arts and Crafts connect to AI.
I'm still looking for a future model hub for the insane 800+ models I'd published - considering that's half of what I've got sitting in my repos on HF.
reacted to Kseniase's post with 🔥 about 4 hours ago
8 New Types of RAG

RAG techniques continuously evolve to enhance LLM response accuracy by retrieving relevant external data during generation. To keep up with current AI trends, new RAG types incorporate deep step-by-step reasoning, tree search, citations, multimodality and other effective techniques.

Here's a list of 8 latest RAG advancements:

1. DeepRAG -> DeepRAG: Thinking to Retrieval Step by Step for Large Language Models (2502.01142)
Models retrieval-augmented reasoning as a Markov Decision Process, enabling strategic retrieval. It dynamically decides when to retrieve external knowledge and when to rely on parametric reasoning.

2. RealRAG -> RealRAG: Retrieval-augmented Realistic Image Generation via Self-reflective Contrastive Learning (2502.00848)
Enhances novel object generation by retrieving real-world images and using self-reflective contrastive learning to fill knowledge gaps, improve realism, and reduce distortions.

3. Chain-of-Retrieval Augmented Generation (CoRAG) -> Chain-of-Retrieval Augmented Generation (2501.14342)
Retrieves information step by step and adjusts it, also deciding how much compute power to use at test time. If needed, it reformulates queries.

4. VideoRAG -> VideoRAG: Retrieval-Augmented Generation over Video Corpus (2501.05874)
Enables unlimited-length video processing, using a dual-channel architecture that integrates graph-based textual grounding and multi-modal context encoding.

5. CFT-RAG -> CFT-RAG: An Entity Tree Based Retrieval Augmented Generation Algorithm With Cuckoo Filter (2501.15098)
A tree-RAG acceleration method that uses an improved Cuckoo Filter to optimize entity localization, enabling faster retrieval.

6. Contextualized Graph RAG (CG-RAG) -> CG-RAG: Research Question Answering by Citation Graph Retrieval-Augmented LLMs (2501.15067)
Uses Lexical-Semantic Graph Retrieval (LeSeGR) to integrate sparse and dense signals within the graph structure and capture citation relationships.

7. GFM-RAG -> GFM-RAG: Graph Foundation Model for Retrieval Augmented Generation (2502.01113)
A graph foundation model that uses a graph neural network to refine query-knowledge connections.

8. URAG -> URAG: Implementing a Unified Hybrid RAG for Precise Answers in University Admission Chatbots -- A Case Study at HCMUT (2501.16276)
A hybrid system combining rule-based and RAG methods to improve lightweight LLMs for educational chatbots.
posted an update 1 day ago
📢 If you wish to empower an LLM with IR and a named entity recognition module, I've got relevant findings.
Just tested Flair; below is how you can start adapting it to process your CSV / JSONL data via bulk-ner.
👩‍💻 Code: https://github.com/nicolay-r/nlp-thirdgate/blob/master/tutorials/ner_flair_0151.sh
🤖 Models: https://huggingface.co/flair

Provider: https://raw.githubusercontent.com/nicolay-r/nlp-thirdgate/refs/heads/master/ner/flair_0151.py
Framework: https://github.com/nicolay-r/bulk-ner
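
For reference, a minimal sketch of the Flair calls such a provider presumably wraps; the model name `flair/ner-english` and the batching parameter are illustrative assumptions:

```python
from flair.data import Sentence
from flair.models import SequenceTagger

# Load a pretrained Flair NER tagger (downloads on first use).
tagger = SequenceTagger.load("flair/ner-english")

sentences = [
    Sentence("George Washington went to Washington."),
    Sentence("Flair is developed in Berlin."),
]

# mini_batch_size mirrors the batch-size effect in the numbers below.
tagger.predict(sentences, mini_batch_size=10)

for sentence in sentences:
    for entity in sentence.get_spans("ner"):
        print(entity.text, entity.tag, round(entity.score, 3))
```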

🚀 Performance with the default NER model (ThinkPad X1 Nano):
Batch size 1: 6 it/sec
Batch size 10+: 12 it/sec

🌌 Other bulk-ner wrappers at nlp-thirdgate: https://github.com/nicolay-r/nlp-thirdgate
posted an update 2 days ago
📢 For those who would like to embed NER into an LLM pipeline: I just made an example of a pretrained multilingual BERT model via the DeepPavlov framework and bulk-ner:
📔: https://github.com/nicolay-r/nlp-thirdgate/blob/master/tutorials/ner_deeppavlov_130.ipynb

Note: Python 3.9-3.10 is expected; Accelerate on Python 3.11 may require further tweaks to launch. I might wrap other frameworks later on here ↗️: https://github.com/nicolay-r/nlp-thirdgate
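
As a rough sketch of the underlying DeepPavlov calls, assuming the `ner_ontonotes_bert_mult` config (the exact config used in the notebook may differ):

```python
from deeppavlov import build_model

# Build the multilingual BERT NER pipeline; download=True fetches
# the weights on the first run.
ner = build_model("ner_ontonotes_bert_mult", download=True)

tokens_batch, tags_batch = ner(["Bella Ciao is an Italian folk song."])
for token, tag in zip(tokens_batch[0], tags_batch[0]):
    print(token, tag)
```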

The new bulk-ner 0.25.1 release includes the following updates:
✅ Removed sentence index from output (#21)
✅ API + support function for custom entity construction
✅ Hub for providers

🌟 bulk-ner: https://github.com/nicolay-r/bulk-ner
reacted to IliaLarchenko's post with 🔥 3 days ago
I am presenting Decoder-Only Transformer (DOT) Policy, a simple Behavioral Control policy that outperforms SOTA models on two simple benchmark tasks:

✅ PushT (pushing an object to a goal) – 84% success on keypoints, 74% on images (previous best: 75% / 69%)
✅ ALOHA Insert (precise bimanual insertion) – 30% success (previous best: ~21%)

The best part? DOT is much smaller (sometimes 100 times fewer parameters) than previous SOTA models, trains faster, and avoids complexity:
🚫 No generative models (Diffusion, VAE, GANs)
🚫 No discretization/tokenization of actions
🚫 No reinforcement learning or multi-stage training
✅ Just learns from human demos, plain and simple

This is still early - more complex real-life tasks need testing, and there are no guarantees it will actually work well there, but I think it's interesting to share. Sometimes, simpler approaches can be just as effective as (or even better than) complex ones.

🔗 Open-source code and detailed description: https://github.com/IliaLarchenko/dot_policy

Trained models on Hugging Face:
IliaLarchenko/dot_pusht_keypoints
IliaLarchenko/dot_pusht_images
IliaLarchenko/dot_bimanual_insert
reacted to fdaudens's post with 🤗 3 days ago
reacted to retronic's post with 🔥 3 days ago
Colox, a reasoning AI model. I am currently working on a model smarter than GPT o1 that thinks before it speaks. It is coming tomorrow in the afternoon.
posted an update 3 days ago
🚨 Key takeaway for quickly mastering Sentiment Analysis nowadays. Through the questionnaire 📝 of the past RuOpinionNE-2024 competition, we got insights into participants' model preference choices. Our main conclusion:

✨ The top-performing submissions exploit few-shot learning with LLMs.

Takeaway note compared with the prior RuSentNE-2023 competition:
🧠 Step-by-step reasoning requires more tweaking. Most recent solutions empowered with Chain-of-Thought tend to overthink. Earlier we saw improvements for Flan-T5 (2.8B) in fine-tuned mode, but not among the zero-shot approaches.
nicolay-r/flan-t5-tsa-thor-xl

Related materials:
https://github.com/dialogue-evaluation/RuOpinionNE-2024
RuSentNE-2023: Evaluating Entity-Oriented Sentiment Analysis on Russian News Texts (2305.17679)
Large Language Models in Targeted Sentiment Analysis (2404.12342)
reacted to ggbetz's post with 👀 5 days ago
We've just released syncIALO -- a multi-purpose synthetic debate and argument mapping corpus with more than 600k arguments:

πŸ“ Blog article: https://huggingface.co/blog/ggbetz/introducing-syncialo
πŸ›’οΈ Dataset: DebateLabKIT/syncialo-raw
πŸ‘©β€πŸ’» Code: https://github.com/debatelab/syncIALO

🤗 Hugging Face has sponsored the syncIALO project through inference time / compute credits. 🙏 We gratefully acknowledge the generous support. 🫢
replied to their post 5 days ago

@claudiohgdotta, thanks - edited!
That would be too much to expect from Qwen-2.5-MAX,
especially considering how fast the Qwen demo inference is.

posted an update 6 days ago
📢 Qwen has so far released 2.5-MAX, which claims to outperform DeepSeek-V3 [Edited: not R1].
Here is how you can start applying it to handle CSV / JSONL data.
The model is compatible with the OpenAI API, so here is my wrapper for it:
🌌 https://github.com/nicolay-r/nlp-thirdgate/blob/master/llm/openai_156.py

🚀 All you have to do is set
base-url: https://dashscope-intl.aliyuncs.com/compatible-mode/v1
and the API key of the platform.
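
For instance, a minimal sketch with the official openai Python client; the model identifier `qwen-max` is an assumption, so check the platform docs for the exact name:

```python
from openai import OpenAI

# Qwen 2.5-MAX sits behind an OpenAI-compatible endpoint, so the stock
# client only needs a swapped base_url plus the platform API key.
client = OpenAI(
    api_key="YOUR_PLATFORM_API_KEY",  # placeholder
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

response = client.chat.completions.create(
    model="qwen-max",  # assumed model identifier
    messages=[{"role": "user", "content": "Extract entities from: ..."}],
)
print(response.choices[0].message.content)
```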

↗️ Below is the link to the complete example (see screenshot):
https://github.com/nicolay-r/nlp-thirdgate/blob/master/tutorials/llm_qwen_25_max_chat.sh

📰 Source: https://www.alibabacloud.com/help/en/model-studio/developer-reference/what-is-qwen-llm
📺 Official Sandbox Demo: Qwen/Qwen2.5-Max-Demo
📜 Paper: https://arxiv.org/abs/2412.15115
reacted to singhsidhukuldeep's post with 🚀 6 days ago
Exciting Research Alert: Revolutionizing Complex Information Retrieval!

A groundbreaking paper from researchers at MIT, AWS AI, and UPenn introduces ARM (Alignment-Oriented LLM-based Retrieval Method), a novel approach to tackle complex information retrieval challenges.

>> Key Innovations

Information Alignment
The method first decomposes queries into keywords and aligns them with available data using both BM25 and embedding similarity, ensuring comprehensive coverage of information needs.
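
As a rough illustration of this hybrid alignment idea (not the paper's actual ARM implementation; rank_bm25, sentence-transformers, and the mixing weight alpha are stand-in assumptions):

```python
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer

corpus = [
    "Table of quarterly revenue per region.",
    "Passage describing the company's founding history.",
    "Spreadsheet of employee headcount by office.",
]

# Sparse side: BM25 over whitespace-tokenized documents.
bm25 = BM25Okapi([doc.lower().split() for doc in corpus])

# Dense side: normalized embeddings, so a dot product is cosine similarity.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
doc_emb = encoder.encode(corpus, normalize_embeddings=True)

def hybrid_scores(query: str, alpha: float = 0.5) -> np.ndarray:
    """Blend BM25 and embedding similarity for each document."""
    sparse = bm25.get_scores(query.lower().split())
    sparse = sparse / (sparse.max() + 1e-9)  # rescale to [0, 1]
    dense = doc_emb @ encoder.encode([query], normalize_embeddings=True)[0]
    return alpha * sparse + (1 - alpha) * dense

print(np.argsort(hybrid_scores("revenue by region"))[::-1])  # best first
```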

Structure Alignment
ARM employs a sophisticated mixed-integer programming solver to identify connections between data objects, exploring relationships beyond simple semantic matching.

Self-Verification
The system includes a unique self-verification mechanism where the LLM evaluates and aggregates results from multiple retrieval paths, ensuring accuracy and completeness.

>> Performance Highlights

The results are impressive:
- Outperforms standard RAG by up to 5.2 points in execution accuracy on the Bird dataset
- Achieves 19.3 points higher F1 scores compared to existing approaches on OTT-QA
- Reduces the number of required LLM calls while maintaining superior retrieval quality

>> Technical Implementation

The system uses a three-step process:
1. N-gram indexing and embedding computation for all data objects
2. Constrained beam decoding for information alignment
3. Mixed-integer programming optimization for structure exploration

This research represents a significant step forward in making complex information retrieval more efficient and accurate. The team's work demonstrates how combining traditional optimization techniques with modern LLM capabilities can solve challenging retrieval problems.
reacted to JingzeShi's post with 🤗 6 days ago
Welcome to the Doge Face Open Source Community! 🚀
Our goal over the next two years is to explore the foundation of embodied intelligence that is indispensable: small language models. 🔬
We aim to open-source code and documentation to give everyone more time to slack off while working or studying! 🤗
👉 Repository name on GitHub: https://github.com/SmallDoges/small-doge
👉 Organization name on Hugging Face: https://huggingface.co/SmallDoge
reacted to csabakecskemeti's post with 👀 6 days ago
Check out my idea:
LLmaaS - Local LLM as a Service

With LLmaaS, I propose leveraging locally running LLMs as a service, providing a standardized way for websites to access and utilize them for LLM-powered operations directly on the user’s device.
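
To make the idea concrete, here is a minimal sketch (not the actual LLmaaS code): a local HTTP proxy a web page could call, forwarding chat requests to a locally running OpenAI-compatible LLM server. The endpoint URL and port are assumptions:

```python
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumed local OpenAI-compatible server (e.g. an Ollama-style endpoint).
LOCAL_LLM = "http://localhost:11434/v1/chat/completions"

class ProxyHandler(BaseHTTPRequestHandler):
    def _cors(self):
        # Wide-open CORS for illustration only; a real deployment needs
        # the security measures the post calls for.
        self.send_header("Access-Control-Allow-Origin", "*")
        self.send_header("Access-Control-Allow-Headers", "Content-Type")

    def do_OPTIONS(self):  # CORS preflight sent by the browser
        self.send_response(204)
        self._cors()
        self.end_headers()

    def do_POST(self):  # forward the website's request to the local LLM
        body = self.rfile.read(int(self.headers["Content-Length"]))
        req = urllib.request.Request(
            LOCAL_LLM, data=body, headers={"Content-Type": "application/json"}
        )
        with urllib.request.urlopen(req) as resp:
            answer = resp.read()
        self.send_response(200)
        self._cors()
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(answer)

HTTPServer(("localhost", 8123), ProxyHandler).serve_forever()
```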

Demo, code, and a more detailed description:
https://devquasar.com/llmaas/
https://github.com/csabakecskemeti/LLmaaS
https://youtu.be/OOWGr8jcP5Q

Call for contributors
Join me in developing the LLmaaS proxy to make it a general-purpose tool for leveraging local LLMs on the web, with built-in security measures.
I'm looking for help to make the proxy more generic and support multiple local LLM services without any change on the HTML side.
Also looking for ideas on how to make the HTML part more modular and easy to use.
reacted to fdaudens's post with 👀 6 days ago
📊 R1 just built its own download dashboard!

Some fresh stats: +6M downloads for 800+ derivative models vs 2M for originals. Watch the numbers grow here: fdaudens/deepseek-download-stats
reacted to Pendrokar's post with ❤️ 6 days ago
TTS: Added Kokoro v1, Parler Large, LlaSa 3B & MARS 6 TTS models to the Arena.
Pendrokar/TTS-Spaces-Arena

Also had added MaskGCT, GPT-SoVITS & OuteTTS a month ago. OuteTTS devs did say that it is too early for it to be added to TTS Arenas.

Mars 5 does have a space with open weights models, but inference is way too slow (2 minutes+).