C4AI-Community (C4AI Community)

Tonic

posted an update 1 day ago

Powered by KRLabsOrg/lettucedect-large-modernbert-en-v1 from KRLabsOrg.

Detect hallucinations in answers based on context and questions using ModernBERT with 8192-token context support!

### Model Details
- **Model Name**: [lettucedect-large-modernbert-en-v1](https://huggingface.co/KRLabsOrg/lettucedect-large-modernbert-en-v1)
- **Organization**: [KRLabsOrg](https://huggingface.co/KRLabsOrg)
- **GitHub**: [https://github.com/KRLabsOrg/LettuceDetect](https://github.com/KRLabsOrg/LettuceDetect)
- **Architecture**: ModernBERT (Large) with extended context support up to 8192 tokens
- **Task**: Token Classification / Hallucination Detection
- **Training Dataset**: [RAGTruth](https://huggingface.co/datasets/wandb/RAGTruth-processed)
- **Language**: English
- **Capabilities**: Detects hallucinated spans in answers, provides confidence scores, and calculates average confidence across detected spans.

LettuceDetect can process long documents to determine whether an answer is supported by the provided context, making it a powerful tool for checking factual consistency.
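The span-level output listed under Capabilities can be illustrated with a small sketch. It assumes the model emits one hallucination probability per answer token and that consecutive tokens at or above a threshold form one span; `merge_spans`, `average_confidence`, and the 0.5 threshold are hypothetical names and values for illustration, not the LettuceDetect API.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Span:
    start: int         # index of the first flagged token
    end: int           # exclusive end index
    confidence: float  # mean hallucination probability over the span


def merge_spans(token_probs: Sequence[float], threshold: float = 0.5) -> List[Span]:
    """Group consecutive tokens whose probability meets the threshold into spans."""
    spans: List[Span] = []
    start = None
    # Append a sentinel below any threshold so a span running to the end is closed.
    for i, p in enumerate(list(token_probs) + [float("-inf")]):
        if p >= threshold and start is None:
            start = i
        elif p < threshold and start is not None:
            window = token_probs[start:i]
            spans.append(Span(start, i, sum(window) / len(window)))
            start = None
    return spans


def average_confidence(spans: Sequence[Span]) -> float:
    """Average the per-span confidences; 0.0 when nothing was flagged."""
    if not spans:
        return 0.0
    return sum(s.confidence for s in spans) / len(spans)


probs = [0.1, 0.9, 0.8, 0.2, 0.7]  # toy per-token probabilities
spans = merge_spans(probs)
print([(s.start, s.end, round(s.confidence, 3)) for s in spans])
print(round(average_confidence(spans), 3))
```

Averaging per span (rather than per token) keeps one long flagged span from dominating the overall score; a per-token average would be an equally reasonable choice.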

singhsidhukuldeep

posted an update 4 days ago

Exciting New Tool for Knowledge Graph Extraction from Plain Text!

I just came across a groundbreaking new tool called KGGen that tackles a major challenge in AI: the scarcity of high-quality knowledge graph data.

KGGen is an open-source Python package that leverages language models to extract knowledge graphs (KGs) from plain text. What makes it special is its innovative approach to clustering related entities, which significantly reduces sparsity in the extracted KGs.

The technical approach is fascinating:

1. KGGen uses a multi-stage process involving an LLM (GPT-4o in their implementation) to extract entities and relations from source text
2. It aggregates graphs across sources to reduce redundancy
3. Most importantly, it applies iterative LM-based clustering to refine the raw graph

The clustering stage is particularly innovative - it identifies which nodes and edges refer to the same underlying entities or concepts. This normalizes variations in tense, plurality, stemming, and capitalization (e.g., "labors" clustered with "labor").
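As a rough illustration of the aggregation and clustering idea, the sketch below merges triples from several sources and groups surface forms that differ only in capitalization or a plural "s". This is a deliberately crude, rule-based stand-in: KGGen itself uses iterative LLM-based clustering, and the helper names here are hypothetical, not the `kg-gen` API.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


def normalize(term: str) -> str:
    """Crude rule-based stand-in for LLM clustering: lowercase and drop a plural 's'."""
    t = term.strip().lower()
    if len(t) > 3 and t.endswith("s") and not t.endswith("ss"):
        t = t[:-1]
    return t


def aggregate_and_cluster(graphs: List[List[Triple]]) -> Tuple[Set[Triple], Dict[str, Set[str]]]:
    """Merge per-source graphs, map every term to its cluster key, and dedupe edges."""
    clusters: Dict[str, Set[str]] = defaultdict(set)
    merged: Set[Triple] = set()
    for graph in graphs:
        for subj, rel, obj in graph:
            for term in (subj, rel, obj):
                clusters[normalize(term)].add(term)  # remember every surface form
            merged.add((normalize(subj), normalize(rel), normalize(obj)))
    return merged, dict(clusters)


source_a = [("Labors", "support", "Unions")]
source_b = [("labor", "supports", "union")]
edges, clusters = aggregate_and_cluster([source_a, source_b])
print(edges)
print(clusters["labor"])
```

The two sources collapse into a single edge, which is the sparsity reduction the post describes; an LLM-based clusterer can also catch synonyms that no string rule would.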

The researchers from Stanford and the University of Toronto also introduced MINE (Measure of Information in Nodes and Edges), the first benchmark for evaluating KG extractors. When tested against existing methods such as OpenIE and GraphRAG, KGGen outperformed them by up to 18%.

For anyone working with knowledge graphs, RAG systems, or KG embeddings, this tool addresses the fundamental challenge of data scarcity that's been holding back progress in graph-based foundation models.

The package is available via `pip install kg-gen`, making it accessible to everyone. This could be a game-changer for knowledge graph applications!

singhsidhukuldeep

posted an update 6 days ago

Exciting Research Alert: Enhancing Dense Retrieval with Deliberate Thinking

I just came across a fascinating new paper titled "Learning More Effective Representations for Dense Retrieval through Deliberate Thinking Before Search" that introduces DEBATER (Deliberate Thinking based Dense Retriever), a novel approach to improve information retrieval using large language models.

The research team from Northeastern University and Tsinghua University has developed a method that significantly outperforms existing dense retrieval systems by enabling LLMs to "think deliberately" before generating document representations.

>> Technical Details

DEBATER enhances LLM-based retrievers through two key mechanisms:

1. Chain-of-Deliberation (CoD): This approach delays the computation of document embeddings by performing several steps of reasoning. It incorporates a sequence of prompt tokens that stimulate the reasoning capability of LLMs, encouraging the model to think step-by-step before producing the final document embedding.

2. Self Distillation (SD): This mechanism distills knowledge from different thinking steps into the final document representation. It identifies the most informative thinking steps and integrates them into a unified text embedding.

The implementation uses cosine similarity to measure the similarity between queries and documents. During training, DEBATER calculates similarity scores between query representation and document representations at each thinking step, then selects the most useful thinking step from CoD.
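A minimal sketch of the step-selection idea follows. It assumes each Chain-of-Deliberation step yields one document embedding and that the "most useful" step is the one whose embedding is most cosine-similar to the query. The vectors and function names are hypothetical, and the paper's actual training objective (including how Self Distillation transfers this signal into the final embedding) is more involved.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def select_thinking_step(query: Sequence[float],
                         step_embeddings: List[Sequence[float]]) -> Tuple[int, List[float]]:
    """Score the document embedding produced at each thinking step against the query
    and return the index of the best-scoring step together with all scores."""
    scores = [cosine(query, emb) for emb in step_embeddings]
    best = max(range(len(scores)), key=lambda i: scores[i])
    return best, scores


query = [1.0, 0.0]
steps = [[0.0, 1.0], [1.0, 1.0], [1.0, 0.1]]  # toy embeddings after steps 1..3
best, scores = select_thinking_step(query, steps)
print(best, [round(s, 3) for s in scores])
```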

>> Performance

What's particularly impressive is that DEBATER-4B outperforms larger 7B-scale LLM-based dense retrievers while using significantly fewer parameters. In experiments on the BEIR benchmark, DEBATER achieved more than a 2% improvement over baseline retrievers.

The researchers found that an appropriate thinking depth (around 4-8 steps) effectively activates the reasoning capabilities of LLM-based retrievers.

jjzha

authored a paper 6 days ago

HiFi-KPI: A Dataset for Hierarchical KPI Extraction from Earnings Filings

Paper • 2502.15411 • Published 13 days ago • 2

singhsidhukuldeep

posted an update 8 days ago

O1 Embedder: Transforming Retrieval Models with Reasoning Capabilities

Researchers from the University of Science and Technology of China and the Beijing Academy of Artificial Intelligence have developed a novel retrieval model that mimics the slow-thinking capabilities of reasoning-focused LLMs such as OpenAI's o1 and DeepSeek's R1.

Unlike traditional embedding models that directly match queries with documents, O1 Embedder first generates thoughtful reflections about the query before performing retrieval. This two-step process significantly improves performance on complex retrieval tasks, especially those requiring intensive reasoning or zero-shot generalization to new domains.

The technical implementation is fascinating:

- The model integrates two essential functions: Thinking and Embedding
- It uses an "Exploration-Refinement" data synthesis workflow where initial thoughts are generated by an LLM and refined by a retrieval committee
- A multi-task training method fine-tunes a pre-trained LLM to generate retrieval thoughts via behavior cloning while simultaneously learning embedding capabilities through contrastive learning
- Memory-efficient joint training enables both tasks to share encoding results, dramatically increasing batch size
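The multi-task objective can be sketched as a language-modeling loss on the generated thought plus an InfoNCE-style contrastive loss with in-batch negatives. The in-batch form also shows why a larger batch helps: every other document in the batch acts as a negative. The temperature, the weighting factor, and the function names are illustrative assumptions, not values from the paper.

```python
import math
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def info_nce(queries: List[Sequence[float]], docs: List[Sequence[float]],
             temperature: float = 0.05) -> float:
    """Contrastive loss where docs[i] is the positive for queries[i] and every
    other document in the batch serves as a negative."""
    total = 0.0
    for i, q in enumerate(queries):
        logits = [dot(q, d) / temperature for d in docs]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_z - logits[i]  # negative log-softmax of the positive
    return total / len(queries)


def joint_loss(thought_nll: float, contrastive: float, weight: float = 1.0) -> float:
    """Behavior-cloning loss on the thought plus a weighted contrastive term."""
    return thought_nll + weight * contrastive


q = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce(q, [[1.0, 0.0], [0.0, 1.0]])  # each query matches its own doc
swapped = info_nce(q, [[0.0, 1.0], [1.0, 0.0]])  # positives point the wrong way
print(aligned, swapped, joint_loss(2.3, aligned))
```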

The results are impressive - O1 Embedder outperforms existing methods across 12 datasets in both in-domain and out-of-domain scenarios. For example, it achieves a 3.9% improvement on Natural Questions and a 3.0% boost on HotPotQA compared to models without thinking capabilities.

This approach represents a significant paradigm shift in retrieval technology, bridging the gap between traditional dense retrieval and the reasoning capabilities of large language models.

What do you think about this approach? Could "thinking before retrieval" transform how we build search systems?

singhsidhukuldeep

posted an update 9 days ago

I just came across a groundbreaking paper titled "Hypencoder: Hypernetworks for Information Retrieval" by researchers from the University of Massachusetts Amherst that introduces a fundamentally new paradigm for search technology.

Most current retrieval models rely on simple inner product calculations between query and document vectors, which severely limits their expressiveness. The authors prove theoretically that inner product similarity functions fundamentally constrain what types of relevance relationships can be captured.

Hypencoder takes a radically different approach: instead of encoding a query as a vector, it generates a small neural network (called a "q-net") that acts as a learned relevance function. This neural network takes document representations as input and produces relevance scores.
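To make the q-net idea concrete, here is a toy one-hidden-layer q-net in plain Python. In Hypencoder the weights would come from the hyperhead layers applied to the query; here they are passed in directly, and the helper names are hypothetical. The second function illustrates the flexibility claim made later in the post: a ReLU q-net with weight rows `q` and `-q` reproduces the ordinary inner product exactly, because relu(x) - relu(-x) = x.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def build_qnet(w1: List[Vector], b1: Vector, w2: Vector) -> Callable[[Vector], float]:
    """Return a relevance function: score(d) = w2 . relu(W1 d + b1)."""
    def qnet(doc: Vector) -> float:
        hidden = [max(0.0, sum(w * x for w, x in zip(row, doc)) + b)
                  for row, b in zip(w1, b1)]
        return sum(w * h for w, h in zip(w2, hidden))
    return qnet


def inner_product_qnet(query: Vector) -> Callable[[Vector], float]:
    """A q-net whose output equals query . doc, since relu(x) - relu(-x) == x."""
    negated = [-x for x in query]
    return build_qnet([list(query), negated], [0.0, 0.0], [1.0, -1.0])


score = inner_product_qnet([1.0, 2.0])
print(score([3.0, 4.0]), score([-3.0, -4.0]))
```

With a nonzero bias or more hidden units the same function can express thresholds and interactions that a single dot product cannot.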

Under the hood, Hypencoder uses:
- Attention-based hypernetwork layers (hyperhead layers) that transform contextualized query embeddings into weights and biases for the q-net
- A document encoder that produces vector representations similar to existing models
- A graph-based greedy search algorithm for efficient retrieval that can search 8.8M documents in under 60ms
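The graph-based search can be pictured as a greedy walk over a neighbor graph of documents that repeatedly moves to whichever neighbor the q-net scores highest. The sketch below is a simplified single-path version under that assumption; the graph, entry point, and stopping rule are hypothetical, and the paper's algorithm (which may track several candidates at once) will differ in its details.

```python
from typing import Callable, Dict, List, Tuple


def greedy_graph_search(graph: Dict[int, List[int]], score: Callable[[int], float],
                        entry: int, max_steps: int = 100) -> Tuple[int, float]:
    """Hill-climb from `entry`, moving to the best-scoring neighbor until none improves."""
    current, current_score = entry, score(entry)
    for _ in range(max_steps):
        best, best_score = current, current_score
        for neighbor in graph.get(current, []):
            s = score(neighbor)
            if s > best_score:
                best, best_score = neighbor, s
        if best == current:  # local optimum: no neighbor scores higher
            break
        current, current_score = best, best_score
    return current, current_score


# Toy chain of documents 0-1-2-3 whose relevance peaks at document 3.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(greedy_graph_search(chain, lambda doc_id: -abs(doc_id - 3), entry=0))
```

Because the walk only needs a scoring callable, it works with any relevance function, including a q-net, which is what makes graph search a natural fit when inner-product indexes no longer apply.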

The results are impressive - Hypencoder significantly outperforms strong dense retrieval models on standard benchmarks like MS MARCO and TREC Deep Learning Track. The performance gap widens even further on complex retrieval tasks like tip-of-the-tongue queries and instruction-following retrieval.

What makes this approach particularly powerful is that neural networks are universal approximators, allowing Hypencoder to express far more complex relevance relationships than inner product similarity functions. The framework is also flexible enough to replicate any existing neural retrieval method while adding the ability to learn query-dependent weights.

nathanaelc

published a dataset 10 days ago

C4AI-Community/memorycode

Updated 10 days ago • 21 • 2

nathanaelc

updated a dataset 10 days ago

C4AI-Community/memorycode

Updated 10 days ago • 21 • 2

ehristoforu

posted an update 10 days ago

Introducing our first standalone model – FluentlyLM Prinum

Introducing the first standalone model from Project Fluently LM! We worked on it for several months, used different approaches and eventually found the optimal one.

General characteristics:
- Model type: Causal language model (QwenForCausalLM, LM Transformer)
- Number of parameters: 32.5B
- Number of parameters (non-embedding): 31.0B
- Number of layers: 64
- Context: 131,072 tokens
- Language(s) (NLP): English, French, Spanish, Russian, Chinese, Japanese, Persian (officially supported)
- License: MIT

Creation strategy:
The basis of the strategy is shown in Pic. 2.
We used Axolotl & Unsloth for SFT fine-tuning with PEFT LoRA (rank=64, alpha=64) and Mergekit for SLERP and TIES merges.
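For readers unfamiliar with SLERP merging, the sketch below interpolates two flattened weight vectors along the arc between their directions rather than along a straight line. It is a minimal sketch in the spirit of Mergekit's SLERP method, not its implementation; the function name, the epsilon fallback to linear interpolation, and the toy vectors are assumptions.

```python
import math
from typing import List, Sequence


def slerp(t: float, v0: Sequence[float], v1: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Spherical linear interpolation between two weight vectors (t=0 -> v0, t=1 -> v1)."""
    n0 = math.sqrt(sum(x * x for x in v0))
    n1 = math.sqrt(sum(x * x for x in v1))
    # The angle is measured between the normalized directions.
    cos_omega = sum(a * b for a, b in zip(v0, v1)) / (n0 * n1)
    cos_omega = max(-1.0, min(1.0, cos_omega))
    omega = math.acos(cos_omega)
    if math.sin(omega) < eps:
        # Nearly parallel vectors: fall back to plain linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    s0 = math.sin((1 - t) * omega) / math.sin(omega)
    s1 = math.sin(t * omega) / math.sin(omega)
    return [s0 * a + s1 * b for a, b in zip(v0, v1)]


print(slerp(0.5, [1.0, 0.0], [0.0, 1.0]))
```

At t=0.5 between two orthogonal unit vectors the result keeps unit length, whereas linear averaging would shrink it to about 0.707; preserving that geometry is the usual motivation for SLERP over plain averaging.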

Evolution:
🏆 12th place on the Open LLM Leaderboard (open-llm-leaderboard/open_llm_leaderboard) (21.02.2025)

Detailed results and comparisons are presented in Pic. 3.

Links:
- Model: fluently-lm/FluentlyLM-Prinum
- GGUF version: mradermacher/FluentlyLM-Prinum-GGUF
- Demo on ZeroGPU: ehristoforu/FluentlyLM-Prinum-demo

nathanaelc

authored 3 papers 10 days ago

From Tools to Teammates: Evaluating LLMs in Multi-Session Coding Interactions

Paper • 2502.13791 • Published 15 days ago • 5

Self-Attention for Audio Super-Resolution

Paper • 2108.11637 • Published Aug 26, 2021

MemoryPrompt: A Light Wrapper to Improve Context Tracking in Pre-trained Language Models

Paper • 2402.15268 • Published Feb 23, 2024

alielfilali01

posted an update 15 days ago

🚨 Arabic LLM Evaluation 🚨

A few new models joined the inceptionai/AraGen-Leaderboard ranking today.

The new Mistral AI model, Saba, is quite impressive: Top 10! Well done @arthurmensch and team.

Sadly, Mistral did not follow its open-weights strategy this time; we hope this changes soon and that the model is released under a permissive license.

We also added other Mistral models, and apparently we have been sleeping on mistralai/Mistral-Large-Instruct-2411!

Another impressive model that joined the ranking today is ALLaM-AI/ALLaM-7B-Instruct-preview. After a long wait, ALLaM is finally here, and it is IMPRESSIVE given its size!

ALLaM is ranked on OALL/Open-Arabic-LLM-Leaderboard as well.

jjzha

authored 3 papers 15 days ago

Sailor2: Sailing in South-East Asia with Inclusive Multilingual LLMs

Paper • 2502.12982 • Published 16 days ago • 14

SEFL: Harnessing Large Language Model Agents to Improve Educational Feedback Systems

Paper • 2502.12927 • Published 16 days ago

On-Device LLMs for Home Assistant: Dual Role in Intent Detection and Response Generation

Paper • 2502.12923 • Published 16 days ago

singhsidhukuldeep

posted an update 25 days ago

Fascinating deep dive into Swiggy's Hermes - their in-house Text-to-SQL solution that's revolutionizing data accessibility!

Hermes enables natural language querying within Slack, generating and executing SQL queries with an impressive turnaround time of under 2 minutes. The system architecture is particularly intriguing:

Technical Implementation:
- Built on GPT-4 with a Knowledge Base + RAG approach for Swiggy-specific context
- AWS Lambda middleware handles communication between the Slack UI and the GenAI model
- Databricks jobs orchestrate query generation and execution

Under the Hood:
The pipeline employs a sophisticated multi-stage approach:
1. Metrics retrieval using embedding-based vector lookup
2. Table/column identification through metadata descriptions
3. Few-shot SQL retrieval with vector-based search
4. Structured prompt creation with data snapshots
5. Query validation with automated error correction
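Step 5 (query validation with automated error correction) can be sketched as a retry loop: generate SQL, ask the database to compile it, and feed any error back into the next prompt. This sketch uses an in-memory SQLite database as a stand-in for Snowflake and a plain Python callable in place of GPT-4; the prompt layout, the `generate_sql` callable, and the attempt limit are all hypothetical, not Swiggy's implementation.

```python
import sqlite3
from typing import Callable, Optional, Tuple


def build_prompt(question: str, context: str, error: Optional[str] = None) -> str:
    """Assemble a structured prompt; `context` would hold metrics, tables, and few-shot SQL."""
    parts = [f"Context:\n{context}", f"Question: {question}"]
    if error:
        parts.append(f"The previous query failed with: {error}. Fix it.")
    return "\n\n".join(parts)


def generate_validated_sql(question: str, context: str,
                           generate_sql: Callable[[str], str],
                           conn: sqlite3.Connection,
                           max_attempts: int = 3) -> Tuple[str, int]:
    """Return the first generated query the database accepts, plus the attempt number."""
    error: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        sql = generate_sql(build_prompt(question, context, error))
        try:
            # EXPLAIN compiles the statement (unknown tables/columns fail here) without running it.
            conn.execute("EXPLAIN " + sql)
            return sql, attempt
        except sqlite3.Error as exc:
            error = str(exc)
    raise RuntimeError(f"no valid SQL after {max_attempts} attempts: {error}")


def fake_llm(prompt: str) -> str:
    # Hypothetical model: guesses a wrong column until it sees the error message.
    if "no such column" in prompt:
        return "SELECT SUM(amount) FROM orders"
    return "SELECT SUM(total) FROM orders"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
sql, attempts = generate_validated_sql("Total order value?", "orders(id, amount)", fake_llm, conn)
print(attempts, sql)
```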

Architecture Highlights:
- Compartmentalized by business units (charters) for better context management
- Snowflake integration with seamless authentication
- Automated metadata onboarding with QA validation
- Real-time feedback collection via Slack

What's particularly impressive is how they've solved the data context challenge through charter-specific implementations, significantly improving query accuracy for well-defined metadata sets.

Kudos to the Swiggy team for democratizing data access across their organization. This is a brilliant example of practical AI implementation solving real business challenges.

eienmojiki

posted an update 27 days ago

🪄 LayerDiffuse - Flux Version (Demo) 🪄

LayerDiffuse - Transparent Image Layer Diffusion using Latent Transparency

Demo: https://huggingface.co/spaces/eienmojiki/Flux-LayerDiffuse

singhsidhukuldeep

posted an update 28 days ago

Exciting breakthrough in neural search technology!

Researchers from ETH Zurich, UC Berkeley, and Stanford University have introduced WARP - a groundbreaking retrieval engine that achieves remarkable performance improvements in multi-vector search.

WARP brings three major innovations to the table:
- A novel WARP SELECT algorithm for dynamic similarity estimation
- Implicit decompression during retrieval operations
- An optimized two-stage reduction process for efficient scoring

The results are stunning - WARP delivers a 41x reduction in query latency compared to existing XTR implementations, bringing response times down from 6+ seconds to just 171 milliseconds on single-threaded execution. It also achieves a 3x speedup over the current state-of-the-art ColBERTv2 PLAID engine while maintaining retrieval quality.

Under the hood, WARP uses highly-optimized C++ kernels and specialized inference runtimes. It employs an innovative compression strategy using k-means clustering and quantized residual vectors, reducing index sizes by 2-4x compared to baseline implementations.
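The centroid-plus-residual compression can be illustrated as follows: each vector is stored as the id of its nearest k-means centroid plus a few bits per dimension for the residual. This sketch assumes the centroids are already trained and uses uniform quantization buckets over a fixed range as a simplification; the bucketing and kernels in WARP and ColBERTv2 differ, and all names here are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def nearest_centroid(v: Sequence[float], centroids: List[Sequence[float]]) -> int:
    """Index of the centroid with the smallest squared Euclidean distance to v."""
    return min(range(len(centroids)),
               key=lambda i: sum((a - c) ** 2 for a, c in zip(v, centroids[i])))


def compress(v: Sequence[float], centroids: List[Sequence[float]], bits: int = 2,
             lo: float = -1.0, hi: float = 1.0) -> Tuple[int, List[int]]:
    """Store v as (centroid id, one small integer code per residual dimension)."""
    cid = nearest_centroid(v, centroids)
    levels = 2 ** bits
    step = (hi - lo) / levels
    codes = []
    for a, c in zip(v, centroids[cid]):
        code = math.floor((a - c - lo) / step)
        codes.append(min(levels - 1, max(0, code)))  # clip residuals outside [lo, hi)
    return cid, codes


def decompress(cid: int, codes: List[int], centroids: List[Sequence[float]],
               bits: int = 2, lo: float = -1.0, hi: float = 1.0) -> List[float]:
    """Rebuild an approximate vector: centroid plus the midpoint of each residual bucket."""
    step = (hi - lo) / 2 ** bits
    return [c + lo + (code + 0.5) * step for c, code in zip(centroids[cid], codes)]


centroids = [[0.0, 0.0], [1.0, 1.0]]
cid, codes = compress([0.9, 1.2], centroids, bits=4)
print(cid, codes, decompress(cid, codes, centroids, bits=4))
```

For residuals inside the range, the reconstruction error per dimension is at most half a bucket width, so adding bits trades index size for accuracy.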

The engine shows excellent scalability, with latency growing with the square root of dataset size and effective parallelization across multiple CPU threads, achieving a 3.1x speedup with 16 threads.

This work represents a significant step forward in making neural search more practical for production environments. The researchers have made the implementation publicly available for the community.

singhsidhukuldeep

posted an update 29 days ago

Exciting Research Alert: Remining Hard Negatives for Domain Adaptation in Dense Retrieval

Researchers from the University of Amsterdam have introduced R-GPL, an innovative approach to improve domain adaptation in dense retrievers. The technique enhances the existing GPL (Generative Pseudo Labeling) framework by continuously remining hard negatives during the training process.

Key Technical Insights:
- The method leverages domain-adapted models to mine higher-quality hard negatives incrementally, every 30,000 steps during training
- Uses MarginMSE loss for training with data triplets (Query, Relevant Doc, Hard Negative Doc)
- Implements mean pooling over hidden states for dense representations, with a sequence length of 350 tokens
- Combines query generation with pseudo-labels from cross-encoder models
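The MarginMSE objective and mean pooling listed above are compact enough to write out. In this sketch the student is the dense retriever being adapted and the teacher is the cross-encoder that supplies pseudo-labels; the function names and plain-list types are assumptions for illustration, not the authors' code.

```python
from typing import List, Sequence


def mean_pool(hidden_states: List[Sequence[float]], attention_mask: Sequence[int]) -> List[float]:
    """Average token vectors, ignoring padding positions (mask == 0)."""
    kept = [h for h, m in zip(hidden_states, attention_mask) if m]
    dim = len(hidden_states[0])
    return [sum(h[j] for h in kept) / len(kept) for j in range(dim)]


def margin_mse(student_pos: Sequence[float], student_neg: Sequence[float],
               teacher_pos: Sequence[float], teacher_neg: Sequence[float]) -> float:
    """MSE between the student's and the teacher's (positive - negative) score margins,
    averaged over (query, relevant doc, hard negative) triplets."""
    diffs = [(sp - sn) - (tp - tn)
             for sp, sn, tp, tn in zip(student_pos, student_neg, teacher_pos, teacher_neg)]
    return sum(d * d for d in diffs) / len(diffs)


print(mean_pool([[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]], [1, 1, 0]))
print(margin_mse([3.0], [1.0], [5.0], [2.0]))
```

Matching margins rather than raw scores lets the bi-encoder keep its own score scale while still learning the teacher's preference between the relevant document and the hard negative.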

Performance Highlights:
- Outperforms baseline GPL in 13/14 BEIR datasets
- Shows significant improvements in 9/12 LoTTE datasets
- Achieves a remarkable 4.4-point gain on the TREC-COVID dataset

Under the Hood:
The system continuously refreshes hard negatives using the model undergoing domain adaptation. This creates a feedback loop where the model gets better at identifying relevant documents in the target domain, leading to higher quality training signals.
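The feedback loop can be sketched as a training loop that periodically re-mines negatives with the model currently being adapted. `mine_negatives` and `train_step` are hypothetical callables standing in for retrieval over the target corpus and one MarginMSE update; only the 30,000-step interval comes from the post.

```python
from typing import Callable, List


def train_with_remining(model: dict, mine_negatives: Callable[[dict], list],
                        train_step: Callable[[dict, list], None],
                        total_steps: int, remine_every: int = 30_000) -> List[int]:
    """Run training, refreshing hard negatives every `remine_every` steps.
    Returns the steps at which mining happened (0 = initial mining)."""
    negatives = mine_negatives(model)
    mined_at = [0]
    for step in range(1, total_steps + 1):
        train_step(model, negatives)
        if step % remine_every == 0 and step < total_steps:
            # Re-mine with the partially domain-adapted model, closing the feedback loop.
            negatives = mine_negatives(model)
            mined_at.append(step)
    return mined_at


def toy_mine(model: dict) -> list:
    # Hypothetical miner: tags negatives with the model version that produced them.
    return [f"neg@{model['updates']}"]


def toy_step(model: dict, negatives: list) -> None:
    # Hypothetical update: just counts training steps.
    model["updates"] += 1


state = {"updates": 0}
print(train_with_remining(state, toy_mine, toy_step, total_steps=7, remine_every=3))
```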

Analysis reveals that domain-adapted models retrieve documents with higher relevancy scores in top-100 hard negatives compared to baseline approaches. This confirms the model's enhanced capability to identify challenging but informative training examples.

This research opens new possibilities for efficient dense retrieval systems that can adapt to different domains without requiring labeled training data.

Team members 160