title: Deep Research Evaluator
colorFrom: red
colorTo: purple
sdk: streamlit
sdk_version: 1.41.1
app_file: app.py
pinned: true
license: mit
short_description: Deep Research Evaluator for Long Horizon Learning Tasks
Deep Research Evaluator is a conceptual AI system that analyzes and synthesizes research literature, such as arXiv papers, to learn about specific topics and generate code for long-horizon AI tasks. This involves understanding complex subjects, identifying relevant methodologies, and implementing solutions that require planning and execution over extended sequences.

Claude (Anthropic) is built in.
Mixtral 8x7B, a mixture-of-experts (MoE) model, is the central model used alongside the arXiv embeddings RAG search:
![image/png](https://cdn-uploads.huggingface.co/production/uploads/620630b603825909dcbeba35/742HW6RWYk35BK2g5Eq-T.png)
# Project Architecture

- **Root Folder**
  - **app.py** (*Streamlit App*)
    - Main entry point for your Streamlit application.
  - **requirements.txt** (*Dependencies*)
    - Lists all the Python packages needed to run the app.
  - **mycomponent** (*HTML Component*)
    - A subdirectory containing your custom Streamlit component code.
    - **\_\_init\_\_.py** (*Python Init*)
      - Tells Python this folder is a package.
    - **index.html** (*Custom HTML*)
      - Front-end HTML/JS/CSS for the custom component.
```mermaid
flowchart TB
    A["Root Folder"] --> B["app.py<br>(Streamlit App)"]
    A --> C["requirements.txt<br>(Dependencies)"]
    A --> D["mycomponent<br>(HTML Component)"]
    D --> E["__init__.py<br>(Python Init)"]
    D --> F["index.html<br>(Custom HTML)"]
```
---

**Usage Flow**:
1. You run `streamlit run app.py`.
2. **app.py** imports **mycomponent** to load the HTML from **index.html**.
3. **requirements.txt** ensures needed dependencies are installed.
4. The **\_\_init\_\_.py** file ensures the custom component folder is recognized as a Python package.

**Notes**:
- **app.py** hosts your Streamlit logic and references **mycomponent**.
- **index.html** supplies the interface for any front-end custom elements.
- **requirements.txt** keeps the environment consistent.
# Features

- **Core Configuration & Setup**: Configures the Streamlit page with the title "TalkingAIResearcher" and sets the layout, sidebar state, and environment variables.
- **API Setup & Clients**: Loads and initializes OpenAI, Anthropic, and Hugging Face clients from environment variables and secrets.
- **Session State Management**: Manages conversation history, transcripts, file-editing state, and model selections.
- **get_high_info_terms()**: Extracts the top words and bigrams from a text by counting frequency and filtering out stop words.
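
The frequency-based term extraction described for get_high_info_terms() can be sketched roughly as follows. This is a minimal illustration, not the app's actual code: the stop-word list, the `top_n` parameter, the minimum word length, and the choice to form bigrams after stop-word removal are all assumptions.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stop-word list; the app's real list is not shown.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on", "for", "is", "are", "with", "by"}

def get_high_info_terms(text: str, top_n: int = 5) -> list[str]:
    """Return the top_n most frequent non-stop words and bigrams in text."""
    # Tokenize to lowercase alphanumeric words, dropping stop words and very short tokens.
    words = [
        w for w in re.findall(r"[a-z0-9]+", text.lower())
        if w not in STOP_WORDS and len(w) > 2
    ]
    # Bigrams are built from the filtered word stream (an assumption of this sketch).
    bigrams = [f"{a} {b}" for a, b in zip(words, words[1:])]
    counts = Counter(words + bigrams)
    return [term for term, _ in counts.most_common(top_n)]
```

Single words and bigrams share one counter here, so a frequent phrase competes directly with its component words.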
- **clean_text_for_filename()**: Sanitizes text for valid filenames by removing special characters, dropping short or unhelpful words, and truncating the length.
- **generate_filename()**: Creates a descriptive filename from a timestamp, high-info terms, and a snippet of the content, with duplicate words removed.
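
A rough sketch of how the two filename helpers could fit together is shown below. The word list, length limits, timestamp format, and the `now` parameter (added so the result is reproducible) are assumptions for illustration, not details taken from the app.

```python
import re
from datetime import datetime
from typing import Optional

# Hypothetical list of unhelpful words to drop from filenames.
UNHELPFUL_WORDS = {"the", "and", "for", "with", "this", "that", "what", "how"}

def clean_text_for_filename(text: str, max_len: int = 40) -> str:
    """Lowercase, drop special characters and low-value words, then truncate."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    words = [w for w in text.split() if len(w) > 2 and w not in UNHELPFUL_WORDS]
    return "_".join(words)[:max_len].rstrip("_")

def generate_filename(prompt: str, response: str, ext: str = "md",
                      now: Optional[datetime] = None) -> str:
    """Build '<timestamp>_<cleaned terms>.<ext>' with duplicate words removed."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    cleaned = clean_text_for_filename(f"{prompt} {response}", max_len=200)
    unique = list(dict.fromkeys(cleaned.split("_")))  # keeps first-seen order
    body = "_".join(unique)[:60].rstrip("_")
    return f"{stamp}_{body}.{ext}"
```

Under these assumptions, a prompt and response that repeat the same words contribute each word to the filename only once.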
- **create_file()**: Saves the prompt and response content to a file named by generate_filename().
- **get_download_link()**: Generates base64-encoded download links for .md, audio, or zip files so they can be downloaded inline.
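
One common way to build such an inline download link is a base64 `data:` URI inside an HTML anchor. The sketch below assumes that approach; the MIME-type table and the `label` parameter are illustrative choices rather than the app's own.

```python
import base64
from pathlib import Path

# Assumed MIME types for the file kinds mentioned above.
MIME_TYPES = {
    ".md": "text/markdown",
    ".mp3": "audio/mpeg",
    ".wav": "audio/wav",
    ".zip": "application/zip",
}

def get_download_link(path: str, label: str = "Download") -> str:
    """Return an HTML anchor that embeds the file contents as a base64 data URI."""
    p = Path(path)
    mime = MIME_TYPES.get(p.suffix.lower(), "application/octet-stream")
    b64 = base64.b64encode(p.read_bytes()).decode("ascii")
    return f'<a href="data:{mime};base64,{b64}" download="{p.name}">{label} {p.name}</a>'
```

In a Streamlit app, a link like this would typically be rendered with `st.markdown(link, unsafe_allow_html=True)`. Because the whole file is embedded in the page, this suits small files better than large ones.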
- **clean_for_speech()**: Strips out line breaks, URLs, and symbols to produce text that reads more naturally in TTS.
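
A minimal sketch of this kind of cleanup, assuming regular expressions are used; the exact set of symbols removed here is a guess, not the app's list.

```python
import re

def clean_for_speech(text: str) -> str:
    """Flatten text to a single line with no URLs or markdown symbols."""
    text = text.replace("\n", " ")                 # line breaks become spaces
    text = re.sub(r"https?://\S+", "", text)       # drop URLs
    text = re.sub(r"[#*_`<>|\[\]()]", "", text)    # drop common markdown symbols (assumed set)
    return re.sub(r"\s+", " ", text).strip()       # collapse leftover whitespace
```

Usage: `clean_for_speech("# Title\nSee https://arxiv.org/abs/1234 for **details**.")` returns `"Title See for details."`.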
- **edge_tts_generate_audio()**: Asynchronously generates audio files (e.g., .mp3) using Edge TTS.
- **speak_with_edge_tts()**: A wrapper around the async TTS call that allows direct use from synchronous code.
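
The async-plus-sync-wrapper pattern can be sketched with the standard library alone. To keep the example runnable without network access, `fake_tts_generate_audio` below is a hypothetical stand-in that writes placeholder bytes instead of calling Edge TTS; both function names are illustrative.

```python
import asyncio
from pathlib import Path

async def fake_tts_generate_audio(text: str, out_path: str) -> str:
    """Stand-in for an async TTS call; writes placeholder bytes, not real audio."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    Path(out_path).write_bytes(text.encode("utf-8"))
    return out_path

def speak_sync(text: str, out_path: str) -> str:
    """Synchronous wrapper: runs the coroutine to completion on a fresh event loop."""
    return asyncio.run(fake_tts_generate_audio(text, out_path))
```

With the `edge-tts` package installed, the coroutine body would typically be something like `await edge_tts.Communicate(text, voice).save(out_path)`. Note that `asyncio.run` raises an error if an event loop is already running in the current thread, so a wrapper like this only works from plain synchronous code.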
- **play_and_download_audio()**: Embeds an audio player in Streamlit and provides a download link for the audio file.
- **save_qa_with_audio()**: Stores Q&A content in a markdown file and generates TTS audio for the question and answer.
- **parse_arxiv_refs()**: Parses the multi-line markdown references returned by the arXiv RAG pipeline into structured paper objects.
- **create_paper_links_md()**: Builds a minimal markdown page with numbered links to each paper's arXiv URL.
- **create_paper_audio_files()**: Processes each parsed paper, generating TTS audio and embedding base64 download links.
- **display_papers()**: Shows papers in the main area with a scrolling marquee (via streamlit_marquee), plus expanders for details and audio.
- **display_papers_in_sidebar()**: Mirrors the paper listing in the sidebar with expanders, letting users quickly play or download paper audio.
- **display_file_history_in_sidebar()**: Lists all local .md, .mp3, and .wav files, most recently modified first, letting users preview and download them.
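
The file enumeration behind such a history view might look like the sketch below, which covers only the listing and sorting step, not the Streamlit rendering. The helper name `list_recent_files` and its parameters are hypothetical.

```python
from pathlib import Path

def list_recent_files(folder: str = ".", exts: tuple = (".md", ".mp3", ".wav")) -> list:
    """Return matching files in folder, most recently modified first."""
    files = [
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in exts
    ]
    # Sort by modification time, newest first.
    return sorted(files, key=lambda p: p.stat().st_mtime, reverse=True)
```

Each returned `Path` could then be shown in a sidebar expander with a preview and a download link.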
- **create_zip_of_files()**: Bundles multiple files (markdown and audio) into a zip archive with an automatically shortened filename.
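
A minimal sketch of the bundling step using the standard `zipfile` module. How the app actually shortens the archive name is not described, so joining the file stems and truncating to `max_name_len` characters is an assumption, as are the `out_dir` and `max_name_len` parameters.

```python
import zipfile
from pathlib import Path

def create_zip_of_files(paths: list, out_dir: str = ".", max_name_len: int = 40) -> str:
    """Zip the given files and return the archive path."""
    # Derive the archive name from the file stems, then truncate it (assumed scheme).
    stems = "_".join(Path(p).stem for p in paths)
    zip_name = stems[:max_name_len].rstrip("_") + ".zip"
    zip_path = Path(out_dir) / zip_name
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for p in paths:
            zf.write(p, arcname=Path(p).name)  # store without the directory prefix
    return str(zip_path)
```

Storing entries by base name keeps the archive flat, which means two input files with the same name in different folders would collide.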
- **perform_ai_lookup()**: The main lookup function. It:
  - Queries Anthropic (Claude)
  - Calls an arXiv RAG pipeline
  - Generates Q&A audio
  - Parses and renders the resulting papers
- **process_voice_input()**: Receives user text or voice input, then calls perform_ai_lookup() to produce an audio summary and the final Q&A file.
- **main()**: Orchestrates the entire application flow:
  - Renders tabs for Voice Input, Media Gallery, arXiv search, and Editor
  - Shows file history in the sidebar
  - Manages marquee settings and the final UI layout