metadata

title: 🧠Deep🐍Research🌐Evaluator
emoji: 🧠🐍🌐
colorFrom: red
colorTo: purple
sdk: streamlit
sdk_version: 1.41.1
app_file: app.py
pinned: true
license: mit
short_description: Deep Research Evaluator for Long Horizon Learning Tasks


Deep Research Evaluator is a conceptual AI system designed to analyze and synthesize information from extensive research literature, such as arXiv papers, to learn about specific topics and generate code applicable to long-horizon tasks in AI. This involves understanding complex subjects, identifying relevant methodologies, and implementing solutions that require planning and execution over extended sequences.

Claude AI is built in.

Mixtral MoE 8B is the central model, used alongside the arXiv embeddings RAG search.

Project Architecture

📂 Root Folder
- app.py (🤖 Streamlit App)
  - Main entry point for your Streamlit application.
- requirements.txt (📋 Dependencies)
  - Lists all the Python packages needed to run the app.
- 📂 mycomponent (🔧 HTML Component)
  - A subdirectory containing your custom Streamlit component code.
  - __init__.py (🐍 Python Init)
    - Tells Python this folder is a module/package.
  - index.html (🌐 Custom HTML)
    - Front-end HTML/JS/CSS for the custom component.

flowchart TB
    A["📂 Root Folder"] --> B["app.py 🤖<br>(Streamlit App)"]
    A --> C["requirements.txt 📋<br>(Dependencies)"]
    A --> D["📂 mycomponent 🔧<br>(HTML Component)"]
    D --> E["__init__.py 🐍<br>(Python Init)"]
    D --> F["index.html 🌐<br>(Custom HTML)"]

Usage Flow:

Start the app with streamlit run app.py.
app.py imports mycomponent, which loads the HTML from index.html.
requirements.txt lists the dependencies that must be installed before the first run (for example with pip install -r requirements.txt).
The __init__.py file marks the custom component folder as a Python package so that app.py can import it.
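
The import step above can be sketched as follows. This is only a guess at what mycomponent/__init__.py might contain, assuming the folder is registered as a static HTML component through Streamlit's declare_component; the helper names component_dir() and declare_mycomponent() are hypothetical and do not come from the project.

```python
from pathlib import Path


def component_dir(package_file: str) -> Path:
    """Return the folder that holds index.html, given the package's __file__."""
    return Path(package_file).resolve().parent


def declare_mycomponent(package_file: str):
    """Register the static HTML component with Streamlit (assumed setup).

    Streamlit is imported lazily so the path helper above stays usable
    on machines where Streamlit is not installed.
    """
    import streamlit.components.v1 as components

    # `path` must point at a directory that contains index.html.
    return components.declare_component(
        "mycomponent", path=str(component_dir(package_file))
    )
```

Inside mycomponent/__init__.py the call would be declare_mycomponent(__file__); the returned callable renders index.html when it is invoked from app.py.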

Notes:

app.py hosts the Streamlit logic and imports mycomponent.
index.html supplies the interface for any custom front-end elements.
requirements.txt keeps the environment reproducible by listing the packages the app needs.

Features

🎯 Core Configuration & Setup Configures the Streamlit page with the title “🚲TalkingAIResearcher🏆” and sets the layout, sidebar state, and environment variables.

🔑 API Setup & Clients Loads and initializes OpenAI, Anthropic, and HuggingFace clients from environment variables and secrets.

📝 Session State Management Manages conversation history, transcripts, file editing states, and model selections.

🧠 get_high_info_terms() Extracts top words/bigrams from a text by counting frequency and filtering out stop words.
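
A minimal sketch of the frequency-counting idea behind get_high_info_terms(), not the app's actual code. The stop-word list, the top_n parameter, and the minimum word length are assumptions made for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; the app's real list is not shown in this README.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "for", "is", "on", "with"}


def get_high_info_terms(text: str, top_n: int = 5) -> list[str]:
    """Return the most frequent non-stop-word words and bigrams in `text`."""
    words = [
        w
        for w in re.findall(r"[a-z0-9]+", text.lower())
        if w not in STOP_WORDS and len(w) > 2
    ]
    # Bigrams are built after filtering, so they can span a removed stop word.
    bigrams = [f"{a} {b}" for a, b in zip(words, words[1:])]
    counts = Counter(words + bigrams)
    return [term for term, _ in counts.most_common(top_n)]
```

Counting words and bigrams in the same Counter keeps the sketch short; a real implementation might weight bigrams differently.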

🏷️ clean_text_for_filename() Sanitizes text for valid filenames by removing special characters, short/unhelpful words, and truncating length.
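
One way the sanitizing step could look, as a sketch under assumptions: the list of unhelpful words, the underscore separator, and the 40-character limit are hypothetical choices, not values taken from the app.

```python
import re

# Hypothetical list of low-value words to drop; the app's own list is not shown here.
UNHELPFUL = {"the", "and", "for", "with", "this", "that", "what", "how"}


def clean_text_for_filename(text: str, max_len: int = 40) -> str:
    """Lowercase, strip special characters, drop short/unhelpful words, truncate."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    words = [w for w in text.split() if len(w) > 2 and w not in UNHELPFUL]
    # Truncation may cut a word in half; trailing underscores are removed.
    return "_".join(words)[:max_len].rstrip("_")
```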

📄 generate_filename() Creates an intelligent filename based on timestamps, high-info terms, and a snippet of the content (removing duplicates).
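
A simplified sketch of the naming scheme, assuming a YYYYMMDD_HHMMSS timestamp prefix and using de-duplicated words longer than three characters as a stand-in for the high-info terms. The now and max_len parameters are hypothetical and exist only to make the example deterministic.

```python
import re
from datetime import datetime
from typing import Optional


def generate_filename(
    prompt: str,
    response: str,
    ext: str = "md",
    now: Optional[datetime] = None,
    max_len: int = 60,
) -> str:
    """Build '<timestamp>_<terms>.<ext>' from the prompt and response text."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    words = re.findall(r"[a-z0-9]+", f"{prompt} {response}".lower())
    # dict.fromkeys keeps first-seen order while dropping duplicate words.
    unique = list(dict.fromkeys(w for w in words if len(w) > 3))
    body = "_".join(unique)[:max_len].rstrip("_")
    return f"{stamp}_{body}.{ext}"
```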

💾 create_file() Saves prompt + response content to a file, using generate_filename().

🔗 get_download_link() Generates base64-encoded download links for .md, audio, or zip files for inline downloading.
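
The base64 link technique can be sketched as below. The MIME table and the label parameter are assumptions; the app's own function may build its links differently.

```python
import base64
import html
from pathlib import Path

# Assumed MIME types for the file kinds named above.
MIME_TYPES = {
    ".md": "text/markdown",
    ".mp3": "audio/mpeg",
    ".wav": "audio/wav",
    ".zip": "application/zip",
}


def get_download_link(path: str, label: str = "Download") -> str:
    """Return an <a> tag whose href is a base64 data URI of the file's bytes."""
    p = Path(path)
    mime = MIME_TYPES.get(p.suffix.lower(), "application/octet-stream")
    b64 = base64.b64encode(p.read_bytes()).decode("ascii")
    name = html.escape(p.name, quote=True)
    return f'<a href="data:{mime};base64,{b64}" download="{name}">{html.escape(label)}</a>'
```

In Streamlit, a string like this is typically rendered with st.markdown(link, unsafe_allow_html=True). Because the whole file is embedded in the page, the approach suits small files better than large audio archives.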

🎤 clean_for_speech() Strips out line breaks, URLs, and symbols to create more readable text for TTS.
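
A minimal sketch of this kind of cleanup, assuming regular expressions are used; the exact set of symbols removed by the app is not documented here, so the character class below is a guess.

```python
import re


def clean_for_speech(text: str) -> str:
    """Make text friendlier for TTS: drop URLs and markup symbols, flatten whitespace."""
    text = re.sub(r"https?://\S+", "", text)      # remove URLs
    text = re.sub(r"[#*_`<>\[\]()|]", " ", text)  # remove common markdown/HTML symbols
    text = re.sub(r"\s+", " ", text)              # collapse line breaks and runs of spaces
    return text.strip()
```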

🎙️ edge_tts_generate_audio() Asynchronously generates audio files (e.g., .mp3) using Edge TTS.

🔊 speak_with_edge_tts() A wrapper function for the async TTS call, allowing direct usage in synchronous code.
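
The sync-over-async wrapper pattern can be illustrated as follows. To keep the example runnable offline, a stand-in coroutine (fake_tts_generate_audio, a hypothetical name) replaces the real Edge TTS call and simply writes the text bytes to disk; speak_sync is likewise an illustrative name.

```python
import asyncio


async def fake_tts_generate_audio(text: str, out_path: str) -> str:
    """Stand-in for the async Edge TTS call; writes the text bytes instead of audio."""
    await asyncio.sleep(0)  # yield to the event loop, as a real network call would
    with open(out_path, "wb") as f:
        f.write(text.encode("utf-8"))
    return out_path


def speak_sync(text: str, out_path: str) -> str:
    """Run the coroutine to completion from ordinary synchronous code."""
    return asyncio.run(fake_tts_generate_audio(text, out_path))
```

Note that asyncio.run() raises a RuntimeError when called from a thread that already has a running event loop, so a wrapper like this only works from plain synchronous code.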

🎵 play_and_download_audio() Embeds an audio player in Streamlit and provides a download link for that audio file.

💿 save_qa_with_audio() Stores Q&A content in a markdown file and generates TTS audio for the question + answer.

📰 parse_arxiv_refs() Parses the multi-line markdown references returned by the ArXiv RAG pipeline into structured paper objects.

🔗 create_paper_links_md() Builds a minimal markdown page with numbered links to each paper’s ArXiv URL.
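
A sketch of the numbered-link page, assuming each parsed paper is a dict with "title" and "url" keys; those key names and the page heading are assumptions, since the structure of the parsed paper objects is not described here.

```python
def create_paper_links_md(papers: list[dict]) -> str:
    """Return a markdown page with one numbered link per paper."""
    lines = ["# Paper Links", ""]
    for i, paper in enumerate(papers, start=1):
        lines.append(f"{i}. [{paper['title']}]({paper['url']})")
    return "\n".join(lines)
```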

📑 create_paper_audio_files() Processes each parsed paper, generating TTS audio and embedding base64 download links.

📚 display_papers() Shows papers in the main area with a scrolling marquee (via streamlit_marquee), plus expanders for details and audio.

🗂 display_papers_in_sidebar() Mirrors the paper listing in the sidebar with expanders, letting users quickly play or download paper audio.

📂 display_file_history_in_sidebar() Enumerates all local .md, .mp3, .wav files in descending modification time, letting users preview and download them.
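
The file enumeration behind this listing might look like the sketch below. list_recent_files() is a hypothetical helper covering only the lookup and sorting; the sidebar rendering is left out.

```python
from pathlib import Path


def list_recent_files(folder: str = ".", exts=(".md", ".mp3", ".wav")) -> list[Path]:
    """Return matching files in `folder`, newest modification time first."""
    files = [
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in exts
    ]
    return sorted(files, key=lambda p: p.stat().st_mtime, reverse=True)
```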

📦 create_zip_of_files() Bundles multiple files (markdown + audio) into a zip with an automatically shortened filename.
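
A minimal sketch of the bundling step using the standard zipfile module. Unlike the function described above, this version takes the archive name as an explicit zip_path argument (an assumption made for simplicity) rather than deriving a shortened name automatically.

```python
import zipfile
from pathlib import Path


def create_zip_of_files(paths: list[str], zip_path: str) -> str:
    """Write the given files into `zip_path`, storing each under its base name."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for p in paths:
            # arcname drops the directory part so the archive stays flat.
            zf.write(p, arcname=Path(p).name)
    return zip_path
```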

🔍 perform_ai_lookup() The main function to:

- Query Anthropic (Claude)
- Call an ArXiv RAG pipeline
- Generate Q&A audio
- Parse and render the resulting papers

🎧 process_voice_input() Receives user text/voice input, then calls perform_ai_lookup() to produce an audio summary and final Q&A file.

🎬 main() Orchestrates the entire application flow:

- Renders tabs for Voice Input, Media Gallery, ArXiv search, and Editor
- Shows file history in the sidebar
- Manages marquee settings and final UI layout