metadata
title: 🧠Deep🐍Research🌐Evaluator
emoji: 🧠🐍🌐
colorFrom: red
colorTo: purple
sdk: streamlit
sdk_version: 1.41.1
app_file: app.py
pinned: true
license: mit
short_description: Deep Research Evaluator for Long Horizon Learning Tasks

🎵', '🎶', '🎸', '🎹', '🎺', '🎷', '🥁', '🎻

Deep Research Evaluator is a conceptual AI system designed to analyze and synthesize information from extensive research literature, such as arXiv papers, to learn about specific topics and generate code applicable to long-horizon tasks in AI. This involves understanding complex subjects, identifying relevant methodologies, and implementing solutions that require planning and execution over extended sequences.

Claude AI is built in.

Mixtral 8x7B MoE is the central model, used alongside the arXiv embeddings RAG search.

Project Architecture

  • 📂 Root Folder
    • app.py (🤖 Streamlit App)
      • Main entry point for your Streamlit application.
    • requirements.txt (📋 Dependencies)
      • Lists all the Python packages needed to run the app.
    • 📂 mycomponent (🔧 HTML Component)
      • A subdirectory containing your custom Streamlit component code.
      • __init__.py (🐍 Python Init)
        • Tells Python this folder is a module/package.
      • index.html (🌐 Custom HTML)
        • Front-end HTML/JS/CSS for the custom component.
flowchart TB
    A["📂 Root Folder"] --> B["app.py 🤖<br>(Streamlit App)"]
    A --> C["requirements.txt 📋<br>(Dependencies)"]
    A --> D["📂 mycomponent 🔧<br>(HTML Component)"]
    D --> E["__init__.py 🐍<br>(Python Init)"]
    D --> F["index.html 🌐<br>(Custom HTML)"]

Usage Flow:

  1. You run streamlit run app.py.
  2. app.py imports mycomponent to load the HTML from index.html.
  3. requirements.txt ensures needed dependencies are installed.
  4. The __init__.py file ensures the custom component folder is recognized as a Python package.

Notes:

  • app.py hosts your Streamlit logic and references the mycomponent.
  • index.html supplies the interface for any front-end custom elements.
  • requirements.txt keeps the environment consistent.
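
As a quick sanity check of the layout described above, the sketch below reports which of the expected files are missing from a project root. The helper name `missing_project_files` is hypothetical and not part of the app; only the file list comes from the tree in this README.

```python
from pathlib import Path

# Files named in the project tree above, relative to the root folder.
EXPECTED_FILES = (
    "app.py",
    "requirements.txt",
    "mycomponent/__init__.py",
    "mycomponent/index.html",
)


def missing_project_files(root):
    """Return the expected files that do not exist under `root` (hypothetical helper)."""
    root = Path(root)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

Calling `missing_project_files(".")` from the root folder should return an empty list when the layout matches the tree.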

Features

🎯 Core Configuration & Setup: Configures the Streamlit page with title “🚲TalkingAIResearcher🏆”, sets layout, sidebar states, and environment variables.

🔑 API Setup & Clients: Loads and initializes OpenAI, Anthropic, and HuggingFace clients from environment variables and secrets.

📝 Session State Management: Manages conversation history, transcripts, file editing states, and model selections.

🧠 get_high_info_terms(): Extracts top words/bigrams from a text by counting frequency and filtering out stop words.
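
A minimal sketch of frequency-based term extraction, for illustration only. The stop-word list, the `top_n` parameter, and the tokenizing regex are assumptions; the app's own implementation may differ. Note that bigrams here are formed after stop words are removed.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; the app's actual list is not shown in this README.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "for", "is", "on", "with"}


def get_high_info_terms(text, top_n=5):
    """Return the most frequent non-stop-word unigrams and bigrams in `text`."""
    words = [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS]
    # Pair each remaining word with its successor to form bigrams.
    bigrams = [f"{a} {b}" for a, b in zip(words, words[1:])]
    counts = Counter(words + bigrams)
    return [term for term, _ in counts.most_common(top_n)]
```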

🏷️ clean_text_for_filename(): Sanitizes text for valid filenames by removing special characters, short/unhelpful words, and truncating length.
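
One way this kind of sanitizing could look, as a sketch under stated assumptions: the word list, the minimum word length of three, and the 60-character limit are all illustrative choices, not values taken from the app.

```python
import re

# Hypothetical list of low-value words; the app's own list is not given here.
UNHELPFUL_WORDS = {"the", "and", "for", "with", "this", "that"}


def clean_text_for_filename(text, max_length=60):
    """Lowercase, drop special characters and short or unhelpful words, then truncate."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    words = [w for w in text.split() if len(w) > 2 and w not in UNHELPFUL_WORDS]
    # Truncation may cut mid-word; strip any underscore left dangling at the end.
    return "_".join(words)[:max_length].rstrip("_")
```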

📄 generate_filename(): Creates an intelligent filename based on timestamps, high-info terms, and a snippet of the content (removing duplicates).
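
The combination of timestamp, terms, and snippet might be sketched as follows. The timestamp format, the six-word snippet, the 80-character cap, and the `now` parameter (added so the result is reproducible) are assumptions for illustration.

```python
import re
from datetime import datetime


def generate_filename(terms, content, file_type="md", now=None):
    """Build `<timestamp>_<terms and snippet words>.<ext>` with duplicate words removed."""
    stamp = (now or datetime.now()).strftime("%y%m_%H%M")
    snippet_words = re.findall(r"[a-z0-9]+", content.lower())[:6]
    term_words = [w for t in terms for w in re.findall(r"[a-z0-9]+", t.lower())]
    # dict.fromkeys keeps the first occurrence of each word, preserving order.
    unique = list(dict.fromkeys(term_words + snippet_words))
    return f"{stamp}_{'_'.join(unique)}"[:80] + f".{file_type}"
```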

💾 create_file(): Saves prompt + response content to a file, using generate_filename().

🔗 get_download_link(): Generates base64-encoded download links for .md, audio, or zip files for inline downloading.
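
A base64 download link is typically an HTML anchor whose `href` is a data URI. The sketch below shows that idea with the standard library only; the exact markup and MIME handling used by the app are not shown in this README, so treat these details as assumptions.

```python
import base64
import html
import mimetypes
import os


def get_download_link(path):
    """Return an HTML anchor that embeds the file at `path` as a base64 data URI."""
    mime = mimetypes.guess_type(path)[0] or "application/octet-stream"
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    name = html.escape(os.path.basename(path), quote=True)
    return f'<a href="data:{mime};base64,{encoded}" download="{name}">Download {name}</a>'
```

In a Streamlit app, a string like this would usually be rendered with `st.markdown(link, unsafe_allow_html=True)`. Because the whole file is embedded in the page, this approach suits small files best.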

🎤 clean_for_speech(): Strips out line breaks, URLs, and symbols to create more readable text for TTS.
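
A possible shape for this cleanup, as a sketch: the set of symbols removed below is an assumption chosen for markdown-style text, not the app's actual list.

```python
import re


def clean_for_speech(text):
    """Flatten text for TTS: remove URLs and markup symbols, collapse whitespace."""
    text = re.sub(r"https?://\S+", " ", text)      # drop URLs
    text = re.sub(r"[#*_`<>\[\]()|]", " ", text)   # drop common markdown/HTML symbols
    return re.sub(r"\s+", " ", text).strip()       # line breaks and space runs become one space
```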

🎙️ edge_tts_generate_audio(): Asynchronously generates audio files (e.g., .mp3) using Edge TTS.

🔊 speak_with_edge_tts(): A wrapper function for the async TTS call, allowing direct usage in synchronous code.
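
The async-plus-sync-wrapper pattern described here can be sketched without the TTS library itself. In the example below, `fake_tts_generate_audio` is a hypothetical stand-in that writes placeholder bytes instead of calling Edge TTS, and `speak_sync` is an illustrative name for the wrapper.

```python
import asyncio


async def fake_tts_generate_audio(text, out_path):
    """Stand-in for the async TTS call; a real version would await the TTS library here."""
    await asyncio.sleep(0)  # yield to the event loop, as real network I/O would
    with open(out_path, "wb") as f:
        f.write(text.encode("utf-8"))  # placeholder bytes, not real audio
    return out_path


def speak_sync(text, out_path):
    """Run the coroutine to completion from ordinary synchronous code."""
    return asyncio.run(fake_tts_generate_audio(text, out_path))
```

`asyncio.run` raises an error if an event loop is already running in the current thread, so this wrapper is meant for plain synchronous callers.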

🎵 play_and_download_audio(): Embeds an audio player in Streamlit and provides a download link for that audio file.

💿 save_qa_with_audio(): Stores Q&A content in a markdown file and generates TTS audio for the question + answer.

📰 parse_arxiv_refs(): Parses the multi-line markdown references returned by the ArXiv RAG pipeline into structured paper objects.

🔗 create_paper_links_md(): Builds a minimal markdown page with numbered links to each paper’s ArXiv URL.
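
A numbered link page could be built along these lines. The paper structure (dicts with `title` and `url` keys) and the page heading are assumptions for illustration; the README does not specify the shape of the parsed paper objects.

```python
def create_paper_links_md(papers):
    """Render a numbered markdown list of links from dicts with 'title' and 'url' keys."""
    lines = ["# Paper Links", ""]
    for i, paper in enumerate(papers, start=1):
        lines.append(f"{i}. [{paper['title']}]({paper['url']})")
    return "\n".join(lines)
```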

📑 create_paper_audio_files(): Processes each parsed paper, generating TTS audio and embedding base64 download links.

📚 display_papers(): Shows papers in the main area with a scrolling marquee (via streamlit_marquee), plus expanders for details and audio.

🗂 display_papers_in_sidebar(): Mirrors the paper listing in the sidebar with expanders, letting users quickly play or download paper audio.

📂 display_file_history_in_sidebar(): Enumerates all local .md, .mp3, .wav files in descending modification time, letting users preview and download them.
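
The file enumeration part of this feature can be sketched with the standard library. The helper name `list_recent_files` and its parameters are hypothetical; the sidebar rendering itself is left out.

```python
import glob
import os


def list_recent_files(folder=".", extensions=("md", "mp3", "wav")):
    """Return matching files in `folder`, most recently modified first."""
    paths = []
    for ext in extensions:
        paths.extend(glob.glob(os.path.join(folder, f"*.{ext}")))
    return sorted(paths, key=os.path.getmtime, reverse=True)
```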

📦 create_zip_of_files(): Bundles multiple files (markdown + audio) into a zip with an automatically shortened filename.
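
Bundling can be done with Python's `zipfile` module, as in this sketch. It uses a fixed default archive name rather than the automatic shortening described above, which is a simplification; the `zip_name` parameter is an assumption.

```python
import os
import zipfile


def create_zip_of_files(paths, zip_name="bundle.zip"):
    """Write `paths` into one compressed archive, stored under their base names."""
    with zipfile.ZipFile(zip_name, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in paths:
            zf.write(path, arcname=os.path.basename(path))
    return zip_name
```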

🔍 perform_ai_lookup(): The main function to:

  • Query Anthropic (Claude)
  • Call an ArXiv RAG pipeline
  • Generate Q&A audio
  • Parse and render the resulting papers

🎧 process_voice_input(): Receives user text/voice input, then calls perform_ai_lookup() to produce an audio summary and final Q&A file.

🎬 main(): Orchestrates the entire application flow:

  • Renders tabs for Voice Input, Media Gallery, ArXiv search, and Editor
  • Shows file history in the sidebar
  • Manages marquee settings and final UI layout