DeepResearchEvaluator


awacke1

Update README.md

b6fb714 verified 11 days ago


---
title: 🧠Deep🐍Research🌐Evaluator
emoji: 🧠🐍🌐
colorFrom: red
colorTo: purple
sdk: streamlit
sdk_version: 1.41.1
app_file: app.py
pinned: true
license: mit
short_description: Deep Research Evaluator for Long Horizon Learning Tasks
---

# 🎵🎶🎸🎹🎺🎷🥁🎻

Deep Research Evaluator is a conceptual AI system that analyzes and synthesizes information from large bodies of research literature, such as arXiv papers, in order to learn about specific topics and generate code for long-horizon AI tasks. This requires understanding complex subjects, identifying relevant methods, and implementing solutions that must be planned and executed over extended sequences of steps.

Anthropic's Claude is built in.

Mixtral 8x7B, a mixture-of-experts (MoE) model, is the central model, used alongside retrieval-augmented generation (RAG) search over arXiv embeddings:

![image/png](https://cdn-uploads.huggingface.co/production/uploads/620630b603825909dcbeba35/742HW6RWYk35BK2g5Eq-T.png)



# Project Architecture

- 📂 Root Folder
  - `app.py` (🤖 Streamlit App)
    - Main entry point for your Streamlit application.
  - `requirements.txt` (📋 Dependencies)
    - Lists all the Python packages needed to run the app.
  - 📂 `mycomponent` (🔧 HTML Component)
    - A subdirectory containing your custom Streamlit component code.
    - `__init__.py` (🐍 Python Init)
      - Tells Python this folder is a module/package.
    - `index.html` (🌐 Custom HTML)
      - Front-end HTML/JS/CSS for the custom component.

```mermaid
flowchart TB
    A["📂 Root Folder"] --> B["app.py 🤖<br>(Streamlit App)"]
    A --> C["requirements.txt 📋<br>(Dependencies)"]
    A --> D["📂 mycomponent 🔧<br>(HTML Component)"]
    D --> E["__init__.py 🐍<br>(Python Init)"]
    D --> F["index.html 🌐<br>(Custom HTML)"]
```

---

Usage Flow:

1. Install the dependencies listed in requirements.txt with `pip install -r requirements.txt`.
2. Run `streamlit run app.py`.
3. app.py imports mycomponent, which loads the HTML from index.html.
4. The `__init__.py` file makes the custom component folder importable as a Python package.

Notes:
- app.py hosts your Streamlit logic and references mycomponent.
- index.html supplies the interface for any custom front-end elements.
- requirements.txt keeps the environment consistent.




# Features

## 🎯 Core Configuration & Setup

Configures the Streamlit page with the title “🚲TalkingAIResearcher🏆”, sets the layout and sidebar state, and sets environment variables.

## 🔑 API Setup & Clients

Loads API keys from environment variables and secrets, then initializes the OpenAI, Anthropic, and Hugging Face clients.
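A minimal sketch of the key lookup, assuming environment variables take precedence over secrets (the README does not state the order). `get_api_key` is a hypothetical helper name, and `secrets` is any mapping.

```python
import os
from typing import Mapping, Optional


def get_api_key(name: str, secrets: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Look up an API key: environment variable first, then a secrets mapping (assumed order)."""
    value = os.getenv(name)
    if value:
        return value
    # Fall back to the secrets mapping, if one was supplied and contains the key.
    if secrets is not None and name in secrets:
        return secrets[name]
    return None
```

In the app, `secrets` could be `st.secrets`, which supports dict-style lookups, and the result could be passed to a client constructor such as `anthropic.Anthropic(api_key=...)` or `openai.OpenAI(api_key=...)`.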

## 📝 Session State Management

Manages conversation history, transcripts, file-editing state, and model selections.

## 🧠 `get_high_info_terms()`

Extracts the most frequent words and bigrams from a text, after filtering out stop words.
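A sketch of frequency-based term extraction as described above. The stop-word list and the `top_n` parameter are assumptions; the app's actual list and cutoffs are not shown in this README.

```python
import re
from collections import Counter

# Hypothetical, abbreviated stop-word list.
STOP_WORDS = {
    "a", "an", "the", "and", "or", "of", "to", "in", "on", "for", "with",
    "is", "are", "was", "were", "be", "this", "that", "it", "as", "by", "from",
}


def get_high_info_terms(text: str, top_n: int = 10) -> list[str]:
    """Return up to top_n of the most frequent words and bigrams, ignoring stop words."""
    # Lowercase and keep alphanumeric tokens (hyphens allowed inside a token).
    words = re.findall(r"\b[a-z0-9][a-z0-9-]*\b", text.lower())
    # Drop stop words and tokens of two characters or fewer.
    kept = [w for w in words if w not in STOP_WORDS and len(w) > 2]
    # Bigrams are adjacent pairs in the filtered word sequence.
    bigrams = [f"{a} {b}" for a, b in zip(kept, kept[1:])]
    counts = Counter(kept + bigrams)
    # Ties keep first-seen order, as documented for Counter.most_common.
    return [term for term, _ in counts.most_common(top_n)]
```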

## 🏷️ `clean_text_for_filename()`

Sanitizes text for use in filenames by removing special characters and short or unhelpful words, then truncating the result.
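One way to implement this sanitizing step, sketched under assumptions: the filler-word list, the two-character minimum, and the default `max_length` are illustrative, not taken from the app.

```python
import re

# Hypothetical list of "unhelpful" words to drop.
FILLER_WORDS = {"the", "and", "for", "with", "this", "that", "from", "what", "how"}


def clean_text_for_filename(text: str, max_length: int = 100) -> str:
    """Turn free text into a filesystem-safe, underscore-separated slug."""
    # Replace anything that is not a word character, whitespace, or hyphen with a space.
    text = re.sub(r"[^\w\s-]", " ", text.lower())
    # Keep words longer than two characters that are not filler words.
    words = [w for w in text.split() if len(w) > 2 and w not in FILLER_WORDS]
    # Join with underscores, truncate, and drop a dangling trailing underscore.
    return "_".join(words)[:max_length].rstrip("_")
```

For example, `clean_text_for_filename("What is RL? A survey: long-horizon tasks!")` yields `survey_long-horizon_tasks`.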

## 📄 `generate_filename()`

Builds a descriptive filename from a timestamp, high-information terms, and a snippet of the content, with duplicate words removed.
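A simplified sketch of the naming scheme: a timestamp followed by the distinct words of the prompt and response, in first-seen order. The timestamp layout, the four-character word filter (standing in for `get_high_info_terms()`), and `max_words` are assumptions.

```python
import re
from datetime import datetime
from typing import Optional


def generate_filename(prompt: str, response: str, file_type: str = "md",
                      now: Optional[datetime] = None, max_words: int = 8) -> str:
    """Build '<timestamp>_<distinct words>.<ext>' from a prompt and its response."""
    now = now or datetime.now()
    # Hypothetical timestamp layout: yymmdd_HHMM.
    stamp = now.strftime("%y%m%d_%H%M")
    # Simplified stand-in for get_high_info_terms(): runs of 4+ lowercase letters/digits.
    words = re.findall(r"[a-z0-9]{4,}", f"{prompt} {response}".lower())
    # dict.fromkeys removes duplicates while keeping first-seen order.
    unique = list(dict.fromkeys(words))[:max_words]
    return f"{stamp}_{'_'.join(unique)}.{file_type}"
```

Passing `now` explicitly keeps the output deterministic, which makes the function easy to test.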

## 💾 `create_file()`

Saves the prompt and response to a file named by `generate_filename()`.
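A sketch of the save step. The markdown layout (`# Prompt` / `# Response`) is hypothetical, and the `make_filename` callable stands in for `generate_filename()` so the snippet runs on its own.

```python
import os
from typing import Callable


def create_file(prompt: str, response: str,
                make_filename: Callable[[str, str], str], folder: str = ".") -> str:
    """Write the prompt and response to a markdown file and return its path."""
    path = os.path.join(folder, make_filename(prompt, response))
    with open(path, "w", encoding="utf-8") as f:
        # Hypothetical layout: one heading per section.
        f.write(f"# Prompt\n\n{prompt}\n\n# Response\n\n{response}\n")
    return path
```

Injecting the naming function keeps file I/O separate from the naming rules, so either can change independently.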

## 🔗 `get_download_link()`

Generates base64-encoded download links for .md, audio, or zip files so they can be downloaded inline.
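A minimal sketch of an inline download link built from a base64 data URI. The link text and the MIME fallback are assumptions. In Streamlit, such HTML is typically rendered with `st.markdown(link, unsafe_allow_html=True)`.

```python
import base64
import mimetypes
import os


def get_download_link(file_path: str) -> str:
    """Return an HTML <a> tag that embeds the file's bytes as a base64 data URI."""
    with open(file_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("ascii")
    # Guess the MIME type from the extension; fall back to a generic binary type.
    mime, _ = mimetypes.guess_type(file_path)
    mime = mime or "application/octet-stream"
    name = os.path.basename(file_path)
    return f'<a href="data:{mime};base64,{b64}" download="{name}">⬇️ Download {name}</a>'
```

Because the whole file is embedded in the page, this approach suits small markdown and audio files better than large archives.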

## 🎤 `clean_for_speech()`

Strips line breaks, URLs, and symbols so the text reads naturally with text-to-speech (TTS).
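A sketch of TTS pre-cleaning with regular expressions. Exactly which symbols the app removes is not stated, so the character set below (common markdown punctuation) is an assumption.

```python
import re


def clean_for_speech(text: str) -> str:
    """Flatten text into a single line that reads naturally when spoken by TTS."""
    # Remove URLs first so their punctuation is not read aloud.
    text = re.sub(r"https?://\S+", "", text)
    # Replace markdown heading, emphasis, quote, and table characters with spaces.
    text = re.sub(r"[#*_`|>\[\]]", " ", text)
    # Collapse newlines and runs of whitespace into single spaces.
    return re.sub(r"\s+", " ", text).strip()
```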

## 🎙️ `edge_tts_generate_audio()`

Asynchronously generates audio files (e.g., .mp3) with Edge TTS.

## 🔊 `speak_with_edge_tts()`

A synchronous wrapper around the async TTS call, so it can be used directly from synchronous code.
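A sketch of the async generator and its synchronous wrapper, using the `edge-tts` package's `Communicate` class and its async `save()` method. The default voice, output path, and empty-text guard are assumptions, not the app's confirmed settings.

```python
import asyncio
from typing import Optional


async def edge_tts_generate_audio(text: str, voice: str = "en-US-AriaNeural",
                                  out_path: str = "speech.mp3") -> str:
    """Synthesize text to an MP3 file with edge-tts and return the file path."""
    import edge_tts  # imported lazily so this module loads even without edge-tts installed

    communicate = edge_tts.Communicate(text, voice)
    await communicate.save(out_path)
    return out_path


def speak_with_edge_tts(text: str, voice: str = "en-US-AriaNeural",
                        out_path: str = "speech.mp3") -> Optional[str]:
    """Synchronous wrapper so regular (non-async) Streamlit code can call the coroutine."""
    if not text.strip():
        return None  # nothing to say; skip the network call
    # asyncio.run creates an event loop, runs the coroutine to completion, then closes the loop.
    return asyncio.run(edge_tts_generate_audio(text, voice, out_path))
```

Note that `asyncio.run` raises `RuntimeError` when called from a thread that already has a running event loop, so the wrapper is meant for plain synchronous call sites.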

## 🎵 `play_and_download_audio()`

Embeds an audio player in Streamlit and provides a download link for that audio file.

## 💿 `save_qa_with_audio()`

Stores the Q&A content in a markdown file and generates TTS audio for the question and answer.

## 📰 `parse_arxiv_refs()`

Parses the multi-line markdown references returned by the arXiv RAG pipeline into structured paper objects.
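A sketch of reference parsing. The README does not show the pipeline's output format, so this assumes each reference starts with a line containing a markdown link to arxiv.org, optionally followed by an `Authors:` line and summary lines. The `Paper` dataclass is hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Paper:
    title: str
    url: str
    authors: str = ""
    summary: str = ""


# Assumed format: a reference starts at a line containing [Title](https://arxiv.org/...).
LINK_RE = re.compile(r"\[(?P<title>[^\]]+)\]\((?P<url>https?://arxiv\.org/[^)\s]+)\)")


def parse_arxiv_refs(markdown: str) -> list[Paper]:
    """Split multi-line markdown references into Paper records."""
    papers: list[Paper] = []
    for line in markdown.splitlines():
        line = line.strip()
        if not line:
            continue
        match = LINK_RE.search(line)
        if match:
            # A link line starts a new paper.
            papers.append(Paper(title=match["title"].strip(), url=match["url"]))
        elif papers:
            current = papers[-1]
            if line.lower().startswith("authors:"):
                current.authors = line.split(":", 1)[1].strip()
            else:
                # Any other line is appended to the current paper's summary.
                current.summary = f"{current.summary} {line}".strip()
    return papers
```

If the real pipeline uses a different layout, only `LINK_RE` and the `authors:` prefix check would need to change.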

## 🔗 `create_paper_links_md()`

Builds a minimal markdown page with numbered links to each paper's arXiv URL.
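A sketch of the links page, assuming papers are passed as `(title, url)` pairs rather than the app's own paper objects. The `heading` parameter is hypothetical.

```python
def create_paper_links_md(papers: list[tuple[str, str]], heading: str = "Papers") -> str:
    """Render (title, url) pairs as a minimal numbered markdown list."""
    lines = [f"# {heading}", ""]
    for i, (title, url) in enumerate(papers, start=1):
        lines.append(f"{i}. [{title}]({url})")
    return "\n".join(lines) + "\n"
```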

## 📑 `create_paper_audio_files()`

Processes each parsed paper, generating TTS audio and embedding base64 download links.

## 📚 `display_papers()`

Shows papers in the main area with a scrolling marquee (via `streamlit_marquee`), plus expanders for details and audio.

## 🗂 `display_papers_in_sidebar()`

Mirrors the paper listing in the sidebar with expanders, letting users quickly play or download paper audio.

## 📂 `display_file_history_in_sidebar()`

Lists all local .md, .mp3, and .wav files, newest first by modification time, so users can preview and download them.
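The file-listing half of this feature can be sketched with the standard library. `list_history_files` is a hypothetical helper name, and the Streamlit preview/download rendering is omitted.

```python
import glob
import os


def list_history_files(folder: str = ".", exts=(".md", ".mp3", ".wav")) -> list[str]:
    """Return matching files in folder, most recently modified first."""
    files = [
        path
        for ext in exts
        for path in glob.glob(os.path.join(folder, f"*{ext}"))
    ]
    # os.path.getmtime returns the last-modification time in seconds since the epoch.
    return sorted(files, key=os.path.getmtime, reverse=True)
```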

## 📦 `create_zip_of_files()`

Bundles multiple files (markdown and audio) into a zip archive with an automatically shortened filename.
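A sketch using the standard `zipfile` module. The app's rule for shortening the archive name is not described, so simple truncation of the name stem to `max_len` characters is an assumption.

```python
import os
import zipfile


def create_zip_of_files(paths: list[str], zip_name: str = "bundle.zip", max_len: int = 40) -> str:
    """Bundle files into a zip archive whose filename is truncated to max_len characters."""
    stem, ext = os.path.splitext(os.path.basename(zip_name))
    # Shorten the stem, not the extension, so the result still ends in .zip.
    short_name = stem[: max_len - len(ext)] + ext
    out_path = os.path.join(os.path.dirname(zip_name), short_name)
    with zipfile.ZipFile(out_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in paths:
            # arcname drops directory prefixes so the archive is flat.
            zf.write(path, arcname=os.path.basename(path))
    return out_path
```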

## 🔍 `perform_ai_lookup()`

The main function. It:

- Queries Anthropic (Claude)
- Calls an arXiv RAG pipeline
- Generates Q&A audio
- Parses and renders the resulting papers

## 🎧 `process_voice_input()`

Receives the user's text or voice input, then calls `perform_ai_lookup()` to produce an audio summary and the final Q&A file.

## 🎬 `main()`

Orchestrates the entire application flow:

- Renders tabs for Voice Input, Media Gallery, ArXiv search, and Editor
- Shows file history in the sidebar
- Manages marquee settings and the final UI layout