Shreyas

Shreyas094

AI & ML interests

None yet

Recent Activity

liked a Space 1 day ago
KoonJamesZ/ocr-table-v2
liked a Space 7 days ago
akhaliq/anychat
updated a Space 8 days ago
Shreyas094/SearXNG-AI-v2

Organizations

Sentinel

Shreyas094's activity

reacted to Jaward's post with 👍 3 months ago
reacted to maxiw's post with 👍 3 months ago
You can now try out computer-use models from the Hub to automate your local machine with https://github.com/askui/vision-agent. 💻

import time
from askui import VisionAgent

with VisionAgent() as agent:
    # Open Google in the web browser
    agent.tools.webbrowser.open_new("http://www.google.com")
    time.sleep(0.5)  # give the page a moment to load
    # Each click names the model used to locate the described element on screen
    agent.click("search field in the center of the screen", model_name="Qwen/Qwen2-VL-7B-Instruct")
    agent.type("cats")
    agent.keyboard("enter")
    time.sleep(0.5)  # wait for the search results
    agent.click("text 'Images'", model_name="AskUI/PTA-1")
    time.sleep(0.5)  # wait for the image results
    agent.click("second cat image", model_name="OS-Copilot/OS-Atlas-Base-7B")


These models are currently integrated through the Gradio Spaces API. Local inference support is also planned soon!

Currently supported:
- Qwen/Qwen2-VL-7B-Instruct
- Qwen/Qwen2-VL-2B-Instruct
- AskUI/PTA-1
- OS-Copilot/OS-Atlas-Base-7B
posted an update 5 months ago
Is there a good multimodal PDF RAG application? My task is to extract tables from unstructured PDFs and convert them to an XLSX file. Current Python libraries can't do this easily; IMO, vision models are capable of handling such a task.
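For the last step of such a pipeline, here is a minimal sketch, assuming the vision model has been prompted to return each table as a Markdown (pipe-delimited) table. The model output below is invented for illustration, and the sketch writes CSV (which Excel opens) using only the standard library; producing a true .xlsx needs a third-party package such as openpyxl, shown as a comment.

```python
import csv
import io


def parse_markdown_table(text):
    """Turn a pipe-delimited (Markdown) table into a list of rows of cell strings."""
    rows = []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # ignore any prose the model writes around the table
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        # Skip the header separator row, e.g. |---|:---:|
        if all(cell and "-" in cell and set(cell) <= set("-: ") for cell in cells):
            continue
        rows.append(cells)
    return rows


def rows_to_csv(rows):
    """Serialize rows to CSV text; Excel can open CSV files directly."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


# Hypothetical vision-model output for one PDF page, prompted with something like
# "Extract every table on this page as a Markdown table."
model_output = """
Here is the table:
| Year | Revenue | Margin |
|------|--------:|:------:|
| 2022 | 1,200   | 18%    |
| 2023 | 1,450   | 21%    |
"""

rows = parse_markdown_table(model_output)
print(rows_to_csv(rows))

# For a real .xlsx file (requires pandas and openpyxl to be installed):
# import pandas as pd
# pd.DataFrame(rows[1:], columns=rows[0]).to_excel("tables.xlsx", index=False)
```

Asking the model for a fixed text format and parsing it deterministically keeps the spreadsheet step testable even when the extraction itself is done by a vision model.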
replied to their post 5 months ago

Agents and function-calling tools are something I recently explored, and they seem promising. I am still exploring the possibilities.

replied to their post 5 months ago

Hey John, the open-source models currently aren't that good at coding, and neither is GPT for that matter; Claude 3.5 Sonnet is the best, with few code errors. Maybe a model trained specifically on code would be able to handle such a task. But the idea is really good, and I found a lot of good Spaces in the link above. Thank you so much!

replied to their post 5 months ago

Hey, thank you so much, John, that was really insightful. I will surely read the post above.

replied to their post 5 months ago

Hi John, thanks so much for the contribution. However, I would like to make some upgrades to my RAG setup for the PDF summarization task. So far I have not worked much on the vector DB creation, chunking, indexing, and embedding parts. I feel that improving these should help the retrieval process, especially for 100-200-page research documents. If possible, could you give some suggestions on that part? Thanks.
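On the chunking part, one common starting point for long documents is overlapping, page-aware chunking. The sketch below assumes the PDF text has already been extracted as one string per page; the `Chunk` type and `chunk_pages` function are hypothetical names, and the word-based window sizes are illustrative rather than tuned (token-based windows would match embedding-model limits more precisely).

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    page: int      # 1-based page on which the chunk starts, useful for citing sources
    chunk_id: int


def chunk_pages(pages, chunk_size=200, overlap=50):
    """Split per-page text into overlapping windows of words.

    pages: list of strings, one per PDF page.
    chunk_size and overlap are counted in words; the overlap keeps text that
    straddles a window boundary retrievable from both neighbouring chunks.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    # Flatten to (word, page) pairs so every chunk knows where it came from.
    words = [(w, page_no) for page_no, text in enumerate(pages, start=1) for w in text.split()]
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        window = words[start:start + chunk_size]
        chunks.append(Chunk(" ".join(w for w, _ in window), window[0][1], len(chunks)))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the document
    return chunks


pages = ["Revenue grew strongly this quarter.", "Margins also improved across segments."]
for chunk in chunk_pages(pages, chunk_size=5, overlap=2):
    print(chunk.chunk_id, chunk.page, chunk.text)
```

Each `Chunk` would then be embedded and stored in the vector DB together with its page number, so answers can point back to the source page.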

posted an update 5 months ago
Help me upgrade my model.

Hi all, I am a complete beginner at coding; however, with the help of Claude (similar to Matt :P) and GPT-4o, I have been able to develop this RAG PDF summarizer and Q&A app, plus a web search tool.

The application is built specifically for summarization tasks, including summarizing financial documents, news articles, resumes, research documents, call transcripts, etc.

The Space can be found here: Shreyas094/SearchGPT

The news tool simply uses DuckDuckGo Chat to generate the search results with the Llama 3.1 70B model.

I would like your support in fine-tuning the retrieval step so it handles more unstructured documents.
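As one possible direction for the retrieval step, here is a minimal sketch of Okapi BM25 keyword ranking over document chunks in plain Python. This is not how SearchGPT currently retrieves; BM25 is a lexical ranker that is often combined with embedding search ("hybrid retrieval") to catch exact terms such as tickers or figures that embeddings can miss. The chunk texts are made up, and `k1=1.5`, `b=0.75` are common defaults, not tuned values.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25:
    """Okapi BM25 scoring over a fixed list of text chunks."""

    def __init__(self, chunks, k1=1.5, b=0.75):
        self.chunks = chunks
        self.k1, self.b = k1, b
        self.docs = [Counter(tokenize(c)) for c in chunks]   # term frequencies per chunk
        self.lengths = [sum(d.values()) for d in self.docs]  # chunk lengths in tokens
        self.avgdl = sum(self.lengths) / len(self.docs) if self.docs else 0.0
        self.df = Counter(term for d in self.docs for term in d)  # chunks containing each term

    def idf(self, term):
        n, total = self.df.get(term, 0), len(self.docs)
        return math.log((total - n + 0.5) / (n + 0.5) + 1)

    def score(self, query, i):
        doc, length = self.docs[i], self.lengths[i]
        total = 0.0
        for term in tokenize(query):
            tf = doc.get(term, 0)
            if tf:
                # Saturate term frequency and normalize by chunk length.
                norm = tf + self.k1 * (1 - self.b + self.b * length / self.avgdl)
                total += self.idf(term) * tf * (self.k1 + 1) / norm
        return total

    def top_k(self, query, k=3):
        ranked = sorted(range(len(self.docs)), key=lambda i: self.score(query, i), reverse=True)
        return [(self.chunks[i], self.score(query, i)) for i in ranked[:k]]


chunks = [
    "Revenue grew 12% year over year driven by cloud sales.",
    "The CEO discussed hiring plans for next year.",
    "Operating margin improved as cloud revenue scaled.",
]
retriever = BM25(chunks)
for text, score in retriever.top_k("cloud revenue growth", k=2):
    print(f"{score:.3f}  {text}")
```

The top-ranked chunks would then be passed to the LLM as context; in a hybrid setup, their BM25 ranks would be merged with the vector-search ranks.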