1306 216 394

Pedro Cuenca

pcuenq

AI & ML interests

None yet

Recent Activity

reacted to m-ric's post with 🚀 about 4 hours ago

We now have a Deep Research for academia: SurveyX automatically writes academic surveys nearly indistinguishable from human-written ones 🔥 Researchers from Beijing and Shanghai just published the first application of a deep research system to academia: their algorithm, given a question, can give you a survey of all papers on the subject. To make a research survey, you generally follow two steps, preparation (collect and organize papers) and writing (outline creation, writing, polishing). Researchers followed the same two steps and automated them. 🎯 For the preparation part, a key part is find all the important references on the given subject. Researchers first cast a wide net of all relevant papers. But then finding the really important ones is like distilling knowledge from a haystack of information. To solve this challenge, they built an “AttributeTree” object that structures key information from citations. Ablating these AttributeTrees significantly decreased structure and synthesis scores, so they were really useful! 📝 For the writing part, key was to get a synthesis that's both short and true. This is not easy to get with LLMs! So they used methods like LLM-based deduplication to shorten the too verbose listings made by LLMs, and RAG to grab original quotes instead of made-up ones. As a result, their system outperforms previous approaches by far! As assessed by LLM-judges, the quality score os SurveyX even approaches this of human experts, with 4.59/5 vs 4.75/5 🏆 I advise you to read the paper, it's a great overview of the kind of assistants that we'll get in the short future! 👉 https://huggingface.co/papers/2502.14776 Their website shows examples of generated surveys 👉 http://www.surveyx.cn/

new activity about 5 hours ago

agents-course/course-images:Rename en/unit2/smolagents/afred-party.jpg to en/unit2/smolagents/alfred-party.jpg

updated a dataset about 5 hours ago

agents-course/course-images

View all activity

Organizations

Posts 1

Post

5814

OpenELM in Core ML

Apple recently released a set of efficient LLMs in sizes varying between 270M and 3B parameters. Their quality, according to benchmarks, is similar to OLMo models of comparable size, but they required half the pre-training tokens because they use layer-wise scaling, where the number of attention heads increases in deeper layers.

I converted these models to Core ML, for use on Apple Silicon, using this script: https://gist.github.com/pcuenca/23cd08443460bc90854e2a6f0f575084. The converted models were uploaded to this community in the Hub for anyone that wants to integrate inside their apps: corenet-community/openelm-core-ml-6630c6b19268a5d878cfd194

The conversion was done with the following parameters:
- Precision: float32.
- Sequence length: fixed to 128.

With swift-transformers (https://github.com/huggingface/swift-transformers), I'm getting about 56 tok/s with the 270M on my M1 Max, and 6.5 with the largest 3B model. These speeds could be improved by converting to float16. However, there's some precision loss somewhere and generation doesn't work in float16 mode yet. I'm looking into this and will keep you posted! Or take a look at this issue if you'd like to help: https://github.com/huggingface/swift-transformers/issues/95

I'm also looking at optimizing inference using an experimental kv cache in swift-transformers. It's a bit tricky because the layers have varying number of attention heads, but I'm curious to see how much this feature can accelerate performance in this model family :)

Regarding the instruct fine-tuned models, I don't know the chat template that was used. The models use the Llama 2 tokenizer, but the Llama 2 chat template, or the default Alignment Handbook one that was used to train, are not recognized. Any ideas on this welcome!

View all Posts