We benchmarked the Aurora model from @xai-org. As far as we know, this is the first public evaluation of the model at scale.
We collected 401k human annotations over the past ~2 days for this, and we have uploaded all of the annotation data to Hugging Face under a fully permissive license: Rapidata/xAI_Aurora_t2i_human_preferences
We uploaded a huge human-annotated preference dataset for image generation. Instead of just having people choose which model they prefer, we annotated an alignment score on a word-by-word basis for the prompt. We also had annotators rate the images on coherence, overall alignment, and style preference. Images that scored badly were additionally given to annotators to highlight problem areas. Check it out! Rapidata/text-2-image-Rich-Human-Feedback
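To make the word-by-word idea concrete, here is a minimal Python sketch of what one such annotation record could look like. Everything in it is an assumption for illustration: the `RichFeedback` class, its field names, the 0 to 1 score range (higher meaning better), and the example values are made up and are not the dataset's actual schema.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RichFeedback:
    """Hypothetical record for one generated image (not the real schema)."""

    prompt: str
    # One score per whitespace-separated prompt word.
    # Assumed range: 0 to 1, higher means the word is better reflected in the image.
    word_alignment: list[float]
    # Image-level ratings, same assumed 0 to 1 range.
    coherence: float
    overall_alignment: float
    style: float

    def __post_init__(self) -> None:
        # Guard against scores that do not line up with the prompt words.
        if len(self.prompt.split()) != len(self.word_alignment):
            raise ValueError("need exactly one alignment score per prompt word")

    def misaligned_words(self, threshold: float = 0.5) -> list[str]:
        """Return prompt words whose alignment score falls below the threshold."""
        words = self.prompt.split()
        return [w for w, s in zip(words, self.word_alignment) if s < threshold]


# Made-up example values, purely illustrative.
example = RichFeedback(
    prompt="a red cube on a blue sphere",
    word_alignment=[0.9, 0.8, 0.9, 0.7, 0.9, 0.2, 0.3],
    coherence=0.8,
    overall_alignment=0.6,
    style=0.7,
)
print(example.misaligned_words())  # ['blue', 'sphere']
```

The point of per-word scores is that they show which part of the prompt the image missed (here, the hypothetical "blue sphere"), which a single overall preference vote cannot tell you.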