Question about the classifier and the metadata you mentioned (attr_to_balance set to ["disease","lvef","age","sex","length"])
Hey there,
Thank you so much for your contribution! I had a couple of questions about the classifier in Geneformer:
Is the classifier implemented as an additional layer on top of the Geneformer model to predict gene expression? Specifically, how is the final layer defined? Does it take the gene outputs from the model and apply a softmax function over cell types, or does it follow a different approach?
If I wanted metadata such as "age" and "sex" to influence the embeddings (e.g., to predict gene expression across all cells while taking these factors into account), would it be possible to integrate them directly into the model? Or would I need a separate encoder for these metadata attributes and then merge the embeddings before making predictions?
Thank you for your questions! The classification layer is a standard classification head: it takes the embeddings from the last layer of the model and outputs logits over the classes. It does not predict gene expression; it predicts gene classes or cell classes, depending on the classification task.
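To make the shape of such a head concrete, here is a minimal pure-Python sketch, not the actual Geneformer code. It assumes a linear layer applied to last-layer embeddings; the mean pooling for the cell-level case, the toy dimensions, and all function names are illustrative assumptions.

```python
import math
import random

def linear(x, weight, bias):
    """Apply y = W x + b to a single embedding vector x."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]

def softmax(logits):
    """Turn logits into class probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mean_pool(token_embeddings):
    """Average the per-gene embeddings into one cell-level vector (assumed pooling)."""
    n = len(token_embeddings)
    return [sum(col) / n for col in zip(*token_embeddings)]

def cell_logits(token_embeddings, weight, bias):
    """Cell classification: one logit vector per cell."""
    return linear(mean_pool(token_embeddings), weight, bias)

def gene_logits(token_embeddings, weight, bias):
    """Gene classification: one logit vector per gene token."""
    return [linear(tok, weight, bias) for tok in token_embeddings]

# Toy stand-in for the encoder's last-layer output: 5 gene tokens, hidden size 8.
rng = random.Random(0)
hidden_size, num_genes, num_classes = 8, 5, 3
last_layer = [[rng.gauss(0, 1) for _ in range(hidden_size)] for _ in range(num_genes)]
W = [[rng.gauss(0, 0.1) for _ in range(hidden_size)] for _ in range(num_classes)]
b = [0.0] * num_classes

cell_out = cell_logits(last_layer, W, b)   # length num_classes
gene_out = gene_logits(last_layer, W, b)   # num_genes x num_classes
cell_probs = softmax(cell_out)             # softmax is applied to logits, e.g. inside the loss
```

In both cases the head outputs class logits rather than expression values; the only difference is whether it is applied once per cell or once per gene token.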
For biological attributes like age, there are two options. You can add another task so that the model learns to predict the attribute, which encodes it in the embedding space (the multi-task learning approach), or you can design custom tokens that add the attribute to the input.
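As a rough illustration of the custom-token option, the sketch below extends a toy token dictionary with metadata tokens and prepends them to a cell's token sequence. Everything here is hypothetical: the token names (`<sex_F>`, `<age_40_49>`), the 10-year age bins, the placement at the start of the sequence, and the toy vocabulary are assumptions for illustration, not part of Geneformer. The model's embedding matrix would also need to be resized and trained so the new tokens acquire meaningful embeddings.

```python
def age_bin(age, width=10):
    """Map a numeric age to a hypothetical bin token, e.g. 47 -> '<age_40_49>'."""
    low = (int(age) // width) * width
    return f"<age_{low}_{low + width - 1}>"

def extend_vocab(token_dict, new_tokens):
    """Return a copy of the token dictionary with new tokens given fresh ids."""
    extended = dict(token_dict)
    next_id = max(extended.values()) + 1
    for tok in new_tokens:
        if tok not in extended:
            extended[tok] = next_id
            next_id += 1
    return extended

def add_metadata_tokens(input_ids, age, sex, vocab, max_len):
    """Prepend metadata token ids, then truncate to the model's input size."""
    meta = [vocab[f"<sex_{sex}>"], vocab[age_bin(age)]]
    return (meta + list(input_ids))[:max_len]

# Toy vocabulary standing in for a gene token dictionary (assumed layout).
toy_vocab = {"<pad>": 0, "<mask>": 1, "ENSG00000000003": 2, "ENSG00000000005": 3}
meta_tokens = ["<sex_F>", "<sex_M>"] + [f"<age_{lo}_{lo + 9}>" for lo in range(0, 100, 10)]
vocab = extend_vocab(toy_vocab, meta_tokens)

# A cell whose ranked gene tokens are [3, 2], from a 47-year-old female donor.
example = add_metadata_tokens([3, 2], age=47, sex="F", vocab=vocab, max_len=2048)
```

Binning a continuous attribute such as age keeps the number of new tokens small; the bin width is a design choice that trades resolution against how many examples each token sees during training.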