INDEX
Explanations
being subjected to judgment or mistreatment
Negative Logits
encouraged      0.46
allowed         0.45
dn              0.44
granted         0.44
equipped        0.41
instilled       0.41
able            0.41
ausgestattet    0.39
unleashed       0.39
seeking         0.39
Positive Logits
manipulated     0.57
talked          0.56
photographed    0.55
chatted         0.55
dominated       0.53
sinned          0.52
disagreed       0.52
watched         0.51
dominated       0.51
watched         0.50
Activations Density: 0.016%