Explanations
phrases related to choices and preferences in visual content
Negative Logits
imbus    -0.19
obia     -0.17
å½       -0.16
igers    -0.15
¶Į       -0.15
ìĽĥ      -0.15
亡       -0.15
orra     -0.15
Detach   -0.15
ebi      -0.15
Positive Logits
jec      0.17
conven   0.16
Burning  0.15
rosse    0.14
tte      0.14
cou      0.14
athe     0.14
pal      0.14
fun      0.14
igin     0.14
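Dashboards of this kind commonly build the positive and negative logit lists by projecting the feature's decoder direction through the model's unembedding matrix and ranking the per-token scores. The page does not say how its lists were computed, so the sketch below is an assumption about that procedure. The helper names `logit_effects` and `top_tokens` are hypothetical, and the toy two-dimensional model with a three-token vocabulary is made up for illustration; it does not reproduce the values above.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical helper: project a feature's decoder direction
    (length d_model) through an unembedding matrix of shape
    (d_model, vocab), giving one logit contribution per token."""
    d_model = len(decoder_dir)
    vocab = len(unembed[0])
    return [sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
            for t in range(vocab)]


def top_tokens(scores: Sequence[float], tokens: Sequence[str],
               k: int, largest: bool = True) -> List[Tuple[str, float]]:
    """Hypothetical helper: return the k tokens with the largest scores
    (positive list) or the smallest scores (negative list)."""
    order = sorted(range(len(scores)), key=lambda t: scores[t], reverse=largest)
    return [(tokens[t], scores[t]) for t in order[:k]]


# Toy example with invented numbers (assumption, not data from this page).
tokens = ["a", "b", "c"]
decoder_dir = [1.0, -0.5]
unembed = [[0.2, -0.1, 0.4],
           [0.1, 0.3, -0.2]]
scores = logit_effects(decoder_dir, unembed)
print("positive:", top_tokens(scores, tokens, 2))
print("negative:", top_tokens(scores, tokens, 1, largest=False))
```

Sorting ascending for the negative list puts the most strongly suppressed token first, matching the order in which the negative logits are listed above.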
Activation Density: 0.020%
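Activation density is usually read as the percentage of sampled token positions on which the feature fires. The sketch below assumes that definition, with "fires" meaning an activation strictly above a threshold of zero; the function name `activation_density` and the `threshold` parameter are hypothetical, and the page does not state the sample it was measured on. As a worked figure, 0.020% corresponds to roughly 2 active positions in every 10,000.

```python
from typing import Sequence


def activation_density(activations: Sequence[float],
                       threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of token positions whose
    activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Toy example: 2 active positions out of 10,000 (invented data).
acts = [0.0] * 10_000
acts[17] = 1.3
acts[4_242] = 0.4
print(f"{activation_density(acts):.3f}%")
```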