INDEX
Explanations
references to visual elements and descriptors of appearance
Negative Logits
ta      -0.22
tti     -0.20
to      -0.20
ric     -0.19
rics    -0.19
men     -0.18
tp      -0.18
bot     -0.18
tn      -0.18
tt      -0.17
Positive Logits
u       0.22
jal     0.20
i       0.19
enor    0.18
jni     0.17
enos    0.16
и       0.16
iator   0.16
osite   0.16
j       0.16
Activations Density: 0.065%