INDEX
Explanations
references to visual imagery and descriptions
Negative Logits
#ad         -0.15
imar        -0.14
åīįçļĦ      -0.14
agli        -0.14
CJK         -0.14
Poz         -0.14
åĿ¡         -0.14
andre       -0.13
alic        -0.13
Overflow    -0.13
Positive Logits
picture     0.20
-strokes    0.20
-picture    0.18
каÑĢÑĤи      0.18
painted     0.17
205         0.17
impression  0.17
shape       0.17
211         0.16
erson       0.16
Activations Density: 0.054%
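
Several tokens in the logit lists above (åīįçļĦ, åĿ¡, каÑĢÑĤи) look like raw byte-level BPE strings rather than readable text. The sketch below shows one way to decode them. It assumes the underlying model uses a GPT-2-style byte-to-character table, which this page does not state. The names `bytes_to_unicode`, `CHAR_TO_BYTE`, and `decode_token` are illustrative and not part of the dashboard.

```python
def bytes_to_unicode():
    """Byte -> printable-character table used by GPT-2-style byte-level BPE.

    Assumption: the model behind this page uses this same table.
    """
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            # Bytes without a printable form are shifted to code points >= 256.
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


# Inverse table: displayed character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a displayed token string back into readable text."""
    raw = bytearray()
    for ch in token:
        if ch in CHAR_TO_BYTE:
            raw.append(CHAR_TO_BYTE[ch])
        else:
            # Character was already decoded (e.g. Cyrillic letters); keep it.
            raw.extend(ch.encode("utf-8"))
    return raw.decode("utf-8", errors="replace")


for tok in ["åīįçļĦ", "åĿ¡", "каÑĢÑĤи", "picture"]:
    print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, åīįçļĦ decodes to 前的, åĿ¡ to 坡, and каÑĢÑĤи to карти, while plain ASCII tokens such as picture pass through unchanged. The Cyrillic fragment is plausibly the start of a Russian word such as картина ("picture"), which would fit the other positive-logit tokens.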