INDEX
Explanations
references to movement or motion-related concepts
Negative Logits
ish      -0.19
ied      -0.18
ily      -0.15
onet     -0.15
ash      -0.15
ilian    -0.14
assen    -0.14
pline    -0.14
anoi     -0.14
iri      -0.14
Positive Logits
-picture    0.28
picture     0.25
Picture     0.25
sickness    0.22
picture     0.22
ality       0.21
Picture     0.21
less        0.21
_picture    0.20
pictures    0.19
Activations Density: 0.012%