INDEX
Explanations
actions related to taking or acquiring
Negative Logits
gnore       -0.16
ilda        -0.15
ofilm       -0.15
omorphic    -0.15
ennes       -0.15
иль         -0.14
erdem       -0.14
alling      -0.14
assen       -0.13
ofil        -0.13
Positive Logits
aim         0.28
center      0.24
us          0.24
place       0.23
centre      0.22
things      0.21
viewers     0.20
aim         0.20
flight      0.20
Aim         0.20
Activations Density 0.039%