Explanations
actions involving jumping or falling, typically from heights into water or other spaces
Negative Logits
Token      Logit
phia       -0.16
ked        -0.16
ναν        -0.14
nei        -0.14
bst        -0.14
emory      -0.14
borg       -0.14
.weapon    -0.14
uild       -0.14
enti       -0.14
Positive Logits
Token      Logit
onto       0.15
Heights    0.15
heights    0.15
ptom       0.14
Shot       0.14
porto      0.14
landing    0.14
onto       0.14
rooft      0.14
height     0.14
Activations Density: 0.100%