Explanations
instances of aggressive or unusual behavior in a narrative context
Negative Logits
amburger       -0.19
locker         -0.15
ffa            -0.15
ais            -0.14
underground    -0.14
ç¹ģ            -0.14
rows           -0.13
loat           -0.13
Funeral        -0.13
iferay         -0.13
Positive Logits
door           0.28
Door           0.25
_door          0.24
Door           0.23
éŨ             0.20
inside         0.20
-door          0.20
door           0.20
knock          0.19
inside         0.19
Activations Density 0.098%
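
The page does not define the density figure. A common reading is the share of sampled token positions on which the feature activates at all. The sketch below works under that assumption; the function name `activation_density` and the sample data are hypothetical and chosen only to reproduce a value of the same size.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions where the feature fires.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 100,000 token positions, 98 of them with a nonzero activation.
sample = [0.0] * 99_902 + [1.5] * 98
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.098%
```

Under this reading, a density of 0.098% means the feature fires on roughly one token position in a thousand.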