INDEX
Explanations
references to video games and their characters
Negative Logits
нал      -0.16
CHIP     -0.15
lav      -0.14
osit     -0.14
Nab      -0.14
eneg     -0.14
ach      -0.14
abase    -0.13
ingt     -0.13
oxel     -0.13
Positive Logits
Mort         0.35
MK           0.28
fatalities   0.27
mort         0.26
mort         0.26
MK           0.25
Fatal        0.24
mortal       0.24
-mort        0.23
mortality    0.22
Activations Density: 0.005%