INDEX
Explanations
references to specific video game titles and characters
Negative Logits
iller        -0.78
fare         -0.74
arial        -0.72
corridors    -0.68
iates        -0.66
resil        -0.64
Examiner     -0.64
inct         -0.63
aylor        -0.63
seriousness  -0.60
Positive Logits
Frames       0.77
ota          0.77
Frame        0.73
icut         0.73
ernaut       0.72
©¶æ¥µ        0.70
anson        0.69
stad         0.69
Pacers       0.68
NES          0.67
Activations Density 1.613%