Explanations
the name "Sirhan", with varying levels of activation
mentions of a specific name, "Han"
Negative Logits
Token      Logit
smarter    -0.71
ISION      -0.64
Clicker    -0.63
Replay     -0.63
Sector     -0.63
psey       -0.60
ãĥ¯        -0.60
Eucl       -0.58
nesday     -0.57
JPEG       -0.57
Positive Logits
Token      Logit
igans      1.18
igan       0.92
mare       0.91
azard      0.90
abal       0.87
ttp        0.86
ild        0.86
adan       0.85
ovember    0.84
quin       0.83
Activations Density 0.029%