Explanations
mentions of a specific person (likely a character or figure) named "Manziel" in a story
Neuron Alignment
Index    Value    % of L₁
67       +0.15    0.6%
1516     +0.15    0.6%
1256     +0.13    0.5%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1516     +0.15       0.05
67       +0.15       0.04
1950     +0.13       0.04
Negative Logits
celona           -0.58
harmed           -0.54
<bos>            -0.50
solicited        -0.49
село             -0.48
resolve          -0.48
Atsauces         -0.47
HostException    -0.45
INVISIBLE        -0.45
torie            -0.44
Positive Logits
peppa        1.23
Man          1.21
MAN          1.18
Man          1.16
swarovski    1.13
mann         1.10
squa         1.09
simplif      1.08
pixar        1.08
uncin        1.08
Activations Density 0.120%