INDEX
Explanations
words related to a specific historical event or narrative
Neuron Alignment
Index    Value    % of L₁
1108     +0.10    0.3%
1533     +0.09    0.3%
50       +0.08    0.2%
Correlated Neurons
Index    P. Corr.    Cos Sim.
16       +0.10       0.07
705      +0.09       0.03
1919     +0.08       0.05
Negative Logits
praktik      -0.51
ditto        -0.51
depic        -0.51
excu         -0.49
konserv      -0.48
trö          -0.48
kosme        -0.48
disgra       -0.47
shouldnt     -0.47
pessi        -0.46
Positive Logits
⌀            0.50
Underwear    0.47
ഔ            0.47
StructEnd    0.46
numerus      0.45
DELLE        0.45
fjspx        0.44
Witaj        0.44
totiž        0.44
broderie     0.43
Activations Density 0.690%