INDEX
Explanations
references to political events and figures
Neuron Alignment
Index   Value   % of L₁
1842    +0.39   1.5%
1343    +0.21   0.8%
1577    +0.21   0.8%
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
1842    +0.39           0.11
1948    +0.21           0.07
184     +0.21           0.02
Negative Logits
<bos>             -1.50
internetowa       -0.58
ویکیپدی           -0.57
ddelweddau        -0.55
Viitteet          -0.54
snippetHide       -0.54
ⓧ                 -0.52
Diwedd            -0.52
Aholisi           -0.49
AssemblyCulture   -0.49
Positive Logits
unspeak      1.28
pamph        1.20
EEU          1.15
unlaw        1.08
philanth     1.07
depic        1.07
Bartholo     1.06
Abbé         1.06
Incenti      1.05
intersper    1.05
Activations Density: 1.438%