Explanations
concepts related to political and societal discussions
Neuron Alignment
Index   Value   % of L₁
577     +0.12   0.4%
1984    +0.10   0.3%
507     +0.10   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
577     +0.12      0.02
1045    +0.10      0.01
1425    +0.10      0.02
Negative Logits
Simult    -1.11
fep       -1.09
Augu      -1.08
mef       -1.07
hcm       -1.04
secon     -1.04
aen       -1.03
afp       -1.03
fup       -1.02
nece      -1.02
Positive Logits
.         0.74
..        0.67
…         0.62
...       0.62
astrous   0.57
But       0.54
,         0.51
<>());    0.51
ardı      0.51
but       0.50
Activations Density 0.088%