INDEX
Explanations
words related to political ideologies and figures
Neuron Alignment
Index    Value    % of L₁
1343     +0.19    0.6%
1842     +0.11    0.3%
227      +0.10    0.3%
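One plausible reading of the Neuron Alignment figures is that "Value" is a neuron's weight in the feature's direction and "% of L₁" is the absolute value of that weight divided by the L₁ norm of the whole direction. The page does not state this definition, so the sketch below is an assumption, and the names `l1_shares` and `top_aligned` are hypothetical helpers, not part of any dashboard API.

```python
def l1_shares(weights):
    """Return each entry's absolute value as a percentage of the L1 norm.

    Assumption: this mirrors the "% of L1" column; the source does not
    define that column explicitly.
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("L1 norm is zero; shares are undefined")
    return [100.0 * abs(w) / l1_norm for w in weights]


def top_aligned(weights, k=3):
    """Return (index, value, percent_of_l1) for the k largest-magnitude entries."""
    shares = l1_shares(weights)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    return [(i, weights[i], shares[i]) for i in order[:k]]


if __name__ == "__main__":
    # Toy direction, not data from the dashboard.
    toy_direction = [0.5, -1.5, 2.0]
    for index, value, share in top_aligned(toy_direction, k=2):
        print(f"{index}\t{value:+.2f}\t{share:.1f}%")
```

Under this reading, a share of 0.6% for a weight of +0.19 would imply a direction spread thinly over many neurons, which is consistent with the small percentages in the table.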
Correlated Neurons
Index    P. Corr.    Cos Sim.
227      +0.19       0.05
1343     +0.11       0.04
1499     +0.10       0.04
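The two columns in the Correlated Neurons table are presumably the Pearson correlation and the cosine similarity. As a reminder of how the two statistics differ, here is a minimal sketch of both using only the standard library. Which vectors the dashboard actually compares (activations over tokens, weight vectors, or something else) is not stated on the page, so the inputs here are placeholders.

```python
import math


def pearson(x, y):
    """Pearson correlation: covariance of x and y divided by both standard deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def cosine_similarity(u, v):
    """Cosine similarity: dot product divided by the product of Euclidean norms."""
    if len(u) != len(v):
        raise ValueError("inputs must have equal length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Toy vectors, not data from the dashboard.
    print(pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
    print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
```

Pearson correlation subtracts the mean before comparing, while cosine similarity does not, so the two can rank neurons differently, as the differing order of 227 and 1343 across this table and the Neuron Alignment table suggests.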
Negative Logits
Embal           -0.63
setTimestamp    -0.62
^{*}=           -0.61
<bos>           -0.58
vPvB            -0.58
nadzieję        -0.57
ModelBuilder    -0.57
\{\\            -0.56
gwaran          -0.56
kedés           -0.56
Positive Logits
fup          1.53
sii          1.53
wien         1.48
mef          1.47
Fasc         1.45
curi         1.44
fta          1.43
nece         1.43
„,           1.42
stockholm    1.42
Activations Density 0.188%
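Activation density is commonly taken to mean the fraction of tokens on which a feature fires. The sketch below computes it under that assumption; the threshold of zero and the function name `activation_density` are illustrative choices, not something the page specifies.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly above the threshold.

    Assumption: "density" means the share of tokens on which the feature
    is active; the source does not define it.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # Toy activations, not data from the dashboard.
    toy = [0.0, 0.0, 0.5, 0.0, 1.2, 0.0, 0.0, 0.0]
    print(f"{activation_density(toy):.3f}%")
```

Under this reading, a density of 0.188% would correspond to the feature being active on roughly 2 tokens in every 1,000.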