Explanations
mentions of legal and political terms or actions
Neuron Alignment
Index   Value   % of L₁
764     +0.10   0.3%
605     +0.09   0.2%
2006    +0.08   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
2006    +0.10      0.06
764     +0.09      0.05
1499    +0.08      0.06
Negative Logits
kram      -1.40
„,        -1.32
meis      -1.30
utop      -1.29
abnorm    -1.28
solidar   -1.27
ciga      -1.19
dises     -1.19
plak      -1.19
lele      -1.17
Positive Logits
regarding    0.99
concerning   0.91
relating     0.73
about        0.73
regarding    0.72
về           0.71
Regarding    0.69
Regarding    0.67
Concerning   0.65
关于         0.65
Activations Density 0.682%