INDEX
Explanations
references to governance and public institutions (New Auto-Interp)
Neuron Alignment
Index  Value  % of L₁
98     +0.14  0.8%
369    +0.13  0.8%
271    +0.12  0.7%
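The Neuron Alignment table lists the neurons that carry the most weight in the feature's direction. The following is a minimal sketch of how such rows could be produced. It assumes that `Value` is the signed entry of the feature's decoder vector for that neuron and that `% of L₁` is that entry's absolute value divided by the vector's L₁ norm. Both interpretations are assumptions, and `neuron_alignment` is a hypothetical helper.

```python
def neuron_alignment(decoder_vec, k=3):
    """Return the k neurons with the largest |weight| in a decoder vector.

    Each row is (neuron_index, signed_weight, share_of_l1), where
    share_of_l1 = |weight| / sum of |w| over the whole vector
    (the assumed meaning of the "% of L1" column).
    """
    l1 = sum(abs(w) for w in decoder_vec)
    if l1 == 0:
        return []  # an all-zero vector has no meaningful alignment
    # Rank neuron indices by absolute weight; sorted() is stable even with
    # reverse=True, so tied weights keep their original index order.
    order = sorted(range(len(decoder_vec)),
                   key=lambda i: abs(decoder_vec[i]), reverse=True)
    return [(i, decoder_vec[i], abs(decoder_vec[i]) / l1) for i in order[:k]]


if __name__ == "__main__":
    # Toy 4-neuron decoder vector for illustration, not real data.
    toy = [0.0, 0.5, -0.25, 0.25]
    for idx, value, share in neuron_alignment(toy, k=2):
        # Same layout as the table: index, signed value, share of L1.
        print(f"{idx:<6} {value:+.2f}  {share:.1%}")
```

Ranking by absolute value means a neuron with a large negative weight would also appear, which is why the `Value` column is signed.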
Correlated Neurons
Index  Pearson Corr.  Cos Sim.
145    +0.14          0.13
418    +0.13          -0.00
155    +0.12          0.10
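Correlated Neurons compares the feature with individual neurons across tokens. The sketch below assumes that both columns are computed over the same per-token activation records. `P. Corr.` is taken to be the Pearson correlation of the two activation vectors, and `Cos Sim.` their uncentered cosine similarity. The cosine interpretation in particular is an assumption, since the interface may compare weight vectors instead. All function names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def cosine(xs, ys):
    """Uncentered cosine similarity (0.0 if either vector is all zeros)."""
    dot = sum(x * y for x, y in zip(xs, ys))
    nx = math.sqrt(sum(x * x for x in xs))
    ny = math.sqrt(sum(y * y for y in ys))
    return dot / (nx * ny) if nx and ny else 0.0


def correlated_neurons(feature_acts, neuron_acts, k=3):
    """Rank neurons by Pearson correlation with the feature.

    feature_acts: one activation per token for the feature.
    neuron_acts:  neuron_acts[j][t] is neuron j's activation on token t.
    Returns up to k rows of (neuron_index, pearson, cosine), highest correlation first.
    """
    rows = [(j, pearson(feature_acts, acts), cosine(feature_acts, acts))
            for j, acts in enumerate(neuron_acts)]
    rows.sort(key=lambda r: r[1], reverse=True)
    return rows[:k]


if __name__ == "__main__":
    # Toy activations over 4 tokens, not real data.
    feature = [0, 0, 1, 2]
    neurons = [[0, 0, 2, 4], [1, 1, 0, 0]]
    for j, r, c in correlated_neurons(feature, neurons):
        print(f"{j:<6} {r:+.2f}  {c:.2f}")
```

Because Pearson correlation subtracts each vector's mean and cosine similarity does not, the two columns can disagree. Row 418 above, with a positive correlation but a cosine near zero, is an example of this kind of disagreement.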
Negative Logits
ľĵ    -2.75
ľ     -2.43
ĻĤ    -2.38
ĥ     -2.37
Īĺ    -2.36
²     -2.29
Ń     -2.28
ı     -2.27
ĩ     -2.23
ĵ     -2.23
Positive Logits
iets     1.65
forum    1.60
eenth    1.59
ifice    1.56
emis     1.55
Theatre  1.55
bolt     1.54
bone     1.53
ocene    1.52
piece    1.52
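The Negative and Positive Logits lists rank vocabulary tokens by how strongly the feature lowers or raises their logits. The sketch below assumes that this is a direct, linear readout. The decoder vector is mapped into the residual stream by an MLP output matrix and then through the unembedding, and layer normalization is ignored. The matrix layouts and the helper names (`logit_effects`, `top_tokens`) are hypothetical.

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def logit_effects(decoder_vec, w_out, w_unembed):
    """Direct logit effect of a feature, ignoring layer normalization.

    decoder_vec: feature direction in the MLP neuron basis (n_neurons).
    w_out:       d_model rows x n_neurons columns (neurons -> residual stream).
    w_unembed:   vocab rows x d_model columns (residual stream -> logits).
    """
    resid_dir = matvec(w_out, decoder_vec)
    return matvec(w_unembed, resid_dir)


def top_tokens(logits, vocab, k=10):
    """Return (most negative, most positive) lists of (token, logit) pairs."""
    order = sorted(range(len(logits)), key=lambda i: logits[i])
    negative = [(vocab[i], logits[i]) for i in order[:k]]
    positive = [(vocab[i], logits[i]) for i in reversed(order[-k:])]
    return negative, positive


if __name__ == "__main__":
    # Toy 2-neuron, 2-dimensional, 3-token example, not real weights.
    effects = logit_effects([1, 0], [[1, 0], [0, 1]], [[2, 0], [-1, 0], [0, 5]])
    neg, pos = top_tokens(effects, ["a", "b", "c"], k=1)
    print("negative:", neg, "positive:", pos)
```

Under this linear assumption, the two lists are just the two ends of one sorted vector of per-token scores.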
Activation Density: 1.259%
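Activation density is the fraction of tokens on which the feature is active. The following is a minimal sketch. It assumes that "active" means an activation strictly above a threshold of zero. The threshold is an assumption, and `activation_density` and `format_density` are hypothetical helpers.

```python
def activation_density(feature_acts, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold`."""
    if not feature_acts:
        return 0.0  # no tokens observed
    fired = sum(1 for a in feature_acts if a > threshold)
    return fired / len(feature_acts)


def format_density(density):
    """Render a fraction as a percentage with three decimals, as on the card."""
    return f"{density:.3%}"


if __name__ == "__main__":
    # Toy activations over 4 tokens, not real data.
    print(format_density(activation_density([0, 0, 0.5, 0])))
```

A density near 1% indicates a sparse feature, meaning that it is inactive on the vast majority of tokens.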