INDEX
Explanations
words related to government policies or regulations
Neuron Alignment
Index   Value   % of L₁
198     +0.08   0.2%
674     +0.08   0.2%
786     +0.08   0.2%
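The dashboard does not define the Neuron Alignment columns, so the following is one plausible reading, not the viewer's confirmed method. It assumes "Value" is the feature's decoder weight on a given MLP neuron and "% of L₁" is the absolute value of that weight divided by the L₁ norm of the whole decoder vector. Ranking by magnitude is also an assumption. The function name `top_neuron_alignment` is hypothetical.

```python
# Hypothetical helper: assumes "Value" is a decoder weight on an MLP neuron and
# "% of L1" is |weight| divided by the L1 norm of the whole decoder vector.
def top_neuron_alignment(decoder_vec, k=3):
    """Return (neuron_index, weight, fraction_of_l1) for the k largest-|weight| neurons."""
    l1 = sum(abs(w) for w in decoder_vec)
    if l1 == 0:
        raise ValueError("decoder vector is all zeros")
    # Sort neuron indices by absolute weight, largest first (ties keep index order).
    ranked = sorted(range(len(decoder_vec)),
                    key=lambda i: abs(decoder_vec[i]), reverse=True)
    return [(i, decoder_vec[i], abs(decoder_vec[i]) / l1) for i in ranked[:k]]


# Toy 4-neuron decoder vector whose L1 norm is exactly 1.0.
print(top_neuron_alignment([0.125, -0.5, 0.25, 0.125], k=2))
# [(1, -0.5, 0.5), (2, 0.25, 0.25)]
```

Under this reading, a weight of 0.08 that makes up 0.2% of the total implies a decoder L₁ norm on the order of 40 (0.08 / 0.002). The displayed values are rounded, so this is only a rough figure.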
Correlated Neurons
Index   P. Corr.   Cos Sim.
1066    +0.08      0.04
270     +0.08      0.05
1261    +0.08      0.01
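The two Correlated Neurons columns appear to measure different things. The following assumes, without confirmation from the viewer, that "P. Corr." is the Pearson correlation between the feature's activations and a neuron's activations over the same tokens, and that "Cos Sim." is the cosine similarity between two weight vectors. If so, a neuron can co-activate with the feature (+0.08) while its weights point in a nearly orthogonal direction (0.01 to 0.05). This minimal stdlib sketch shows one concrete way the two measures diverge:

```python
import math

# Assumption: "P. Corr." is a Pearson correlation of activations over tokens and
# "Cos Sim." is a cosine similarity of weight vectors; neither is confirmed here.
def pearson(xs, ys):
    """Pearson correlation of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a sample has zero variance")
    return cov / math.sqrt(sxx * syy)


def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))


# Pearson correlation ignores a constant offset; cosine similarity does not.
print(pearson([1, 2, 3], [11, 12, 13]))            # 1.0
print(cosine_similarity([1, 2, 3], [11, 12, 13]))  # about 0.949
```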
Negative Logits
Souha        -0.80
Mlle         -0.76
Portugu      -0.76
Cfr          -0.75
Abbé         -0.75
Ibidem       -0.73
intrigu      -0.73
Bartholo     -0.73
apprehen     -0.73
Hieronymus   -0.72
Positive Logits
actual       1.02
actual       0.93
directly     0.86
actually     0.82
direct       0.81
outright     0.81
necessarily  0.78
ACTUAL       0.74
Actual       0.71
direct       0.70
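The Negative and Positive Logits lists read as the tokens whose output logits the feature most decreases and increases. The following sketch assumes, as an illustration and not as the viewer's documented computation, that each token's weight is the dot product of the feature's output direction with that token's unembedding vector. The function name `logit_weights` and the toy vocabulary are hypothetical.

```python
# Hypothetical sketch: assumes each token's logit weight is the dot product of the
# feature's output direction with that token's unembedding vector.
def logit_weights(direction, unembed, k=3):
    """Return the k highest- and k lowest-scoring (token, weight) pairs."""
    scores = {tok: sum(d * u for d, u in zip(direction, vec))
              for tok, vec in unembed.items()}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    positive = ranked[:k]          # largest weights first
    negative = ranked[::-1][:k]    # most negative weights first
    return positive, negative


# Toy 2-dimensional unembedding with four made-up tokens.
unembed = {"a": [2.0, 0.0], "b": [0.5, 1.0], "c": [-1.0, 3.0], "d": [-3.0, 0.0]}
pos, neg = logit_weights([1.0, 0.0], unembed, k=2)
print(pos)  # [('a', 2.0), ('b', 0.5)]
print(neg)  # [('d', -3.0), ('c', -1.0)]
```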
Activations Density 0.565%
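Activation density is most naturally read as the share of sampled tokens on which the feature fires. The following is a minimal sketch under that assumption. The sample it was computed over and the firing threshold (0 here) are not stated on the dashboard. The function name `activation_density` is hypothetical.

```python
# Assumption: density = share of sampled tokens on which the feature's activation
# exceeds a threshold (0 here). The viewer's exact sample and threshold are unknown.
def activation_density(activations, threshold=0.0):
    """Fraction of activations strictly above `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Two of eight toy tokens are active.
print(activation_density([0.0, 0.3, 0.0, 0.0, 1.2, 0.0, 0.0, 0.0]))  # 0.25
```

Under this definition, a density of 0.565% means the feature fires on roughly 1 token in 177 (1 / 0.00565 ≈ 177).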