Explanations
technical terms related to legal matters or government actions
Neuron Alignment
Index   Value   % of L₁
394     +0.26   0.9%
50      +0.18   0.6%
964     +0.14   0.5%
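The dashboard does not say how Neuron Alignment is computed. Below is a minimal sketch, assuming Value is the feature's decoder weight on each MLP neuron and % of L₁ is that weight's absolute value divided by the L₁ norm of the whole decoder row. The function name `neuron_alignment` and its input are hypothetical.

```python
import numpy as np

def neuron_alignment(decoder_row, k=3):
    """Return the k neurons with the largest decoder weights for one feature.

    decoder_row: 1-D array, the feature's decoder direction in neuron space
    (hypothetical input; the dashboard does not expose it).
    Each result is (neuron index, weight, share of the row's L1 norm).
    """
    decoder_row = np.asarray(decoder_row, dtype=float)
    l1 = np.abs(decoder_row).sum()          # L1 norm of the whole row
    top = np.argsort(-decoder_row)[:k]      # largest positive weights first
    return [(int(i), float(decoder_row[i]), float(abs(decoder_row[i]) / l1))
            for i in top]

# Example: five neurons, L1 norm 1.0
print(neuron_alignment([0.1, -0.5, 0.3, 0.05, 0.05], k=2))
```

Under that assumption the table is self-consistent: 0.26 / 0.009, 0.18 / 0.006 and 0.14 / 0.005 each give an L₁ norm of roughly 28 to 30, so even the top neuron carries under 1% of the weight and the feature is spread across many neurons rather than aligned with one.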
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
227     +0.26           0.23
394     +0.18           0.17
1870    +0.14           0.12
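A sketch of how such a table could be produced, assuming Pearson Corr. is the correlation between the feature's activations and each neuron's activations over a shared set of tokens, and Cosine Sim. is the same comparison without mean-centring. The function `correlated_neurons` and its array shapes are hypothetical.

```python
import numpy as np

def correlated_neurons(feature_acts, neuron_acts, k=3):
    """Rank neurons by Pearson correlation with a feature's activations.

    feature_acts: shape (n_tokens,), the feature's activation on each token.
    neuron_acts: shape (n_tokens, n_neurons), neuron activations on the same tokens.
    Returns (neuron index, Pearson r, cosine similarity) for the top k neurons.
    """
    f = np.asarray(feature_acts, dtype=float)
    n = np.asarray(neuron_acts, dtype=float)
    # Pearson correlation: cosine similarity of the mean-centred vectors.
    fc = f - f.mean()
    nc = n - n.mean(axis=0)
    pearson = (nc.T @ fc) / (np.linalg.norm(nc, axis=0) * np.linalg.norm(fc))
    # Cosine similarity: the same formula on the raw (uncentred) vectors.
    cos = (n.T @ f) / (np.linalg.norm(n, axis=0) * np.linalg.norm(f))
    top = np.argsort(-pearson)[:k]          # highest correlation first
    return [(int(i), float(pearson[i]), float(cos[i])) for i in top]

feature = [0, 1, 2, 3]
neurons = np.column_stack([[0, 2, 4, 6], [3, 2, 1, 0], [0, 1, 1, 3]])
print(correlated_neurons(feature, neurons, k=2))
```

A neuron that never varies has zero norm after centring, so its Pearson value comes out as NaN (with a runtime warning); a real pipeline would filter such neurons first.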
Negative Logits
affez   -1.81
effe    -1.73
fta     -1.72
sappi   -1.69
erec    -1.67
?...    -1.64
ftu     -1.63
fto     -1.63
desir   -1.62
oner    -1.62
Positive Logits
[token not extracted]   0.64
Mr                      0.63
alone                   0.62
(                       0.62
that                    0.61
—                       0.59
and                     0.59
и                       0.58
or                      0.58
Mr                      0.58
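The two logit lists read as the feature's direct effect on the output vocabulary. A minimal sketch, assuming that effect is the feature's decoder direction multiplied by the unembedding matrix, with the final LayerNorm ignored as a simplification; `logit_effects`, `W_U` and `vocab` are hypothetical names.

```python
import numpy as np

def logit_effects(decoder_dir, W_U, vocab, k=10):
    """Bottom-k and top-k tokens by a feature's direct effect on the logits.

    decoder_dir: shape (d_model,), the feature's decoder direction.
    W_U: shape (d_model, d_vocab), the unembedding matrix.
    vocab: list of d_vocab token strings.
    Final LayerNorm is ignored (a simplifying assumption).
    """
    effects = np.asarray(decoder_dir, dtype=float) @ np.asarray(W_U, dtype=float)
    order = np.argsort(effects)             # most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

W_U = [[0.5, -1.0, 2.0, 0.0],
       [3.0, 3.0, 3.0, 3.0]]
print(logit_effects([1.0, 0.0], W_U, ["a", "b", "c", "d"], k=2))
```

In the lists above, the strongest negative effects (about -1.6 to -1.8) are close to three times the size of the strongest positive ones (about 0.6).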
Activation Density: 5.022%
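Activation density is most naturally read as the share of tokens on which the feature is active. A minimal sketch, assuming "active" means an activation above zero; `activation_density` and its `threshold` parameter are hypothetical.

```python
import numpy as np

def activation_density(feature_acts, threshold=0.0):
    """Percentage of tokens on which the feature's activation exceeds threshold."""
    acts = np.asarray(feature_acts, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

print(activation_density([0.0, 0.3, 0.0, 1.2]))
```

A density of 5.022% therefore means the feature fires on roughly one token in twenty.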