INDEX
Explanations
phrases related to legal or political controversies
Neuron Alignment
Index   Value   % of L₁
1013    +0.17   0.5%
2034    +0.10   0.3%
1870    +0.09   0.3%

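A minimal sketch of how a Neuron Alignment table like the one above could be computed. It assumes, since this page does not say, that "Value" is the feature's decoder weight on each MLP neuron and that "% of L₁" is that weight's magnitude divided by the L₁ norm of the whole decoder vector. The function name `neuron_alignment` and the toy vector are hypothetical.

```python
import numpy as np


def neuron_alignment(decoder_vec: np.ndarray, k: int = 3) -> list[tuple[int, float, float]]:
    """Top-k neurons by |weight| in one feature's decoder vector.

    Assumes decoder_vec holds the feature's decoder weights in the MLP-neuron
    basis (one entry per neuron). Returns (neuron index, signed weight,
    |weight| as a fraction of the vector's L1 norm).
    """
    magnitudes = np.abs(decoder_vec)
    l1_norm = magnitudes.sum()
    top = np.argsort(-magnitudes)[:k]  # indices of the largest magnitudes, largest first
    return [(int(i), float(decoder_vec[i]), float(magnitudes[i] / l1_norm)) for i in top]


if __name__ == "__main__":
    # Toy, hypothetical decoder vector over 5 neurons.
    rows = neuron_alignment(np.array([0.05, -0.40, 0.10, 0.25, -0.20]))
    for idx, value, frac in rows:
        print(f"{idx:<6}{value:+.2f}   {frac:.1%}")
```

Ranking by magnitude but reporting the signed weight lets a strongly negative neuron appear in the table with its sign intact.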
Correlated Neurons
Index   P. Corr.   Cos Sim.
1013    +0.17      0.10
284     +0.10      0.08
690     +0.09      0.03

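The Correlated Neurons columns can be illustrated with the sketch below. It assumes "P. Corr." is the Pearson correlation between the feature's activations and each neuron's activations over a sample of tokens, and "Cos Sim." is the uncentered cosine similarity of those same activation vectors. Both readings are assumptions (some dashboards compare weight vectors instead), and `correlated_neurons` with its inputs is hypothetical.

```python
import numpy as np


def correlated_neurons(
    feature_acts: np.ndarray, neuron_acts: np.ndarray, k: int = 3
) -> list[tuple[int, float, float]]:
    """Top-k neurons by Pearson correlation with one feature's activations.

    feature_acts: (n_tokens,) feature activation on each sampled token.
    neuron_acts:  (n_tokens, n_neurons) neuron activations on the same tokens.
    Returns (neuron index, Pearson correlation, cosine similarity).
    """
    eps = 1e-12  # avoids division by zero for constant or all-zero neurons
    # Pearson correlation: cosine similarity of mean-centered vectors.
    f_c = feature_acts - feature_acts.mean()
    n_c = neuron_acts - neuron_acts.mean(axis=0)
    pearson = (n_c * f_c[:, None]).sum(axis=0) / (
        np.linalg.norm(n_c, axis=0) * np.linalg.norm(f_c) + eps
    )
    # Cosine similarity: same formula without centering.
    cos = (neuron_acts * feature_acts[:, None]).sum(axis=0) / (
        np.linalg.norm(neuron_acts, axis=0) * np.linalg.norm(feature_acts) + eps
    )
    top = np.argsort(-pearson)[:k]  # highest correlation first
    return [(int(i), float(pearson[i]), float(cos[i])) for i in top]
```

Because cosine similarity skips the centering step, it can differ noticeably from the Pearson value for the same neuron when activations have large means, which is one way the two columns above can disagree.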
Negative Logits
affor     -2.21
increa    -2.19
purcha    -2.18
encomp    -2.16
scrat     -2.03
fuf       -2.03
emphat    -2.02
volunte   -2.01
peppa     -2.00
guarante  -1.98

Positive Logits
.         0.95
;         0.86
again     0.82
.;        0.79
。        0.77
,         0.76
with      0.75
due       0.75
and       0.74
but       0.73

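The Negative and Positive Logits lists rank tokens by how much the feature's output direction lowers or raises each token's logit. The sketch below shows that direct-path computation under stated assumptions: the feature's decoder direction is taken to be already in the residual stream (a feature defined in MLP-neuron space would first be multiplied by the MLP output weights), and the final layer norm is ignored. `top_logit_effects`, `W_U`, and the toy vocabulary are hypothetical.

```python
import numpy as np


def top_logit_effects(decoder_dir: np.ndarray, W_U: np.ndarray, vocab: list[str], k: int = 10):
    """Rank tokens by the direct effect of one feature's output direction on the logits.

    decoder_dir: (d_model,) feature direction in the residual stream (assumed).
    W_U:         (d_model, d_vocab) unembedding matrix.
    vocab:       token strings, one per column of W_U.
    Layer norm and any later layers are ignored in this sketch.
    """
    effects = decoder_dir @ W_U  # (d_vocab,) change in each token's logit per unit activation
    order = np.argsort(effects)  # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return positive, negative


if __name__ == "__main__":
    # Toy, hypothetical example: 4-token vocabulary, identity unembedding.
    pos, neg = top_logit_effects(np.array([0.5, -1.0, 2.0, 0.1]), np.eye(4), ["a", "b", "c", "d"], k=2)
    print("positive:", pos)
    print("negative:", neg)
```

Sorting once and reading from both ends keeps the two lists consistent with each other.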
Activation Density 0.820%
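Activation density is presumably the share of sampled tokens on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above zero; the function names and the `threshold` parameter are hypothetical.

```python
import numpy as np


def activation_density(feature_acts: np.ndarray, threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition of firing)."""
    return float((feature_acts > threshold).mean())


def format_density(density: float) -> str:
    """Render a fraction as a percentage with three decimals, in the style of "0.820%"."""
    return f"{100 * density:.3f}%"
```

For instance, a feature that fires on 82 of 10,000 sampled tokens has a density of 0.0082, shown as 0.820%.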