Explanations
political and legal terms or concepts
Neuron Alignment
Index   Value   % of L₁
50      +0.13   0.4%
1177    +0.12   0.4%
872     +0.09   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
2011    +0.13      0.07
623     +0.12      0.07
734     +0.09      0.06
Negative Logits
<bos>             -0.72
RectangleBorder   -0.72
Halen             -0.71
bú                -0.67
tafogo            -0.65
tré               -0.63
BorderSide        -0.63
sedia             -0.62
perla             -0.62
WriteTagHelper    -0.61
Positive Logits
reluct     1.60
disagre    1.57
shenan     1.53
unwarran   1.52
affor      1.50
increa     1.48
pamph      1.47
impra      1.46
unspeak    1.44
encomp     1.43
Activations Density 1.114%