Explanations
references to laws or legal matters
Neuron Alignment
Index   Value   % of L₁
50      +0.10   0.5%
370     +0.06   0.3%
341     +0.06   0.2%
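One plausible reading of the "% of L₁" column is each neuron's absolute weight in the feature's direction divided by the L₁ norm of that whole direction. The sketch below assumes that reading, and also assumes the direction is expressed in the neuron basis (for example an SAE decoder row) and that neurons are ranked by absolute weight. The function name `neuron_alignment` and the toy vector are hypothetical. Under this reading, a weight of +0.10 being 0.5% of L₁ implies a total L₁ norm of roughly 20.

```python
import numpy as np

def neuron_alignment(direction: np.ndarray, k: int = 3) -> list[tuple[int, float, float]]:
    """Return the k neurons with the largest |weight| in a feature direction.

    Each entry is (neuron index, signed weight, |weight| as a percentage of the
    vector's L1 norm). Ranking by absolute value is an assumption.
    """
    l1 = np.abs(direction).sum()
    # Indices sorted by descending absolute weight
    top = np.argsort(-np.abs(direction))[:k]
    return [(int(i), float(direction[i]), float(abs(direction[i]) / l1 * 100)) for i in top]

# Toy 4-dimensional direction (hypothetical values, not taken from this feature)
toy = np.array([0.10, -0.30, 0.05, 0.55])
print(neuron_alignment(toy))
```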
Correlated Neurons
Index   P. Corr.   Cos Sim.
749     +0.10      0.05
473     +0.06      0.05
18      +0.06      0.04
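"P. Corr." is most likely the Pearson correlation. A minimal sketch of both columns follows, assuming each is computed between the feature's activations and a neuron's activations over the same sample of tokens, with cosine similarity being the same formula without mean-centering. That pairing of inputs is an assumption, and the function names and toy activations are hypothetical.

```python
import numpy as np

def pearson_corr(x: np.ndarray, y: np.ndarray) -> float:
    # Center both series, then take the cosine of the centered vectors
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / (np.linalg.norm(xc) * np.linalg.norm(yc)))

def cosine_sim(x: np.ndarray, y: np.ndarray) -> float:
    # Same formula without centering (an "uncentered" correlation)
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

# Hypothetical activations of one feature and one neuron over five tokens
feature_acts = np.array([0.0, 0.0, 1.2, 0.0, 0.8])
neuron_acts = np.array([0.1, -0.2, 0.9, 0.3, 0.4])
print(pearson_corr(feature_acts, neuron_acts), cosine_sim(feature_acts, neuron_acts))
```

The practical difference is that Pearson correlation ignores a constant offset in either series, while cosine similarity does not.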
Negative Logits
<bos>            -1.55
public           -0.77
ⓧ                -0.75
//               -0.69
.                -0.68
ంట               -0.68
,                -0.68
(not rendered)   -0.67
(not rendered)   -0.66
/*               -0.66
Positive Logits
affor            1.76
accla            1.74
maneu            1.71
wien             1.71
emphat           1.70
stockholm        1.70
increa           1.69
Law              1.68
Juf              1.66
impra            1.66
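Logit lists like the two above are commonly produced by projecting the feature's output direction through the model's unembedding matrix and reading off the most negative and most positive entries. The sketch below assumes exactly that and ignores the final normalization layer. The function name `logit_effects`, its parameters, and the toy matrices are hypothetical and are not taken from any particular library.

```python
import numpy as np

def logit_effects(direction: np.ndarray, W_U: np.ndarray, vocab: list[str], k: int = 10):
    """Project a feature direction through an unembedding matrix.

    direction: (d_model,) feature output direction (assumed SAE decoder row)
    W_U: (d_model, vocab_size) unembedding matrix
    Returns the k most negative and k most positive (token, logit) pairs.
    The final LayerNorm/RMSNorm is ignored here, which is an assumption.
    """
    logits = direction @ W_U                  # shape: (vocab_size,)
    order = np.argsort(logits)                # ascending order of logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: 3-dim model, 4-token vocabulary (all values hypothetical)
rng = np.random.default_rng(0)
W_U = rng.normal(size=(3, 4))
neg, pos = logit_effects(np.array([1.0, 0.0, -1.0]), W_U, ["a", "b", "c", "d"], k=2)
print(neg, pos)
```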
Activation Density: 0.066%
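Activation density is usually the share of sampled tokens on which the feature is active, so 0.066% corresponds to about 66 active tokens per 100,000. The sketch below assumes "active" means an activation strictly above zero. The function name and the `threshold` parameter are hypothetical.

```python
import numpy as np

def activation_density(acts: np.ndarray, threshold: float = 0.0) -> float:
    """Percentage of tokens on which the feature's activation exceeds `threshold`."""
    return float((acts > threshold).mean() * 100)

# Hypothetical: 100,000 tokens, feature active on 66 of them
acts = np.zeros(100_000)
acts[:66] = 1.0
print(activation_density(acts))
```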