INDEX
Explanations
words related to evil and wrongdoing
Neuron Alignment
Index    Value    % of L₁
50       +0.17    0.9%
1974     +0.11    0.6%
950      +0.09    0.5%
Correlated Neurons
Index    P. Corr.    Cos Sim.
468      +0.17       0.02
1974     +0.11       0.02
699      +0.09       0.02
Negative Logits
Token            Logit
<bos>            -3.37
/***             -0.79
public           -0.76
/*!              -0.74
HasColumnType    -0.74
AutoScaleMode    -0.68
<tfoot>          -0.67
Географи         -0.67
immer            -0.65
foreach          -0.65
Positive Logits
Token        Logit
Minang       1.82
Juf          1.70
sappi        1.64
tramont      1.62
stockholm    1.62
bandung      1.57
wien         1.56
milano       1.53
jaya         1.52
frankfurt    1.51
Activations Density 0.127%