INDEX
Explanations
words related to promotions or bonuses
Neuron Alignment
Index    Value    % of L₁
966      +0.07    0.2%
50       +0.06    0.2%
25       +0.06    0.2%
Correlated Neurons
Index    P. Corr.    Cos Sim.
585      +0.07       0.02
368      +0.06       0.02
117      +0.06       0.02
Negative Logits
<bos>        -0.97
ⓧ            -0.59
public       -0.57
ૈ            -0.54
//           -0.54
become       -0.52
/*           -0.52
//           -0.52
ੱਚ           -0.52
protected    -0.51
Positive Logits
starter      2.49
Starter      2.39
Starter      2.12
starters     2.06
starter      2.04
affor        1.25
lidl         1.23
stockholm    1.22
maneu        1.14
scrat        1.12
Activations Density: 0.121%