INDEX
Explanations
phrases related to radical ideas or movements
Neuron Alignment
Index   Value   % of L₁
161     +0.25   1.3%
871     +0.21   1.1%
82      +0.18   0.9%
Correlated Neurons
Index   Pearson Corr.   Cos. Sim.
161     +0.25           0.04
871     +0.21           0.03
1351    +0.18           0.03
Negative Logits
Token        Logit
<bos>        -1.19
deposit      -0.60
Pey          -0.60
Trabaj       -0.59
deposit      -0.59
EnumMember   -0.58
눠           -0.58
Fonto        -0.58
Abbiamo      -0.57
htbp         -0.56
Positive Logits
Token        Logit
Radical      1.44
Radical      1.33
radical      1.29
radical      1.26
gaily        1.25
laft         1.18
yves         1.14
ftu          1.14
frankfurt    1.13
leaft        1.13
Activations Density 0.393%