INDEX
Explanations
Instances of the word "bother" and its variations (New Auto-Interp)
Neuron Alignment
Index   Value   % of L₁
148     +0.15   0.8%
314     +0.14   0.8%
115     +0.13   0.7%
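One way to read the Neuron Alignment numbers relies on an assumption the dashboard does not spell out. Value would be the feature's decoder weight on a given MLP neuron, and % of L₁ would be that weight's absolute value divided by the L₁ norm of the whole decoder vector. On that reading, a weight of +0.15 that makes up only 0.8% of the norm implies an L₁ norm of roughly 0.15 / 0.008 ≈ 19, so no single neuron carries much of the feature. The minimal sketch below builds such a table from a toy decoder vector. The function name `neuron_alignment` and the rule of ranking by signed value are illustrative choices, not the dashboard's code.

```python
# Hypothetical sketch of a "Neuron Alignment" table. Assumption: the feature's
# decoder vector holds one weight per MLP neuron, and "% of L1" is
# |weight| / sum(|weights|). Not the dashboard's actual code.


def neuron_alignment(decoder_weights, k=3):
    """Return the top-k (neuron index, weight, percent of L1 norm) triples,
    ranked by signed weight, largest first."""
    l1_norm = sum(abs(w) for w in decoder_weights)
    if l1_norm == 0:
        raise ValueError("decoder vector has zero L1 norm")
    ranked = sorted(range(len(decoder_weights)),
                    key=lambda i: decoder_weights[i], reverse=True)
    return [(i, decoder_weights[i], 100.0 * abs(decoder_weights[i]) / l1_norm)
            for i in ranked[:k]]


if __name__ == "__main__":
    # Toy 6-neuron decoder vector with made-up weights.
    toy_weights = [0.05, -0.30, 0.40, 0.10, 0.0, 0.15]
    for index, value, pct in neuron_alignment(toy_weights):
        print(f"{index:<5} {value:+.2f}  {pct:.1f}%")
```

Ranking by signed value matches the table above, where every listed weight is positive. Ranking by absolute value instead would also surface strongly negative weights.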
Correlated Neurons
Index   P. Corr.   Cos Sim.
314     +0.15      0.02
265     +0.14      0.02
367     +0.13      0.02
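The Correlated Neurons table reports two similarity measures, but it does not state which vectors they compare. A common reading of P. Corr. is the Pearson correlation between the feature's activations and a neuron's activations across a dataset of tokens. That is an assumption here. The sketch below therefore implements both measures on arbitrary plain Python lists. The two columns need not agree: Pearson correlation subtracts each vector's mean before comparing, and cosine similarity does not. The function names are illustrative.

```python
import math

# Two similarity measures between equal-length vectors. Assumption: which
# vectors the dashboard compares (activations over tokens, weight vectors, ...)
# is not stated, so these functions accept any pair of plain lists.


def pearson_correlation(x, y):
    """Covariance of x and y divided by the product of their spreads (mean-centered)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


def cosine_similarity(u, v):
    """Cosine of the angle between u and v (no mean-centering)."""
    if len(u) != len(v):
        raise ValueError("vectors must have equal length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Toy activations of one feature and one neuron over six tokens (made up).
    feature_acts = [0.0, 0.0, 2.0, 0.0, 1.0, 0.0]
    neuron_acts = [0.3, 0.1, 0.9, 0.2, 0.4, 0.3]
    print(pearson_correlation(feature_acts, neuron_acts))
    print(cosine_similarity(feature_acts, neuron_acts))
```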
Negative Logits
deserve        -1.51
Psychological  -1.48
quin           -1.46
lie            -1.46
IRE            -1.39
enen           -1.37
rely           -1.37
depend         -1.37
Studies        -1.34
>&             -1.33
Positive Logits
ľĵ     2.10
¡      2.01
¦      1.86
IJ     1.86
Ł      1.82
°      1.81
Ń      1.78
ŀ      1.77
Ļª     1.77
¨      1.76
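The Negative Logits and Positive Logits lists rank vocabulary tokens by how strongly the feature pushes their logits down or up. A common way to compute such scores is sketched below, and it is assumed here rather than confirmed by the dashboard. First take the feature's output direction in the residual stream. For an SAE trained on MLP neurons, that means the decoder vector multiplied by the MLP output weights. Then ignore the final layer norm and dot the direction with each column of the unembedding matrix. The function `logit_effects`, its argument layout, and the toy numbers are hypothetical.

```python
# Hypothetical sketch of "Negative Logits" / "Positive Logits" lists.
# Assumptions: `direction` is the feature's output direction already mapped into
# the residual stream (length d_model), `unembed` is a d_model x vocab_size
# matrix stored as a list of rows, and the final layer norm is ignored.


def logit_effects(direction, unembed, vocab, k=3):
    """Return (most negative, most positive) lists of (token, score) pairs."""
    if k < 0:
        raise ValueError("k must be non-negative")
    d_model = len(direction)
    vocab_size = len(vocab)
    if len(unembed) != d_model or any(len(row) != vocab_size for row in unembed):
        raise ValueError("unembed must be d_model x vocab_size")
    # Score for token t: dot product of the direction with column t of unembed.
    scores = [sum(direction[r] * unembed[r][t] for r in range(d_model))
              for t in range(vocab_size)]
    order = sorted(range(vocab_size), key=lambda t: scores[t])
    negative = [(vocab[t], scores[t]) for t in order[:k]]
    top = order[max(vocab_size - k, 0):]
    positive = [(vocab[t], scores[t]) for t in reversed(top)]
    return negative, positive


if __name__ == "__main__":
    # Toy model: d_model = 2, four-token vocabulary, made-up weights.
    toy_direction = [1.0, -1.0]
    toy_unembed = [[0.5, 2.0, -1.0, 0.0],
                   [0.5, -1.0, 1.0, 3.0]]
    toy_vocab = ["a", "b", "c", "d"]
    neg, pos = logit_effects(toy_direction, toy_unembed, toy_vocab, k=2)
    print("negative:", neg)
    print("positive:", pos)
```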
Activation Density: 0.023%
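The dashboard reports an activation density of 0.023% but does not say how it was measured. Assume it is the percentage of evaluation tokens on which the feature fires, meaning its activation is above zero. Then 0.023% works out to about 23 tokens in every 100,000, so the feature is rare. The sketch below computes that percentage. The `threshold` parameter is a hypothetical knob, not something the dashboard exposes.

```python
# Hypothetical sketch of an activation-density statistic. Assumption: density is
# the percentage of tokens on which the feature's activation exceeds a threshold
# (zero by default).


def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


if __name__ == "__main__":
    # 23 firing tokens out of 100,000 gives a density of 0.023%.
    toy_activations = [0.0] * 99_977 + [1.0] * 23
    print(f"{activation_density(toy_activations):.3f}%")
```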