Explanations
phrases related to making suggestions or implications
Neuron Alignment
Index    Value    % of L₁
31       +0.17    0.6%
732      +0.12    0.4%
30       +0.11    0.4%
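The Value and % of L₁ columns can be read as a weight and that weight's share of the vector's total absolute weight. The sketch below is one plausible way to produce such rows, not the dashboard's actual implementation: it assumes the feature is represented by a single weight vector over neurons, and the names `top_aligned_neurons`, `direction`, and `k` are hypothetical.

```python
import numpy as np

def top_aligned_neurons(direction, k=3):
    """Return (index, value, share_of_l1) for the k largest-magnitude entries.

    Assumption: `direction` is the feature's weight vector over neurons, and
    "% of L1" means |value| divided by the L1 norm of that vector.
    """
    direction = np.asarray(direction, dtype=float)
    l1_norm = np.abs(direction).sum()
    if l1_norm == 0.0:
        raise ValueError("direction vector is all zeros")
    # Sort indices by descending absolute weight and keep the first k.
    order = np.argsort(-np.abs(direction))[:k]
    return [
        (int(i), float(direction[i]), float(abs(direction[i]) / l1_norm))
        for i in order
    ]

# Toy vector (made-up values, not taken from the dashboard).
toy = np.array([0.5, -0.25, 0.125, 0.125])
rows = top_aligned_neurons(toy, k=2)
for index, value, share in rows:
    print(f"{index:<8}{value:+.2f}    {share:.1%}")
```

Under this reading, the small shares in the table (0.4% to 0.6%) would mean the weight is spread over many neurons rather than concentrated in a few.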
Correlated Neurons
Index    P. Corr.    Cos Sim.
31       +0.17       0.04
699      +0.12       0.04
30       +0.11       0.04
Negative Logits
reconnaît    -0.77
défend       -0.75
PLW          -0.70
inev         -0.69
déclare      -0.68
Entra        -0.67
Nema         -0.66
Opportun     -0.65
accla        -0.65
immen        -0.65
Positive Logits
suggest        1.00
suggests       0.97
suggestion     0.94
suggested      0.92
suggestions    0.91
suggesting     0.89
SUGGEST        0.86
ugges          0.77
Suggest        0.76
suggested      0.75
Activations Density 0.083%
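An activation density like the one above is commonly understood as the fraction of sampled tokens on which the feature fires. The following is a minimal sketch under that assumption; the function name `activation_density` and the `threshold` parameter are hypothetical, and the toy data is made up rather than drawn from this feature.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumption: `activations` holds one feature activation per sampled token,
    and the feature counts as "active" when its value is above the threshold.
    """
    acts = np.asarray(activations, dtype=float)
    if acts.size == 0:
        raise ValueError("no activations provided")
    return float((acts > threshold).mean())

# Toy data: the feature fires on 8 of 10,000 tokens.
toy_acts = np.zeros(10_000)
toy_acts[:8] = 1.0
density = activation_density(toy_acts)
print(f"Activations Density {density:.3%}")
```

A density this low would indicate a sparse feature that fires on fewer than one token in a thousand.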