INDEX
Explanations
phrases related to imparting a message of caution or warning
Neuron Alignment
Index   Value   % of L₁
1108    +0.13   0.4%
1967    +0.12   0.3%
1343    +0.10   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1958    +0.13      0.04
1087    +0.12      0.03
504     +0.10      0.04
Negative Logits
increa     -1.49
milf       -1.48
affor      -1.43
alre       -1.38
fuf        -1.36
madonna    -1.36
perfet     -1.35
disagre    -1.33
scrat      -1.33
strick     -1.32
Positive Logits
testify          0.52
da               0.52
PerformLayout    0.51
FloatField       0.49
щадь             0.48
Até              0.48
how              0.48
BoxShadow        0.48
why              0.48
też              0.47
Activations Density: 0.337%