INDEX
Explanations
phrases related to hate and negativity
Neuron Alignment
Index   Value   % of L₁
555     +0.14   0.5%
120     +0.12   0.4%
896     +0.12   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
376     +0.14      0.03
1708    +0.12      0.03
555     +0.12      0.02
Negative Logits
المعيارى        -0.58
Composição      -0.53
FieldNumber     -0.50
ResumeLayout    -0.49
tisgarh         -0.49
IntegerField    -0.48
CharField       -0.48
Apesar          -0.47
Horário         -0.47
ApiProperty     -0.47
Positive Logits
maneu     1.20
depic     1.13
curé      1.13
unve      1.13
HATE      1.13
encomp    1.12
Abbé      1.12
hairc     1.10
affor     1.09
increa    1.09
Activations Density: 0.104%