INDEX
Explanations
the word "against" and mentions of opposition or conflict
Neuron Alignment
Index   Value   % of L₁
897     +0.14   0.5%
605     +0.13   0.5%
1974    +0.12   0.5%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1805    +0.14      0.04
1974    +0.13      0.04
2016    +0.12      0.05
Negative Logits
alert      -0.43
للاسماء    -0.42
ima        -0.41
now        -0.41
php        -0.41
newFile    -0.41
alerts     -0.41
Noti       -0.40
Shower     -0.40
Addi       -0.40
Positive Logits
Against    1.08
Against    1.07
AGAINST    1.03
against    1.03
against    0.99
contre     0.73
gegen      0.69
concha     0.68
*++        0.68
haviour    0.66
Activations Density: 0.089%