Explanations
negative sentiments and criticisms about social justice movements or actions
Neuron Alignment
Index   Value   % of L₁
678     +0.08   0.2%
1618    +0.08   0.2%
630     +0.08   0.2%
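The "% of L₁" column appears to report each neuron's weight as a share of the L₁ norm of the feature's whole weight vector. Below is a minimal sketch of that reading. It assumes the feature is given as a decoder vector expressed in the neuron basis. The function `top_neuron_alignment` and the toy vector are hypothetical and are not the dashboard's actual code.

```python
import numpy as np

def top_neuron_alignment(decoder_vec, k=3):
    """Return (index, signed value, share of L1 norm) for the k largest-magnitude entries.

    Assumption: `decoder_vec` is the feature direction expressed in the neuron
    basis. This is an illustrative reading of the dashboard columns.
    """
    decoder_vec = np.asarray(decoder_vec, dtype=float)
    l1 = np.abs(decoder_vec).sum()  # L1 norm of the whole vector
    # Rank neurons by absolute weight, largest first.
    order = np.argsort(-np.abs(decoder_vec))[:k]
    return [(int(i), float(decoder_vec[i]), float(abs(decoder_vec[i]) / l1)) for i in order]
```

With the toy vector `[0.1, -0.4, 0.3, 0.2]` (L₁ norm 1.0), neuron 1 ranks first, with value -0.4 and a 40% share. Because the ranking uses absolute values, strongly negative weights also count as aligned.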
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
509     +0.08           0.05
1526    +0.08           0.04
630     +0.08           0.03
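The two similarity columns above measure related but different things. The sketch below computes both on one generic pair of vectors to show the difference: Pearson correlation is the cosine similarity of the mean-centred vectors. Which vectors the dashboard actually compares for each column (for example, feature and neuron activations on the same tokens) is an assumption here. The function name `pearson_and_cosine` is hypothetical.

```python
import numpy as np

def pearson_and_cosine(x, y):
    """Return (Pearson correlation, cosine similarity) for two equal-length vectors.

    Assumption: both statistics are computed on the same pair of vectors.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Cosine similarity of the raw (uncentred) vectors.
    cos = float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))
    # Pearson correlation: cosine similarity after subtracting each mean.
    xc, yc = x - x.mean(), y - y.mean()
    pearson = float(xc @ yc / (np.linalg.norm(xc) * np.linalg.norm(yc)))
    return pearson, cos
```

For `x = [1, 0, 0, 1]` and `y = [0, 1, 1, 0]`, the cosine similarity is 0 while the Pearson correlation is -1. This is why the two columns need not agree.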
Negative Logits
vaila         -0.53
chinoise      -0.46
ILog          -0.46
AppDelegate   -0.44
:...          -0.43
hmmmm         -0.43
marg          -0.42
madden        -0.42
pgs           -0.41
asthan        -0.41
Positive Logits
destroyAll      0.66
UnusedPrivate   0.63
Shakspeare      0.63
reputations     0.61
chèvre          0.57
brioche         0.56
churrasco       0.56
havoc           0.56
*++             0.56
hermosa         0.56
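The negative and positive logit lists look like the tokens whose output logits the feature pushes down or up most directly. Below is a minimal sketch of that kind of ranking. It assumes a direct-logit-attribution reading, where the feature's decoder vector is multiplied by the unembedding matrix and the final layer norm is ignored. The function `top_logit_effects`, the toy matrix and the vocabulary are hypothetical.

```python
import numpy as np

def top_logit_effects(decoder_vec, W_U, vocab, k=3):
    """Rank tokens by the direct logit effect of a feature direction.

    decoder_vec: (d_model,) feature direction (assumed representation).
    W_U: (d_model, vocab_size) unembedding matrix; layer norm is ignored.
    Returns (negative, positive) lists of (token, effect), most extreme first.
    """
    effects = np.asarray(decoder_vec, dtype=float) @ np.asarray(W_U, dtype=float)
    order = np.argsort(effects)  # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive
```

A single `np.argsort` call, read from both ends, gives the bottom-k and top-k tokens.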
Activation Density: 0.279%
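Activation density is presumably the fraction of tokens on which the feature is active. At 0.279%, that would be roughly 28 of every 10,000 tokens. Below is a minimal sketch. It assumes "active" means an activation above a threshold of zero. The function name and the `threshold` parameter are hypothetical.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold` (assumed definition)."""
    acts = np.asarray(activations, dtype=float)
    # Boolean mask of active tokens; its mean is the active fraction.
    return float((acts > threshold).mean())
```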