INDEX
Explanations
phrases related to social justice issues, particularly concerning sexual violence and financial corruption
Neuron Alignment
Index   Value   % of L₁
1464    +0.08   0.2%
562     +0.08   0.2%
314     +0.07   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1765    +0.08      0.04
286     +0.08      0.03
1197    +0.07      0.04
Negative Logits
inol          -0.59
usata         -0.53
guarante      -0.53
riuscito      -0.51
riusc         -0.50
alre          -0.48
fatis         -0.47
addirittura   -0.47
aperta        -0.46
Twit          -0.46
Positive Logits
ngOn          0.52
fornece       0.50
gneiss        0.50
TMPro         0.49
atience       0.48
oferece       0.47
démarche      0.47
सन्दर्भ        0.47
المصادر       0.46
SharedDtor    0.45
Activations Density: 0.188%