INDEX
Explanations
text related to gender equality and feminism
Neuron Alignment
Index   Value   % of L₁
394     +0.23   0.8%
1108    +0.22   0.8%
612     +0.18   0.7%
Correlated Neurons
Index   P. Corr.   Cos Sim.
612     +0.23      -0.00
1819    +0.22      0.06
394     +0.18      0.03
Negative Logits
redé      -0.60
élar      -0.58
expéri    -0.55
obé       -0.54
éprou     -0.53
bruh      -0.53
décro     -0.51
rassemb   -0.51
PLWABN    -0.50
rafraî    -0.49
Positive Logits
ecru            0.79
impracticable   0.77
friable         0.69
impelled        0.64
swarovski       0.62
chert           0.62
unlaw           0.62
negroes         0.60
unwarran        0.60
withal          0.59
Activations Density: 0.659%