INDEX
Explanations
words related to female empowerment, struggles, and activism
Neuron Alignment
Index  Value  % of L₁
625    +0.10  0.3%
110    +0.10  0.3%
1525   +0.09  0.3%
Correlated Neurons
Index  P. Corr.  Cos Sim.
756    +0.10     0.06
625    +0.10     0.05
944    +0.09     0.03
Negative Logits
makro     -0.72
diffusi   -0.71
tyn       -0.69
kontinu   -0.66
simplif   -0.65
ideolog   -0.65
dovr      -0.65
valuta    -0.64
allarg    -0.63
teras     -0.62
Positive Logits
women        0.81
feminist     0.78
herself      0.77
womanhood    0.74
Women        0.74
empowering   0.73
empower      0.73
empowerment  0.72
feminism     0.69
Woman        0.68
Activations Density 0.685%