Explanations
Terms relating to societal and political issues, particularly feminist and leftist discourse.
Neuron Alignment
Index    Value    % of L₁
2033     +0.11    0.3%
919      +0.10    0.3%
1253     +0.09    0.3%
Correlated Neurons
Index    P. Corr.    Cos Sim.
919      +0.11       0.02
972      +0.10       0.04
2033     +0.09       0.07
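The two columns in the Correlated Neurons table can disagree because they measure different things. The sketch below assumes that "P. Corr." is a Pearson correlation between two activation series and that "Cos Sim." is a cosine similarity between two weight vectors; this page does not define either column, so both readings are assumptions. The helper names and the sample numbers are hypothetical and are not taken from the dashboard.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (mean-centered)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (no centering)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical activations of a feature and a neuron over four inputs.
feature_acts = [0.0, 1.0, 2.0, 3.0]
neuron_acts = [1.0, 3.0, 5.0, 7.0]

# Hypothetical weight directions for the same feature and neuron.
feature_dir = [1.0, 0.0]
neuron_dir = [0.0, 1.0]

corr = pearson(feature_acts, neuron_acts)  # activations move together
sim = cosine(feature_dir, neuron_dir)      # directions are orthogonal
```

In this toy case the activations are perfectly correlated while the directions are orthogonal, which illustrates how a table row can pair a nonzero correlation with a cosine similarity near zero.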
Negative Logits
وتسجيلات           -0.44
createStatement    -0.44
ECONDS             -0.44
lasse              -0.43
fré                -0.43
ophosph            -0.42
orthon             -0.41
rech               -0.41
Fieber             -0.41
vektor             -0.40
Positive Logits
women              0.97
sexist             0.95
sexism             0.93
female             0.89
females            0.88
gender             0.87
Women              0.86
feminist           0.85
feminists          0.83
Women              0.82
Activations Density: 1.075%