INDEX
Explanations
words related to controversial topics, specifically around trans rights and activism
Neuron Alignment
Index   Value   % of L₁
1343    +0.23   0.8%
845     +0.16   0.5%
1314    +0.11   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1343    +0.23      0.09
1314    +0.16      0.08
845     +0.11      0.05
Negative Logits
AndEndTag         -0.59
ביוגרפיה          -0.57
GraphicsUnit      -0.57
TestingModule     -0.56
跳转至            -0.54
thể               -0.54
Fordítás          -0.53
헌                -0.53
AssemblyVersion   -0.51
Referências       -0.50
Positive Logits
mef       1.61
Intere    1.53
Juf       1.52
fta       1.51
alberto   1.49
volunte   1.49
effe      1.49
dises     1.48
wien      1.48
ftu       1.45
Activations Density: 1.035%