INDEX
Explanations
the word "advocacy" and related forms, as well as phrases indicating support or defense of certain causes or groups
Neuron Alignment
Index   Value   % of L₁
506     +0.14   0.5%
1044    +0.14   0.5%
1870    +0.13   0.5%
Correlated Neurons
Index   Pearson Corr.   Cos Sim.
1044    +0.14           0.03
321     +0.14           0.03
506     +0.13           0.02
Negative Logits
Token           Logit
Planta          -0.45
Pierce          -0.44
indl            -0.43
iddhar          -0.43
hapa            -0.42
Pierce          -0.42
wele            -0.41
autorytatywna   -0.40
Volver          -0.40
Jie             -0.39
Positive Logits
Token         Logit
advocate      1.22
advocacy      1.18
advocating    1.13
advocates     1.12
advoc         1.08
Advocacy      1.07
Advocates     1.01
advocated     1.00
Advocate      0.98
Advoc         0.92
Activations Density 0.079%