INDEX
Explanations
phrases related to political resistance and activism
Neuron Alignment
Index   Value   % of L₁
605     +0.09   0.3%
1001    +0.09   0.3%
438     +0.09   0.3%
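The "% of L₁" column is not defined on the page. One plausible reading, sketched below, is that it reports the share of a weight vector's L₁ norm carried by a single neuron index. The function name `l1_share` and the toy vector are hypothetical and chosen only for illustration; they are not taken from the dashboard.

```python
def l1_share(vector, index):
    """Return (value, share), where share = |vector[index]| / sum(|v|).

    Assumption: "% of L1" means one component's share of the L1 norm.
    """
    l1_norm = sum(abs(v) for v in vector)
    if l1_norm == 0:
        raise ValueError("vector has zero L1 norm")
    value = vector[index]
    return value, abs(value) / l1_norm


# Toy vector invented for illustration; not the feature's real weights.
toy = [0.09, -0.5, 0.25, 0.16]
value, share = l1_share(toy, 0)
print(f"{value:+.2f}  {share:.1%}")
```

Under this reading, a value of +0.09 at 0.3% would imply an L₁ norm of roughly 30 for the full vector, meaning the weight is spread thinly across many neurons.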
Correlated Neurons
Index   P. Corr.   Cos Sim.
392     +0.09      0.03
2016    +0.09      0.05
2025    +0.09      0.05
Negative Logits
!...         -1.42
fta          -1.35
desir        -1.31
?...         -1.29
effe         -1.29
impractica   -1.27
purcha       -1.27
guarante     -1.27
fto          -1.26
leaft        -1.26
Positive Logits
whatever        0.87
wherever        0.81
whatever        0.68
whichever       0.65
hichever        0.65
writeFieldEnd   0.64
BoxFit          0.63
whenever        0.62
whoever         0.61
Whatever        0.60
Activations Density 0.448%
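Activation density is usually read as the fraction of sampled tokens on which the feature fires. The sketch below assumes "fires" means an activation strictly above a threshold of zero; the function name `activation_density` and the toy data are hypothetical, not values from this feature.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token activations strictly above `threshold`.

    Assumption: a feature counts as active when its activation > 0.
    """
    if not activations:
        raise ValueError("no activations given")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data invented for illustration: 5 active tokens out of 1000.
toy_activations = [0.0] * 995 + [0.3, 1.2, 0.7, 2.1, 0.05]
density = activation_density(toy_activations)
print(f"{density:.3%}")
```

On that reading, a density of 0.448% would mean the feature is active on roughly 4 to 5 of every 1000 sampled tokens.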