INDEX
Explanations
expressions indicating frustration and dissatisfaction
Neuron Alignment
Index   Value   % of L₁
1235    +0.08   0.2%
18      +0.08   0.2%
1852    +0.08   0.2%
Correlated Neurons
Index   Pearson Corr.   Cos. Sim.
18      +0.08           0.03
426     +0.08           0.03
361     +0.08           0.03
Negative Logits
makro     -0.86
kask      -0.85
karton    -0.81
kade      -0.77
silikon   -0.77
kön       -0.74
etui      -0.71
elek      -0.71
moza      -0.71
alkoh     -0.71
Positive Logits
being        0.47
ABUL         0.45
hearing      0.43
dealing      0.43
pretending   0.42
blowing      0.42
playing      0.42
padx         0.42
antimony     0.41
chasing      0.41
Activations Density 0.157%