INDEX
Explanations
instances where the word "instead" is used
Neuron Alignment
Index    Value    % of L₁
1839     +0.12    0.4%
1805     +0.11    0.4%
479      +0.11    0.4%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1805     +0.12       0.04
1218     +0.11       0.04
442      +0.11       0.03
Negative Logits
Ehh       -0.64
Lma       -0.59
Rgds      -0.57
Tbh       -0.56
Sooo      -0.56
Hahah     -0.56
Whoo      -0.56
Noice     -0.55
Ahhhh     -0.55
Perfor    -0.53
Positive Logits
Instead         0.72
Instead         0.71
instead         0.65
instead         0.61
Espèce          0.58
izvē            0.58
<bos>           0.56
actéristique    0.56
spē             0.54
ļ               0.54
Activations Density 0.048%