Explanations
expressions of gratitude or appreciation
Neuron Alignment
Index   Value   % of L₁
76      +0.16   0.9%
156     +0.11   0.6%
188     +0.11   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
76      +0.16      0.04
188     +0.11      0.02
2       +0.11      0.03
Negative Logits
Ļ                 -2.08
¶                 -1.96
↵                 -1.92
↵                 -1.92
↵↵                -1.92
[not rendered]    -1.92
č↵                -1.92
↵                 -1.92
[not rendered]    -1.92
<|outofrange|>    -1.92
Positive Logits
chitz       1.65
bourg       1.64
ipation     1.61
uls         1.60
orbit       1.53
orage       1.51
hips        1.51
denly       1.50
etically    1.49
rically     1.48
Activations Density 0.596%