INDEX
Explanations
mentions of specific physical objects or locations within a narrative
Neuron Alignment
Index   Value   % of L₁
1222    +0.15   0.6%
1926    +0.12   0.4%
892     +0.12   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1222    +0.15      0.02
1926    +0.12      0.02
892     +0.12      0.02
Negative Logits
impra     -0.81
accla     -0.80
strick    -0.70
myn       -0.70
unspeak   -0.66
gild      -0.66
ugg       -0.65
encomp    -0.63
vagu      -0.62
affor     -0.62
Positive Logits
window    1.54
window    1.49
Window    1.43
windows   1.41
Window    1.33
windows   1.29
WINDOW    1.20
WINDOW    1.20
Windows   1.15
Windows   1.10
Activations Density 0.060%