Explanations
references to physical spaces or locations
Neuron Alignment
Index   Value   % of L₁
50      +0.12   0.5%
325     +0.05   0.2%
2004    +0.05   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1556    +0.12      0.04
1738    +0.05      0.04
325     +0.05      0.04
Negative Logits
Token                Logit
<bos>                -1.71
ⓧ                    -0.74
<?                   -0.74
<?                   -0.72
(no visible token)   -0.72
/*                   -0.70
/**                  -0.70
/***                 -0.69
prepare              -0.68
continue             -0.68
Positive Logits
Token     Logit
space     1.73
affor     1.67
maneu     1.67
wien      1.65
Space     1.64
accla     1.61
fta       1.60
SPACE     1.60
increa    1.59
aen       1.58
Activations Density: 0.125%