Explanations
architectural or construction-related terms and descriptions
Neuron Alignment
Index   Value   % of L₁
906     +0.13   0.4%
1385    +0.10   0.3%
1539    +0.08   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
724     +0.13      0.05
2044    +0.10      0.06
736     +0.08      0.05
Negative Logits
vogli        -0.87
proprement   -0.86
doman        -0.86
saar         -0.85
allarg       -0.85
parteci      -0.84
succede      -0.84
endom        -0.83
uhr          -0.82
embra        -0.82
Positive Logits
position     0.80
positioned   0.77
position     0.76
near         0.76
closer       0.67
close        0.66
位置         0.65
within       0.64
placed       0.63
위치         0.63
Activations Density: 0.420%