Explanations
phrases indicating delayed realization or action
Neuron Alignment
Index   Value   % of L₁
1013    +0.10   0.3%
972     +0.09   0.3%
554     +0.08   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1395    +0.10      0.03
314     +0.09      0.02
2010    +0.08      0.03
Negative Logits
fuj         -0.72
»>          -0.71
budapest    -0.69
afp         -0.69
vns         -0.68
fta         -0.68
ftu         -0.67
«<          -0.67
venice      -0.66
Thos        -0.65
Positive Logits
AFTER       0.60
after       0.57
ressee      0.51
after       0.50
sizeCache   0.50
AFTER       0.48
signora     0.48
trás        0.48
setelah     0.47
ództ        0.46
Activations Density: 0.279%