INDEX
Explanations
instances where comparisons are made between different situations or entities
Neuron Alignment
Index    Value    % of L₁
604      +0.09    0.2%
1833     +0.07    0.2%
651      +0.07    0.2%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1833     +0.09       0.04
1794     +0.07       0.03
848      +0.07       0.03
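
The column headings P. Corr. and Cos Sim. presumably abbreviate Pearson correlation and cosine similarity. As a minimal sketch of those two standard measures, assuming plain Python lists as inputs (the helper names `pearson_corr` and `cosine_sim` and the sample vectors are hypothetical; this page does not state which vectors the dashboard actually compares):

```python
import math


def pearson_corr(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance and variances are computed on mean-centered values.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def cosine_sim(x, y):
    """Cosine similarity of two equal-length, non-zero sequences."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


# Hypothetical vectors, not data from this page.
vec_a = [0.0, 1.0, 2.0, 3.0]
vec_b = [1.0, 3.0, 5.0, 7.0]

print(pearson_corr(vec_a, vec_b))
print(cosine_sim(vec_a, vec_b))
```

Pearson correlation subtracts each vector's mean before comparing, while cosine similarity does not, so the two measures can give different values for the same pair of vectors.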
Negative Logits
ftu      -1.00
igno     -0.94
secon    -0.94
fta      -0.93
inder    -0.92
fto      -0.91
seiz     -0.90
uniqu    -0.88
fup      -0.88
unil     -0.88
Positive Logits
still           1.39
still           1.31
Still           1.23
Still           1.16
STILL           1.03
vẫn             0.95
nevertheless    0.88
nonetheless     0.85
remain          0.83
nadal           0.82
Activations Density 0.330%
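
One common reading of an activation density figure is the percentage of sampled tokens on which the feature fires at all. A minimal sketch under that assumption (the function name `activation_density`, the zero threshold, and the sample data are hypothetical and are not taken from this page):

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of entries strictly above `threshold`."""
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 33 active tokens out of 10,000.
sample = [1.5] * 33 + [0.0] * 9967
print(f"{activation_density(sample):.3f}%")  # prints 0.330%
```

A density this low would mean the feature is sparse: it is silent on the vast majority of tokens and fires only on a small, specific subset.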