Explanations
important entities or locations
Neuron Alignment
Index   Value   % of L₁
331     +0.07   0.2%
665     +0.07   0.2%
805     +0.06   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
283     +0.07      0.02
1343    +0.07      0.03
762     +0.06      0.02
Negative Logits
Token       Logit
਼           -0.69
września    -0.66
respond     -0.64
springfox   -0.64
enter       -0.64
due         -0.63
consider    -0.62
avoid       -0.62
gain        -0.62
cả          -0.62
Positive Logits
Token       Logit
ico         2.20
squa        2.09
increa      2.06
suscep      2.03
affor       1.99
erad        1.95
inev        1.88
maneu       1.88
volunte     1.87
impra       1.87
Activations Density 0.203%