INDEX
Explanations
No Explanations Found
Neuron Alignment
Index   Value   % of L₁
50      +0.12   0.5%
1145    +0.07   0.3%
814     +0.07   0.3%
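The "% of L₁" column appears to give each neuron's value as a share of the L₁ norm (the sum of absolute values) of the whole vector; this reading is an assumption, since the page does not define the column. A minimal sketch under that assumption, using a hypothetical helper `l1_share` and toy numbers that are not taken from the table:

```python
def l1_share(weights):
    """Return each weight's absolute value as a percentage of the L1 norm."""
    l1 = sum(abs(w) for w in weights)
    if l1 == 0:
        # An all-zero vector has no meaningful shares; report zeros.
        return [0.0 for _ in weights]
    return [100.0 * abs(w) / l1 for w in weights]


# Hypothetical toy vector (assumption, not dashboard data).
toy = [0.5, -0.25, 0.25]
shares = l1_share(toy)
print(shares)
```

Under this reading, small percentages such as those in the table would mean the vector's weight is spread over many neurons rather than concentrated in a few.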
Correlated Neurons
Index   P. Corr.   Cos Sim.
990     +0.12      0.05
1343    +0.07      0.04
1145    +0.07      0.03
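The two columns above are different measures: "P. Corr." is presumably a Pearson correlation and "Cos Sim." a cosine similarity. Which vectors the page compares (activations, weights, or something else) is not stated, so the sketch below only illustrates the two formulas on hypothetical inputs. The function names are illustrative, not part of any dashboard API.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pearson(x, y):
    """Pearson correlation: cosine similarity after mean-centering.

    Assumes both inputs are non-constant, so the centered vectors are non-zero.
    """
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine_similarity([a - mean_x for a in x], [b - mean_y for b in y])


# Hypothetical toy inputs (assumption, not dashboard data).
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
print(pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

Because Pearson correlation subtracts the mean first, the two measures can disagree when the vectors have non-zero means.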
Negative Logits
<bos>       -1.63
springfox   -0.84
ⓧ           -0.73
expand      -0.70
establish   -0.70
<?          -0.69
colspan     -0.68
(blank)     -0.68
engage      -0.67
/**         -0.66
Positive Logits
accla       1.68
véhic       1.64
affor       1.63
délib       1.60
effe        1.60
de          1.59
mef         1.56
wien        1.56
maneu       1.56
stockholm   1.55
Activations Density 0.124%