INDEX
Explanations
opinions or personal experiences described with emphasis
Neuron Alignment
Index   Value   % of L₁
1919    +0.09   0.3%
674     +0.09   0.3%
1967    +0.08   0.2%
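Below is a minimal sketch of how a Neuron Alignment table like this could be computed. It assumes, without confirmation from the dashboard, that each row is one component of the feature's decoder direction in the neuron basis, and that "% of L₁" is that component's magnitude divided by the direction's L1 norm. The helper `neuron_alignment` and the toy vector are hypothetical. Under that reading, +0.09 being about 0.3% of L₁ implies an L1 norm of roughly 0.09 / 0.003 ≈ 30, so no single neuron carries much of the feature's weight.

```python
def neuron_alignment(direction, k=3):
    """Hypothetical helper: rank the components of a feature direction.

    Assumes `direction` is the feature's decoder vector expressed in the
    neuron basis. Returns (index, signed value, fraction of L1 norm) for
    the k components with the largest magnitude.
    """
    l1 = sum(abs(v) for v in direction)
    if l1 == 0:
        return []
    # Sort indices by magnitude, largest first; Python's sort is stable,
    # so tied magnitudes keep their original index order.
    order = sorted(range(len(direction)), key=lambda i: -abs(direction[i]))
    return [(i, direction[i], abs(direction[i]) / l1) for i in order[:k]]


# Toy direction with an L1 norm of exactly 1.0, so fractions equal magnitudes.
toy_direction = [0.125, -0.5, 0.25, 0.125]
rows = neuron_alignment(toy_direction)
for idx, value, frac in rows:
    print(f"{idx:>5}  {value:+.3f}  {frac:.1%}")
```

Ranking by magnitude rather than by signed value is also an assumption. All three rows shown above are positive, so either convention would be consistent with this table.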
Correlated Neurons
Index   Pearson Corr.   Cos Sim.
1919    +0.09           0.04
1375    +0.09           0.04
801     +0.08           0.03
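The two columns can be read as follows, under an assumption the dashboard does not spell out: both compare the feature's activation on each token with a neuron's activation on the same tokens. Pearson correlation mean-centers both sequences first. Cosine similarity uses the raw vectors. The dashboard might instead compare weight vectors for "Cos Sim.", so treat this as one plausible reading. The helper names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def cosine(x, y):
    """Cosine similarity of two sequences, without mean-centering."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = sum(a * a for a in x)
    ny = sum(b * b for b in y)
    if nx == 0 or ny == 0:
        return 0.0
    return dot / math.sqrt(nx * ny)


def correlated_neurons(feature_acts, neuron_acts, k=3):
    """Hypothetical helper: rank neurons by Pearson correlation with a feature.

    feature_acts: the feature's activation on each token.
    neuron_acts: one activation sequence per neuron, over the same tokens.
    Returns (neuron index, Pearson correlation, cosine similarity) rows.
    """
    scored = [
        (i, pearson(feature_acts, acts), cosine(feature_acts, acts))
        for i, acts in enumerate(neuron_acts)
    ]
    scored.sort(key=lambda row: row[1], reverse=True)
    return scored[:k]


feature_acts = [0.0, 1.0, 2.0, 3.0]
neuron_acts = [
    [3.0, 2.0, 1.0, 0.0],  # neuron 0: anti-correlated with the feature
    [0.0, 2.0, 4.0, 6.0],  # neuron 1: a scaled copy of the feature
]
top = correlated_neurons(feature_acts, neuron_acts)
for idx, corr, cos in top:
    print(f"{idx:>5}  {corr:+.2f}  {cos:.2f}")
```

Neuron 0 shows why the two measures can disagree. Its Pearson correlation is -1, but its cosine similarity is positive because all the raw activations are non-negative.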
Negative Logits
Token      Value
EEU        -0.81
PLW        -0.81
Punj       -0.69
Gaut       -0.68
unlaw      -0.66
Keny       -0.63
Sted       -0.63
BSEB       -0.62
philanth   -0.61
Intere     -0.61
Positive Logits
Token      Value
ever       0.78
EVER       0.72
EVER       0.61
jemals     0.58
Mère       0.58
perles     0.58
Romains    0.57
Autoritní  0.57
iconque    0.57
gardien    0.57
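The two logit tables can be produced in the way sketched below. This assumes, as is common for such dashboards but not stated here, that each value is the feature's decoder direction dotted with a token's unembedding column: the direct push the feature gives that token's logit. The sketch ignores any final layer norm a real model applies first. The helpers, tokens and vectors are hypothetical toy values, not taken from this feature.

```python
def logit_effects(direction, unembed):
    """Hypothetical helper: direct effect of a feature direction on each token's logit.

    direction: the feature's decoder vector (length d_model).
    unembed: maps each token string to its unembedding column (length d_model).
    Ignores the final layer norm a real model would apply first.
    """
    return {
        tok: sum(d * u for d, u in zip(direction, col))
        for tok, col in unembed.items()
    }


def top_and_bottom(effects, k=2):
    """Return the k most positive and the k most negative (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    return ranked[::-1][:k], ranked[:k]


toy_direction = [1.0, 0.5]
toy_unembed = {
    "tok_a": [0.5, 1.0],   # effect 1.0
    "tok_b": [-1.0, 0.0],  # effect -1.0
    "tok_c": [0.0, -1.0],  # effect -0.5
    "tok_d": [0.25, 0.0],  # effect 0.25
}
positive, negative = top_and_bottom(logit_effects(toy_direction, toy_unembed))
print("positive:", positive)
print("negative:", negative)
```

Under this reading, a feature that fires on emphatic phrasing would directly raise the logits of tokens such as "ever" and "EVER" listed above.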
Activation Density: 0.111%
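Activation density is the share of tokens on which the feature fires. Here, 0.111% works out to roughly one firing token in 900, since 100 / 0.111 ≈ 901. The sketch below assumes "fires" means an activation strictly above zero. The threshold parameter and helper name are hypothetical.

```python
def activation_density(acts, threshold=0.0):
    """Percentage of tokens on which the feature fires (activation > threshold)."""
    if not acts:
        return 0.0
    firing = sum(1 for a in acts if a > threshold)
    return 100.0 * firing / len(acts)


# One firing token out of 1,000 gives a density of 0.1%.
sample = [0.0] * 999 + [2.5]
print(f"{activation_density(sample):.3f}%")
```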