INDEX
Explanations
phrases indicating emphasis or importance
Neuron Alignment
Index   Value   % of L₁
674     +0.15   0.5%
689     +0.12   0.4%
468     +0.10   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
689     +0.15      0.05
468     +0.12      0.04
47      +0.10      0.04
Negative Logits
parch       -1.21
affor       -1.16
maneu       -1.15
milf        -1.15
intermitt   -1.14
inext       -1.14
?...        -1.11
increa      -1.11
excru       -1.10
ardu        -1.09
Positive Logits
Eso         0.61
why         0.61
VIDEOTAPE   0.60
Ngoài       0.55
happening   0.52
Poppins     0.51
Charakter   0.50
첫          0.50
why         0.49
Gambas      0.49
Activations Density: 0.277%