INDEX
Explanations
descriptive phrases with negative connotations
Neuron Alignment
Index   Value   % of L₁
137     +0.09   0.3%
776     +0.08   0.2%
1906    +0.07   0.2%
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
652     +0.09           0.04
137     +0.08           0.04
1088    +0.07           0.04
Negative Logits
michelin    -1.03
ivi         -0.92
glan        -0.92
utop        -0.92
casio       -0.92
alkoh       -0.88
hek         -0.88
vell        -0.88
ohr         -0.88
dci         -0.87
Positive Logits
twist       0.52
faptul      0.52
üsü         0.50
OSError     0.49
aspect      0.49
IOError     0.48
fact        0.47
Betracht    0.45
twist       0.45
vorticity   0.44
Activations Density 0.192%