Explanations
adjectives describing differences and contrasts between things or situations
Neuron Alignment
Index   Value   % of L₁
394     +0.13   0.4%
856     +0.13   0.4%
453     +0.11   0.3%
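The "% of L₁" column reads as each entry's value divided by the L₁ norm of the feature's weight vector over neurons. Below is a minimal sketch of that calculation. It assumes the dashboard takes the feature's decoder vector in neuron space and ranks its entries by signed value. The function name `neuron_alignment` is hypothetical and does not come from any dashboard library.

```python
import numpy as np

def neuron_alignment(decoder_vec: np.ndarray, k: int = 3):
    """Return (neuron index, value, fraction of L1 norm) for the k largest entries.

    Assumption: entries are ranked by signed value (largest first), and each
    value is divided by the L1 norm of the whole vector.
    """
    l1 = np.abs(decoder_vec).sum()
    top = np.argsort(decoder_vec)[::-1][:k]  # indices of the k largest values
    return [(int(i), float(decoder_vec[i]), float(decoder_vec[i] / l1)) for i in top]
```

A fraction of 0.004 (0.4%) means that one neuron carries a small share of the vector's total mass. In other words, the feature is spread across many neurons rather than aligned with any single one.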
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
605     +0.13           0.00
630     +0.13           0.02
2044    +0.11           0.05
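The page does not say which vectors the two columns compare. One plausible reading is that both compare the feature's per-token activations with each neuron's per-token activations over the same tokens. The sketch below follows that reading: Pearson correlation is the cosine similarity of the mean-centred vectors, and the cosine column is left uncentred. The function names are hypothetical.

```python
import numpy as np

def pearson_corr(x: np.ndarray, y: np.ndarray) -> float:
    """Pearson correlation = cosine similarity of mean-centred vectors.

    Assumes neither vector is constant (otherwise the result is nan).
    """
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / (np.linalg.norm(xc) * np.linalg.norm(yc)))

def cosine_sim(x: np.ndarray, y: np.ndarray) -> float:
    """Uncentred cosine similarity."""
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

def correlated_neurons(feature_acts: np.ndarray, neuron_acts: np.ndarray, k: int = 3):
    """feature_acts: (n_tokens,); neuron_acts: (n_tokens, n_neurons).

    Returns (neuron index, Pearson corr., cosine sim.) for the k neurons
    with the highest Pearson correlation (assumed ranking key).
    """
    n_neurons = neuron_acts.shape[1]
    corrs = np.array([pearson_corr(feature_acts, neuron_acts[:, j]) for j in range(n_neurons)])
    top = np.argsort(corrs)[::-1][:k]
    return [(int(j), float(corrs[j]), cosine_sim(feature_acts, neuron_acts[:, j])) for j in top]
```

Centring explains why the two columns can disagree: a neuron can track the feature's fluctuations (non-zero Pearson) while its raw activation vector points in a nearly orthogonal direction.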
Negative Logits
Token      Value
snoopy     -0.99
reluct     -0.98
gaily      -0.96
inev       -0.94
depic      -0.94
strick     -0.94
vainly     -0.93
apprehen   -0.92
impra      -0.92
disagre    -0.92
Positive Logits
Token             Value
.                 0.81
<bos>             0.68
stdarg            0.55
。                0.53
。「              0.51
—                 0.49
GOTREF            0.49
AssemblyCulture   0.48
gyhoeddwyd        0.47
:                 0.47
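The two logit tables list the vocabulary tokens whose logits the feature pushes down most and up most. Below is a minimal sketch of how such lists can be produced. It assumes the feature's output direction has already been expressed in the residual stream and is projected through an unembedding matrix `W_U` of shape (d_model, d_vocab). For a feature defined over MLP neurons, the decoder vector would first pass through the MLP output weights; that step is omitted here. Any normalisation the dashboard applies to the displayed values is not reproduced. The function name is hypothetical.

```python
import numpy as np

def logit_effects(direction: np.ndarray, W_U: np.ndarray, vocab: list, k: int = 10):
    """Return the k most negative and k most positive (token, logit) pairs.

    direction: (d_model,) residual-stream direction (assumption)
    W_U:       (d_model, d_vocab) unembedding matrix
    """
    logits = direction @ W_U              # (d_vocab,) direct effect on each token
    order = np.argsort(logits)            # ascending
    bottom = [(vocab[i], float(logits[i])) for i in order[:k]]
    top = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return bottom, top
```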
Activation Density: 0.416%
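Activation density is the fraction of tokens on which the feature fires. A density of 0.416% means the feature is active on roughly 4 of every 1,000 tokens. Below is a minimal sketch, assuming "fires" means an activation strictly above a threshold of zero. The function name and the `threshold` parameter are hypothetical.

```python
import numpy as np

def activation_density(feature_acts: np.ndarray, threshold: float = 0.0) -> float:
    """Fraction of tokens whose feature activation exceeds `threshold`."""
    return float((feature_acts > threshold).mean())
```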