INDEX
Explanations
references to potential threats or dangerous situations
Neuron Alignment
Index   Value   % of L₁
468     +0.11   0.3%
1984    +0.10   0.3%
1967    +0.09   0.2%
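
The "% of L₁" column is not defined on this page. One plausible reading, offered only as an assumption, is that it reports how much of a weight vector's L₁ norm (the sum of absolute values) a single neuron accounts for. A minimal sketch under that assumption, with a hypothetical helper name `l1_share` and toy numbers that are not taken from the table:

```python
def l1_share(weights, index):
    """Return (value, share of the L1 norm) for one entry of a weight vector.

    Assumption: "% of L1" means |w_i| / sum_j |w_j|. The page does not confirm this.
    """
    l1_norm = sum(abs(w) for w in weights)
    value = weights[index]
    return value, abs(value) / l1_norm


# Toy vector (hypothetical values, not from the dashboard).
weights = [0.5, -0.25, 0.25]
value, share = l1_share(weights, 0)
print(f"{value:+.2f} {share:.1%}")  # prints "+0.50 50.0%"
```

Under this reading, a value of +0.11 at 0.3% would imply a total L₁ norm of a few dozen, spread over many neurons.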
Correlated Neurons
Index   P. Corr.   Cos Sim.
468     +0.11      0.05
1948    +0.10      0.06
1984    +0.09      0.06
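
Assuming "P. Corr." abbreviates Pearson correlation and "Cos Sim." cosine similarity (the page does not say which vectors are being compared), the two statistics can be sketched as follows. The function names and the toy inputs are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pearson(a, b):
    """Pearson correlation: cosine similarity of the mean-centered vectors.

    Assumes neither input is constant, otherwise the denominator is zero.
    """
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return cosine_similarity([x - mean_a for x in a], [y - mean_b for y in b])


# Toy activations (hypothetical, not from the dashboard).
print(round(pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]), 6))
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
```

The two measures differ only in the mean-centering step, which is why a pair of vectors can show a small correlation alongside a different cosine similarity.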
Negative Logits
effe      -1.26
increa    -1.25
emphat    -1.24
reluct    -1.23
maneu     -1.21
?...      -1.20
unden     -1.20
snoopy    -1.18
impra     -1.18
strick    -1.17
Positive Logits
enderror   0.68
someday    0.68
anytime    0.64
tomorrow   0.63
κτηρισ     0.62
oward      0.59
either     0.59
ribune     0.58
anywhere   0.58
calipsis   0.57
Activations Density 0.703%
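
Activation density is commonly read as the fraction of sampled tokens on which the feature fires. The sketch below assumes that reading, with "fires" meaning an activation above a threshold of zero. The function name, the threshold, and the sample values are hypothetical and not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumption: density = (# active tokens) / (# tokens sampled).
    """
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy sample (hypothetical): one active token out of four.
sample = [0.0, 0.0, 0.5, 0.0]
print(f"{activation_density(sample):.3%}")  # prints "25.000%"
```

On that reading, a density of 0.703% would correspond to roughly 7 active tokens per 1,000 sampled.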