Explanations
quotations and direct speech
New Auto-Interp
Neuron Alignment
Index   Value   % of L₁
2019    +0.18   0.6%
381     +0.17   0.5%
924     +0.10   0.3%
Correlated Neurons
Index   Pearson Corr.   Cos Sim.
924     +0.18           0.07
545     +0.17           0.06
2019    +0.10           0.07
Negative Logits
increa      -2.35
reluct      -2.25
affor       -2.20
impra       -2.18
guarante    -2.16
encomp      -2.16
disagre     -2.11
suscep      -2.11
fuf         -2.07
scrat       -2.05
Positive Logits
I           0.89
If          0.82
“           0.80
My          0.80
You         0.79
There       0.78
…           0.78
Let         0.78
()))        0.77
No          0.76
Activation Density: 0.297%