INDEX
Explanations
phrases related to conflict resolution and power dynamics
Neuron Alignment
Index   Value   % of L₁
2019    +0.36   1.3%
1535    +0.26   0.9%
381     +0.20   0.7%
Correlated Neurons
Index   P. Corr.   Cos Sim.
2019    +0.36      0.08
1535    +0.26      0.08
1445    +0.20      0.07
Negative Logits
hentai        -1.10
coö           -1.04
impra         -1.00
tupperware    -0.99
FTFY          -0.97
casio         -0.95
Yess          -0.94
cytoplas      -0.94
emphat        -0.93
waifu         -0.91
Positive Logits
But           0.86
Finally       0.80
Then          0.80
So            0.80
Therefore     0.79
This          0.78
I             0.77
And           0.77
However       0.75
Maybe         0.75
Activations Density 0.255%