Explanations
touching or sensitive interactions between characters
Neuron Alignment
Index   Value   % of L₁
2019    +0.33   1.2%
1535    +0.19   0.7%
381     +0.17   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
2019    +0.33      0.06
946     +0.19      0.05
927     +0.17      0.05
Negative Logits
embodi     -0.99
scrat      -0.95
JOSÉ       -0.94
maneu      -0.93
impra      -0.93
Cringe     -0.91
downvote   -0.90
pooh       -0.90
guarante   -0.90
Lmfao      -0.89
Positive Logits
Then       0.83
<bos>      0.73
After      0.70
The        0.70
But        0.70
And        0.69
This       0.68
Everyone   0.67
Finally    0.66
When       0.65
Activations Density: 0.174%