INDEX
Explanations
dialogue quotes and contractions
Neuron Alignment
Index   Value   % of L₁
478     +0.21   1.3%
2019    +0.18   1.1%
1741    +0.12   0.8%
Correlated Neurons
Index   P. Corr.   Cos Sim.
478     +0.21      0.17
1224    +0.18      0.08
2019    +0.12      0.13
Negative Logits
<bos>       -3.34
intersper   -1.54
/***        -1.50
hentai      -1.47
embra       -1.41
pessi       -1.35
suspic      -1.31
milf        -1.30
(blank)     -1.29
encre       -1.29
Positive Logits
'           0.81
’           0.80
s           0.63
i           0.60
A           0.60
mathrm      0.59
S           0.59
An          0.58
I           0.58
(blank)     0.57
Activations Density: 1.007%