Explanation: sequences of whitespace characters
Negative logits
  (token not recovered)  -0.71
  ↵                      -0.60
  to                     -0.58
  <eos>                  -0.53
  '                      -0.53
  are                    -0.52
  '                      -0.52
  prom                   -0.52
  h                      -0.51
  lo                     -0.49
Positive logits
  pleaſure   1.28
  purpoſe    1.18
  uſe        1.13
  myſelf     1.12
  ſtate      1.10
  itſelf     1.07
  houſe      1.05
  ſever      1.05
  ſmall      1.02
  greateſt   1.01
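A minimal sketch of how ranked lists like the two above could be produced. It assumes (as is common for sparse-autoencoder feature dashboards, but not stated here) that each value is the dot product of the feature's decoder direction with one token's unembedding vector. The helpers `logit_effects` and `top_logit_tokens` and their plain-list inputs are hypothetical; a real pipeline would use a tensor library and the model's actual vocabulary.

```python
from typing import Sequence


def logit_effects(
    decoder_dir: Sequence[float],          # hypothetical feature decoder direction
    unembed: Sequence[Sequence[float]],    # hypothetical: one unembedding vector per token
) -> list[float]:
    """Dot product of the decoder direction with each token's unembedding vector."""
    if any(len(vec) != len(decoder_dir) for vec in unembed):
        raise ValueError("every unembedding vector must match the decoder direction's length")
    return [sum(d * u for d, u in zip(decoder_dir, vec)) for vec in unembed]


def top_logit_tokens(
    effects: Sequence[float],
    vocab: Sequence[str],
    k: int = 10,
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, value) pairs."""
    if len(effects) != len(vocab):
        raise ValueError("effects and vocab must have the same length")
    # Sort once in ascending order: the head holds the most negative values,
    # the tail holds the most positive ones.
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative
```

For example, `top_logit_tokens([0.5, -0.2, 1.0, -0.7], ["a", "b", "c", "d"], k=2)` returns `([("c", 1.0), ("a", 0.5)], [("d", -0.7), ("b", -0.2)])`.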
Activation density: 0.155%
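A minimal sketch of the density statistic. It assumes density means the fraction of sampled token positions at which the feature's activation is above zero; the corpus, sample size, and threshold behind the 0.155% figure are not given here, and `activation_density` is a hypothetical helper.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of token positions where the activation is strictly above `threshold`."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return active / total
```

Under this reading, a density of 0.155% means the feature fires on roughly 155 of every 100,000 sampled token positions.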