INDEX
Explanations
sequences of underscores or whitespace characters
Negative Logits
Token    Logit
.        -0.61
(blank)  -0.60
vol      -0.59
"        -0.58
vol      -0.58
The      -0.56
mid      -0.56
&        -0.54
'        -0.54
0        -0.54
Positive Logits
Token     Logit
pleaſure  1.17
myſelf    1.08
preſent   0.98
ſtate     0.97
itſelf    0.97
ⓧ         0.93
ſhe       0.92
ſche      0.92
+#+       0.91
greateſt  0.91
Activations Density 0.402%