INDEX
Explanations
instances of the word "play" and its variations
Negative Logits
Theſe       -1.46
Monfieur    -1.36
itſelf      -1.34
―――――       -1.24
myſelf      -1.23
ſeveral     -1.14
Reſ         -1.13
pleaſure    -1.13
Houſe       -1.12
becauſe     -1.12
Positive Logits
(           0.82
,           0.75
in          0.75
?           0.74
the         0.71
(blank)     0.70
.           0.70
so          0.68
I           0.68
come        0.68
Activations Density 0.181%