INDEX
Explanations
the use of openers or introductory phrases in a text
Negative Logits
(      -0.83
y      -0.79
"      -0.76
p      -0.72
*      -0.71
n      -0.67
.      -0.65
b      -0.64
blo    -0.64
tg     -0.63
Positive Logits
myſelf        1.83
itſelf        1.80
themſelves    1.64
pleaſure      1.61
Jefus         1.59
becauſe       1.57
himſelf       1.57
Reſ           1.51
ſelf          1.49
Majefty       1.48
Activations Density 0.017%