INDEX
Explanations
instances of hedging or tentative language, often using phrases like "of course."
Negative Logits
himſelf       -0.91
Efq           -0.91
myſelf        -0.88
pleaſure      -0.85
ſelf          -0.83
itſelf        -0.81
themſelves    -0.80
Chriftian     -0.77
Majefty       -0.74
ſelves        -0.71
Positive Logits
Somehow       0.84
Luckily       0.82
it            0.79
Maybe         0.78
Somehow       0.76
Surely        0.74
Luckily       0.71
if            0.71
оригіналу     0.71
Maybe         0.70
Activations Density 0.329%