INDEX
Explanations
expressions of desire or intent, particularly the word "want."
Negative Logits
myſelf      -0.93
itſelf      -0.90
pleaſure    -0.88
Jefus       -0.88
ſhould      -0.84
himſelf     -0.83
ſaid        -0.82
reaſon      -0.81
greateſt    -0.81
ſay         -0.80
Positive Logits
no          0.73
None        0.71
nobody      0.69
geen        0.69
No          0.67
None        0.66
Not         0.66
no          0.66
none        0.66
nenhuma     0.65
Activations Density 0.422%