INDEX
Explanations
phrases related to negations and prohibitions
negative contractions related to denial or refusal
Negative Logits
Token      Logit
ersed      -0.67
accompan   -0.66
Darling    -0.63
Older      -0.61
lined      -0.60
Xuan       -0.59
higher     -0.59
anni       -0.58
HAHA       -0.58
ranged     -0.57
Positive Logits
Token      Logit
condone    1.25
tolerate   1.07
ourselves  0.99
know       0.91
yet        0.90
want       0.90
expect     0.89
need       0.88
intend     0.87
ird        0.87
Activations Density: 0.100%