Explanations
words related to acceptance and agreeing to conditions or ideas
Negative Logits
Token           Logit
pper            -0.18
alnız           -0.16
esp             -0.15
exact           -0.15
opathy          -0.14
ults            -0.14
cape            -0.14
CAPE            -0.14
量              -0.14
ak              -0.14

Positive Logits
Token           Logit
ably             0.32
ance             0.30
ances            0.25
ANCE             0.23
ively            0.22
reject           0.19
responsibility   0.18
eer              0.18
ive              0.17
участие          0.17
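Dashboards of this kind commonly derive the two logit lists by projecting the feature's decoder direction through the model's unembedding matrix and keeping the most negative and most positive tokens. The source does not say how these particular values were computed, so the sketch below assumes that method. `feature_logit_weights` and `top_and_bottom` are hypothetical helpers, and the toy matrices are invented for illustration.

```python
from typing import List, Sequence, Tuple


def feature_logit_weights(decoder_vec: Sequence[float],
                          unembed: Sequence[Sequence[float]]) -> List[float]:
    """Dot the feature's decoder direction with every vocabulary column.

    Assumption: unembed is laid out as d_model rows x vocab_size columns.
    """
    d_model = len(decoder_vec)
    vocab_size = len(unembed[0])
    return [
        sum(decoder_vec[i] * unembed[i][t] for i in range(d_model))
        for t in range(vocab_size)
    ]


def top_and_bottom(tokens: Sequence[str], weights: Sequence[float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, weight) pairs."""
    ranked = sorted(zip(tokens, weights), key=lambda pair: pair[1])
    negative = ranked[:k]        # ascending order: most negative first
    positive = ranked[::-1][:k]  # descending order: most positive first
    return negative, positive


# Toy example with invented numbers: d_model = 2, vocabulary of 3 tokens.
tokens = ["ance", "cape", "ably"]
decoder_vec = [1.0, -0.5]
unembed = [[0.2, 0.0, -0.1],
           [0.0, 0.4, 0.0]]
weights = feature_logit_weights(decoder_vec, unembed)
negative, positive = top_and_bottom(tokens, weights, k=1)
print(negative, positive)
```

Sorting the full vocabulary is simple but costs O(V log V). For a real vocabulary, `heapq.nsmallest` and `heapq.nlargest` with the same key would avoid the full sort.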
Activation Density: 0.037%
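A density of 0.037% means the feature fires on roughly 1 in 2,700 tokens (1 / 0.00037 ≈ 2,703). The sketch below assumes density is the percentage of sampled token positions where the feature's activation is strictly positive. The dashboard does not state its exact definition or sampling corpus. `activation_density` is a hypothetical helper, and the sample data is invented.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions where the feature activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Invented sample: the feature fires at 37 of 100,000 token positions.
sample = [0.0] * 100_000
for pos in range(0, 37 * 1000, 1000):
    sample[pos] = 1.5

print(f"{activation_density(sample):.3f}%")  # 0.037%
```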