INDEX
Explanations
phrases that express strong emotions or emphasize authenticity
Negative Logits
Token        Logit
rous         -0.16
elight       -0.15
osas         -0.14
орож         -0.14
Absolutely   -0.14
pany         -0.14
mand         -0.14
rav          -0.14
oret         -0.14
ile          -0.14
Positive Logits
Token        Logit
ignment      0.24
yy           0.23
,re          0.23
yyy          0.23
yyyy         0.19
-existing    0.18
quite        0.18
nothing      0.17
estate       0.17
only         0.17
Activations Density: 0.047%