INDEX
Explanations
negative statements or criticisms
negations and phrases expressing doubt or lack of certainty
Negative Logits
Token      Logit
olon       -0.73
lique      -0.73
cerpt      -0.68
ali        -0.67
ription    -0.66
lesi       -0.65
ourse      -0.65
lav        -0.65
Lux        -0.65
andan      -0.64
Positive Logits
Token      Logit
darn       0.80
iga        0.74
IFT        0.68
Cola       0.66
hin        0.64
CO         0.63
Sad        0.62
divorced   0.62
WHERE      0.62
ifiable    0.62
Activations Density: 0.092%