Explanations
adjectives with positive or negative connotations, with a preference for negative ones
words and phrases related to truth, reasoning, and contradictions
Negative Logits
Token        Logit
gdala        -0.84
Flavoring    -0.71
throats      -0.71
virginity    -0.63
contacts     -0.60
purse        -0.58
deen         -0.58
packs        -0.58
slot         -0.58
nickname     -0.58
Positive Logits
Token        Logit
actly        1.12
ivable       1.12
urable       1.05
icable       1.03
ceivable     1.02
inently      1.02
vable        1.01
gru          1.01
itatively    1.00
izable       1.00
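The two tables above rank tokens by how strongly the feature pushes their logits down or up. A common way to produce such lists, assumed here and not confirmed by this page, is to take the feature's decoder direction, project it through the model's unembedding matrix, and sort the resulting per-token scores. The sketch below uses a tiny hypothetical vocabulary and made-up weights; `top_logits`, `decoder_dir`, and `unembed` are illustrative names, not part of any specific library.

```python
def top_logits(decoder_dir, unembed, vocab, k=3):
    """Return the k most-promoted and k most-suppressed tokens.

    Assumption: a token's logit effect is the dot product of the feature's
    decoder direction (length d_model) with that token's column of the
    unembedding matrix (shape d_model x vocab_size).
    """
    d_model = len(decoder_dir)
    scores = []
    for j, tok in enumerate(vocab):
        # Dot product of the decoder direction with unembedding column j.
        s = sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        scores.append((tok, s))
    scores.sort(key=lambda pair: pair[1])
    negative = scores[:k]            # most negative first
    positive = scores[::-1][:k]      # most positive first
    return positive, negative


# Hypothetical toy model: d_model = 2, vocabulary of four tokens.
decoder_dir = [1.0, -0.5]
unembed = [
    [1.0, 0.0, -1.0, 0.5],
    [0.0, 1.0, 0.0, -2.0],
]
vocab = ["a", "b", "c", "d"]
positive, negative = top_logits(decoder_dir, unembed, vocab, k=2)
print("positive:", positive)
print("negative:", negative)
```

With these toy weights the scores are a = 1.0, b = -0.5, c = -1.0, d = 1.5, so `d` and `a` head the positive list and `c` and `b` head the negative one.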
Activation Density: 0.240%
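Activation density reports how often the feature fires. The sketch below assumes it is the percentage of tokens whose activation exceeds zero; the threshold and the sample data are assumptions for illustration, and `activation_density` is a hypothetical helper name. For example, a feature that is active on 12 out of 5,000 tokens has a density of 0.240%.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens on which the feature's activation exceeds threshold.

    Assumption: a token counts as "active" when its activation is strictly
    greater than `threshold` (0.0 by default).
    """
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation")
    active = sum(1 for a in acts if a > threshold)
    return 100.0 * active / len(acts)


# Hypothetical data: the feature fires on 12 of 5,000 tokens.
acts = [1.5] * 12 + [0.0] * 4988
density = activation_density(acts)
print(f"Activation Density: {density:.3f}%")
```

A very low density like this indicates a sparse feature that fires on only a small fraction of tokens.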