INDEX
Explanations
words related to negativity or negation
terms associated with negativity or neglect
Negative Logits
realDonaldTrump          -0.75
BuyableInstoreAndOnline  -0.67
Falk                     -0.65
WATCH                    -0.65
Mour                     -0.65
VEL                      -0.63
Takeru                   -0.62
Rath                     -0.62
ORGE                     -0.61
Butterfly                -0.61
Positive Logits
otiation  1.71
oti       1.68
atives    1.40
ativity   1.24
rito      1.20
lected    1.20
lig       1.15
atively   1.14
ative     1.11
oci       1.08
Activations Density 0.020%
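
To put the density figure in concrete terms: 0.020% corresponds to about 2 active tokens per 10,000. The sketch below assumes that "activations density" means the share of sampled tokens on which the feature has a nonzero activation; the page does not define it, so treat that reading, the helper name `activation_density`, and the sample data as hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: density counts tokens with a nonzero activation.
    The dashboard's exact definition is not given on the page.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 10,000 tokens, 2 of which activate the feature.
sample = [0.0] * 9998 + [3.1, 0.7]
density = activation_density(sample)
print(f"{density:.3f}%")
```

A density this low suggests a sparse feature that fires on only a small fraction of tokens, which fits the narrow set of promoted suffixes such as "otiation" and "ative".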