Explanations
words related to criticism and disapproval
Negative Logits
lihood            -0.91
EMENT             -0.68
ropolitan         -0.67
Polo              -0.66
é¾įå              -0.66
olean             -0.65
DragonMagazine    -0.64
Masters           -0.64
Rath              -0.63
å§«               -0.63
Positive Logits
umbs              1.24
umbled            1.23
umb               1.22
umbles            1.21
anked             1.09
ump               1.08
ickets            1.05
amping            1.03
usted             1.01
umbling           1.01
Activations Density 0.009%