INDEX
Explanations
words related to negative judgment or criticism
terms related to condescension and admonishment
Negative Logits
士       -0.89
INESS    -0.77
FUL      -0.71
hare     -0.70
edom     -0.67
Grail    -0.66
ال       -0.65
crop     -0.65
Jub      -0.65
Fifth    -0.65
Positive Logits
ension   1.10
uated    1.08
uation   1.07
ple      1.05
uating   0.98
ensions  0.96
uations  0.94
uates    0.91
ulum     0.90
insin    0.90
Activations Density 0.022%