INDEX
Explanations
comparative statements indicating superiority
statements asserting superiority or preference
Negative Logits
Py      -0.76
idon    -0.69
mad     -0.67
NH      -0.67
erity   -0.65
TRY     -0.64
FK      -0.63
psy     -0.61
aly     -0.61
Ber     -0.60
Positive Logits
suited      1.02
than        0.95
behaved     0.94
acquainted  0.79
manag       0.74
cannabin    0.71
quality     0.70
lapt        0.68
than        0.68
Than        0.68
Activations Density 0.034%