Explanations
phrases expressing superiority or comparison in a positive light
expressions emphasizing superiority or preference
Negative Logits
ß       -0.70
ô       -0.69
yip     -0.69
lish    -0.68
Þ       -0.66
����    -0.63
colour  -0.63
DP      -0.62
ÃĥÃĤ    -0.62
dh      -0.61
Positive Logits
Trace       0.66
escaping    0.64
anybody     0.63
Way         0.63
temptation  0.62
whatsoever  0.61
THING       0.61
osal        0.60
scrolling   0.60
anyone      0.59
Activations Density: 0.201%