Explanations
phrases indicating fairness or honesty
phrases emphasizing fairness, honesty, and clarity in discussions
Negative Logits
surf      -0.73
med       -0.64
edu       -0.62
Build     -0.61
ãĥĺ       -0.60
seams     -0.59
ãĥIJ       -0.58
Written   -0.58
satur     -0.57
bern      -0.57
Positive Logits
ensional   0.75
Opinion    0.75
idge       0.72
ohn        0.72
oops       0.70
Ans        0.69
ayson      0.69
ESCO       0.69
Obj        0.68
Philippe   0.67
Activations Density 0.080%
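
One plausible reading of the density figure is the fraction of sampled token positions on which the feature has a nonzero activation. The sketch below illustrates that reading only; the function name, the threshold parameter, and the sample data are hypothetical and do not come from the dashboard, which does not state how the value was computed.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions where the
    feature fires at all; the dashboard itself does not define the term.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 token positions, 8 of which activate the feature.
sample = [0.0] * 9992 + [1.3] * 8
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.080%
```

Under this reading, a density of 0.080% would mean the feature fires on roughly 8 of every 10,000 tokens, which is consistent with a sparse, narrowly scoped feature.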