Explanations
personal interactions involving giving or receiving advice or directives
pronouns used in directives or communication
Negative Logits
arious    -0.61
aired     -0.61
arij      -0.60
hement    -0.60
arette    -0.59
hift      -0.59
ibal      -0.59
pei       -0.59
atl       -0.59
xtap      -0.58
Positive Logits
beforehand    0.89
goodbye       0.83
how           0.82
bluntly       0.80
orally        0.79
plainly       0.77
about         0.77
why           0.77
otherwise     0.73
ledge         0.70
Activations Density 0.079%