Explanations
words related to direction or orientation
phrases indicating direction or movement
Negative Logits
NZ            -0.84
NZ            -0.77
organisation  -0.74
organised     -0.73
UK            -0.72
colour        -0.72
flavour       -0.72
util          -0.72
colour        -0.71
authorised    -0.71
Positive Logits
toward     3.52
towards    2.68
Towards    1.88
Tow        1.45
oward      1.37
upward     1.06
downward   1.02
favoring   0.99
favorably  0.92
farther    0.89
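Dashboards like this one usually rank vocabulary tokens by a feature's direct effect on the output logits. The sketch below assumes that effect is the feature's decoder vector multiplied by the model's unembedding matrix. This is an assumption: the page does not say how its values were computed. The helper `top_logit_tokens` and the names `decoder_vec`, `W_U` and `vocab` are hypothetical, and the weights are small toy placeholders rather than real model parameters.

```python
from typing import List, Tuple

# Hypothetical helper: ranks tokens by a feature's direct effect on the logits.
# ASSUMPTION: effect[t] = sum_i decoder_vec[i] * W_U[i][t]; the dashboard does not
# say how its logit values were computed.
def top_logit_tokens(
    decoder_vec: List[float],
    W_U: List[List[float]],  # d_model rows x vocab_size columns
    vocab: List[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (positive, negative): the k highest and k lowest (token, effect) pairs."""
    effects = [
        sum(decoder_vec[i] * W_U[i][t] for i in range(len(decoder_vec)))
        for t in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


# Toy placeholder weights (NOT the real model's): 2-dimensional residual stream, 5 tokens.
vocab = ["toward", "towards", "colour", "UK", "the"]
decoder_vec = [1.0, -0.5]
W_U = [
    [3.0, 2.0, -0.5, -0.4, 0.1],
    [-1.0, -1.0, 0.5, 0.6, 0.2],
]
positive, negative = top_logit_tokens(decoder_vec, W_U, vocab, k=2)
print("Positive:", positive)
print("Negative:", negative)
```

With a real decoder vector and unembedding, `k=10` would produce lists with the same shape as the two above. Sorting the whole vocabulary costs O(V log V). For large vocabularies, `heapq.nlargest` and `heapq.nsmallest` avoid the full sort.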
Activation Density 0.024%
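Activation density is commonly read as the percentage of sampled tokens on which the feature fires. The sketch below assumes a flat list of per-token activations and a firing threshold of 0. The dashboard states neither the sample nor the threshold, and `activation_density` is a hypothetical helper. At 0.024%, the feature fires on roughly 24 of every 100,000 tokens.

```python
from typing import Sequence

# Hypothetical helper. ASSUMPTION: "fires" means activation > threshold (default 0).
def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Toy example (not real data): the feature fires on 24 of 100,000 tokens.
acts = [0.0] * 100_000
for i in range(24):
    acts[i * 4000] = 1.5  # distinct positions 0, 4000, ..., 92000
density = activation_density(acts)
print(f"Activation density {density:.3f}%")
```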