Explanations
phrases related to direction or orientation
phrases related to direction or guidance
Negative Logits
Token      Logit
tein       -0.69
oak        -0.66
zb         -0.61
Priv       -0.60
aca        -0.60
Victims    -0.59
bley       -0.59
LU         -0.59
McKay      -0.59
aqu        -0.59
Positive Logits
Token       Logit
direction   1.26
directions  1.04
ality       0.96
direction   0.84
finding     0.83
arity       0.83
ward        0.81
naire       0.80
forth       0.77
Direction   0.77
Activations Density: 0.017%
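
To put the density figure in concrete terms, here is a minimal sketch. It assumes that "activations density" means the fraction of sampled tokens on which the feature's activation is non-zero; the page itself does not define the term. The function name `activation_density`, the threshold parameter, and the sample data are all hypothetical, chosen only so that the arithmetic reproduces 0.017%.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: density is the share of tokens on which the feature fires.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 tokens, of which 17 activate the feature.
acts = [0.0] * 100_000
for i in range(17):
    acts[i * 5_000] = 1.5  # arbitrary non-zero activation value

density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.017%
```

Under this reading, a density of 0.017% corresponds to roughly 17 active tokens per 100,000, which would make this a sparse feature.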