INDEX
Explanations
phrases indicating movement or direction
Negative Logits
quate       -0.15
anko        -0.15
downward    -0.14
irma        -0.14
ltk         -0.14
eniable     -0.14
/problem    -0.14
bs          -0.14
zbek        -0.14
Forces      -0.14
Positive Logits
towards     0.28
toward      0.28
river       0.25
wind        0.24
through     0.23
into        0.23
past        0.21
onto        0.20
range       0.19
state       0.18
Activations Density 0.055%
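
As a rough illustration of what a density figure like this can mean, the sketch below assumes that activation density is the fraction of token positions on which the feature's activation is above zero. That definition, the function name `activation_density`, and the sample values are all assumptions for illustration; none of them come from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions where the feature fires.

    Assumption: a position counts as active when its activation is
    strictly greater than `threshold` (0.0 by default).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 20,000 token positions, 11 of them with a non-zero activation.
sample = [0.0] * 19989 + [0.7, 1.2, 0.3, 2.1, 0.9, 0.4, 1.5, 0.8, 0.6, 1.1, 0.2]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under that assumption, a density of 0.055% would correspond to the feature firing on roughly 11 of every 20,000 tokens, which is the ratio used in the made-up sample.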