INDEX
Explanations
words related to directions or movement, specifically words such as "up" and "out."
directional words or phrases indicating movement or positions
Negative Logits
rouse      -0.67
dstg       -0.65
Turing     -0.63
TEXTURE    -0.63
constitu   -0.61
hyde       -0.61
assic      -0.59
EStream    -0.58
tein       -0.58
stim       -0.58
Positive Logits
ward       1.02
stairs     0.99
coming     0.97
neath      0.90
ices       0.87
stream     0.85
numbered   0.85
look       0.84
raged      0.83
come       0.82
Activations Density 0.092%