INDEX
Explanations
directional words and phrases indicating movement or position
Negative Logits
rams        -0.67
ivities     -0.67
ories       -0.63
usc         -0.62
atively     -0.61
xious       -0.60
sels        -0.60
uren        -0.58
iny         -0.58
matically   -0.57
Positive Logits
cliffe      0.70
ruary       0.67
stage       0.62
Vulcan      0.62
ategory     0.60
stairs      0.60
othal       0.60
hovah       0.60
flix        0.59
WARD        0.59
Activations Density 0.011%