Explanations
action-related verbs and phrases indicating movement or escape
Negative Logits
byter      -0.66
hig        -0.64
etus       -0.63
amaru      -0.62
ongyang    -0.62
milo       -0.62
oun        -0.60
trump      -0.60
arate      -0.60
untu       -0.59
Positive Logits
leash      0.78
bandwagon  0.76
iveness    0.70
Reloaded   0.69
ousel      0.69
lookout    0.67
charger    0.66
steps      0.66
doorstep   0.66
island     0.66
Activations Density: 0.038%