INDEX
Explanations
words related to following instructions or actions
instances of the word "follow"
Negative Logits
pite          -0.73
orc           -0.69
ukemia        -0.67
wounding      -0.67
aucas         -0.67
inese         -0.66
ldom          -0.65
intendent     -0.64
rimination    -0.64
Extreme       -0.63
Positive Logits
follow        0.91
follow        0.85
Follow        0.83
follows       0.82
LLOW          0.78
ĸļ            0.76
suit          0.72
faithfully    0.71
SHIP          0.70
ansen         0.68
Activations Density 0.025%