INDEX
Explanations
instances of the word "up."
Negative Logits
t        -0.19
rored    -0.17
unas     -0.15
ro       -0.15
isty     -0.15
ear      -0.14
isis     -0.14
ر        -0.14
ouri     -0.14
place    -0.14
Positive Logits
/down    0.22
datable  0.22
ping     0.19
stairs   0.16
turned   0.16
trecht   0.16
dater    0.16
shot     0.16
speed    0.16
sert     0.15
Activations Density: 0.094%