Explanations
variations of the word "up."
Negative Logits
Token     Logit
place     -0.18
hood      -0.15
س         -0.15
coni      -0.15
esters    -0.15
berra     -0.15
iced      -0.14
zÅij      -0.14
.uk       -0.14
cury      -0.14
Positive Logits
Token     Logit
/down     0.25
datable   0.21
sk        0.18
ture      0.17
shot      0.16
ãĥĮ       0.15
grub      0.15
turned    0.15
ended     0.15
ren       0.14
Activations Density: 0.061%