Explanations
the word "Croquettes."
Negative Logits
Token         Logit
uate          -0.74
ãĥ¼ãĥĨãĤ£     -0.73
imental       -0.72
ï¸            -0.68
Werewolf      -0.68
LESS          -0.67
uated         -0.65
iating        -0.65
UAL           -0.64
assistants    -0.64
Positive Logits
Token         Logit
oks           1.20
oked          1.16
chet          1.04
ppo           1.03
opa           1.00
pper          0.99
pped          0.97
tch           0.95
ppers         0.94
bones         0.94
Activations Density 0.048%