INDEX
Explanations
the word "whatever" in various contexts
Negative Logits
hiba     -0.15
scape    -0.15
jamin    -0.15
yers     -0.14
dl       -0.14
hor      -0.14
unate    -0.14
ести     -0.13
enis     -0.13
urb      -0.13
Positive Logits
else      0.17
kinds     0.16
kind      0.16
sort      0.15
.truth    0.14
th        0.14
lapping   0.14
season    0.14
elder     0.14
dock      0.13
Activations Density 0.018%