INDEX
Explanations
phrases that involve guessing or questioning
Negative Logits
Token      Logit
outu       -0.17
allas      -0.16
anou       -0.15
emens      -0.15
untu       -0.15
afil       -0.15
alom       -0.14
ide        -0.14
ucz        -0.14
lsi        -0.14
Positive Logits
Token      Logit
Guess       0.21
guesses     0.19
work        0.18
guessing    0.17
guessed     0.17
bones       0.17
sız         0.16
guess       0.16
correctly   0.15
(guess      0.15
Activation Density: 0.021%