INDEX
Explanations
second person phrases indicating perception or understanding
Negative Logits
irsch       -0.17
ray         -0.16
UTION       -0.15
ason        -0.14
谱          -0.14
ipop        -0.14
ingleton    -0.14
譜          -0.14
езпеч       -0.13
cem         -0.13
Positive Logits
Guess       0.20
guess       0.19
guess       0.19
guessing    0.18
guesses     0.18
probably    0.17
guessed     0.17
visto       0.16
Guess       0.16
hopefully   0.16
Activations Density 0.024%