Explanations
expressions of surprise or disbelief
Negative Logits
íķĺìĦ¸ìļĶ    -0.15
ÑĮÑĤе        -0.15
azzi         -0.15
youre        -0.15
ä½ł          -0.14
anted        -0.14
/***/        -0.14
yny          -0.14
belief       -0.14
claimer      -0.14
Positive Logits
Ah           0.20
FML          0.20
Ah           0.20
oh           0.19
wait         0.18
Hmm          0.17
wait         0.17
ah           0.17
Hmm          0.16
Wait         0.16
Activations Density 0.200%