Explanations
expressions or interjections conveying surprise, realization, or commentary
phrases expressing disbelief or surprise
Negative Logits
BILITIES    -0.88
ãĥł         -0.68
MRI         -0.67
abdom       -0.67
BIL         -0.65
İĭ          -0.65
udo         -0.64
hypothal    -0.63
ially       -0.62
thood       -0.61
Positive Logits
Witnesses   0.71
Pradesh     0.68
va          0.66
schild      0.64
ibaba       0.63
wanna       0.63
Mistress    0.62
Dah         0.61
lda         0.61
Haku        0.59
Activations Density 0.286%