INDEX
Explanations
expressions of confusion or questioning regarding understanding
Expressing confusion or lack of understanding
understanding why
Negative Logits
ja           -0.45
DockStyle    -0.44
provisoire   -0.42
setopt       -0.42
aka          -0.41
εμπ          -0.41
võ           -0.40
terce        -0.40
bäst         -0.39
masas        -0.39
Positive Logits
inexplicable   1.14
illogical      1.09
puzzling       1.07
why            1.06
baffling       1.05
puzzled        1.02
perplexing     1.01
WTF            0.98
wtf            0.95
unexplained    0.95
Activations Density 0.373%
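
A density figure like the one above can be read as the share of sampled token positions at which the feature fires. The sketch below shows one way such a percentage might be computed. It assumes that "fires" means an activation strictly above a threshold of zero; the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of positions whose activation exceeds `threshold`.

    Assumption: a position counts as active when its value is strictly
    greater than `threshold` (zero by default).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 4 active positions out of 1000.
sample = [0.0] * 996 + [1.0, 2.0, 0.5, 3.0]
print(f"{activation_density(sample):.3f}%")  # prints "0.400%"
```

Under this reading, a low density means the feature is sparse: it is silent on most tokens and fires only in a narrow set of contexts.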