INDEX
Explanations
expressions of clarity and certainty
NEGATIVE LOGITS
rey      -0.16
OTH      -0.15
ighth    -0.15
áln      -0.15
mpi      -0.15
achts    -0.14
ht       -0.14
tok      -0.14
çŃĭ      -0.14
wort     -0.14
POSITIVE LOGITS
obvious      0.15
Gesture      0.14
aram         0.14
è½           0.13
061          0.13
continent    0.13
estro        0.13
meno         0.13
mean         0.13
urgeon       0.13
Activations Density 0.217%