INDEX
Explanations
phrases expressing trust or belief in the speaker's words
Negative Logits
Token       Logit
oce         -0.17
reads       -0.15
£           -0.15
utz         -0.14
kat         -0.14
'Ñı         -0.14
byname      -0.13
Nat         -0.13
subtotal    -0.13
whats       -0.13
Positive Logits
Token       Logit
ONGL        0.16
>NN         0.15
ulet        0.15
enaire      0.15
oose        0.15
lia         0.14
ior         0.14
ergus       0.14
engkap      0.14
Ïĩι         0.14
Activations Density: 0.041%