INDEX
Explanations
expressions related to assertion and belief
Negative Logits
بÙĪÙĦ         -0.19
ossip        -0.17
utow         -0.16
ostel        -0.15
kvin         -0.15
VISIBLE      -0.15
tryside      -0.15
Erotische    -0.15
ayıp         -0.15
AGMA         -0.14
Positive Logits
told         0.48
given        0.42
given        0.35
Given        0.32
asked        0.32
Given        0.31
offered      0.29
informed     0.28
Asked        0.27
presented    0.27
Activations Density 0.192%