INDEX
Explanations
instances of uncertainty or ambiguity related to belief statements or identities
NEGATIVE LOGITS
erdem        -0.16
quip         -0.14
าะ           -0.14
aeper        -0.14
yorum        -0.13
preocup      -0.13
поба         -0.13
ona          -0.13
อย           -0.12
[section     -0.12
POSITIVE LOGITS
statement     0.38
declaration   0.32
statements    0.30
statement     0.29
declare       0.29
declar        0.28
Statement     0.28
声明          0.27
Declaration   0.27
pron          0.27
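The negative and positive logit lists rank vocabulary tokens by how strongly this feature pushes their output logits down or up. A common way feature dashboards compute such lists, assumed here rather than confirmed by this page, is to project the feature's decoder direction through the model's unembedding matrix and keep the k lowest and k highest scores. The pure-Python sketch below illustrates that idea; `w_dec`, `w_u`, `logit_effects`, and `top_and_bottom` are hypothetical names, and the toy numbers are invented, not this feature's weights.

```python
from typing import List, Sequence, Tuple

# Hypothetical sketch (assumption): a feature's effect on each output token is
# approximated as its decoder direction `w_dec` (length d_model) multiplied by
# the unembedding matrix `w_u` (d_model rows, one column per vocabulary token).


def logit_effects(w_dec: Sequence[float], w_u: Sequence[Sequence[float]]) -> List[float]:
    """Return w_dec @ w_u: one logit-effect score per vocabulary token."""
    if len(w_dec) != len(w_u):
        raise ValueError("w_dec and w_u must share the d_model dimension")
    vocab_size = len(w_u[0])
    return [
        sum(w_dec[i] * w_u[i][t] for i in range(len(w_dec)))
        for t in range(vocab_size)
    ]


def top_and_bottom(
    vocab: Sequence[str], scores: Sequence[float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (positive, negative): the k highest and k lowest (token, score) pairs."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative score first
    positive = ranked[::-1][:k]    # most positive score first
    return positive, negative


if __name__ == "__main__":
    # Toy values invented for illustration only.
    vocab = ["statement", "declaration", "erdem", "quip"]
    w_dec = [1.0, 0.5]
    w_u = [
        [0.3, 0.2, -0.1, 0.0],
        [0.1, 0.2, -0.1, -0.2],
    ]
    positive, negative = top_and_bottom(vocab, logit_effects(w_dec, w_u), k=2)
    print("POSITIVE", [(tok, round(s, 2)) for tok, s in positive])
    print("NEGATIVE", [(tok, round(s, 2)) for tok, s in negative])
```

Because only the decoder direction enters this score, it describes what the feature promotes or suppresses in the output, not which inputs make it activate.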
Activations Density 0.101%
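The activation density is most plausibly the share of tokens in the sampled dataset on which the feature's activation is above zero, so 0.101% would mean the feature fires on roughly one token in a thousand. That reading is an assumption, as is the strict `> 0` threshold in the hypothetical `activation_density` helper sketched below.

```python
from typing import Iterable

# Hypothetical helper (assumption): density = percentage of tokens whose
# feature activation exceeds `threshold` (0.0 by default, i.e. "the feature fired").


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the percentage of activations strictly greater than `threshold`."""
    total = 0
    active = 0
    for value in activations:
        total += 1
        if value > threshold:
            active += 1
    if total == 0:
        raise ValueError("activation_density needs at least one activation")
    return 100.0 * active / total


if __name__ == "__main__":
    # Toy data: one firing token among 1,000 gives a density of 0.1%.
    toy = [0.0] * 999 + [2.5]
    print(f"Activations Density {activation_density(toy):.3f}%")
```

This page does not say whether activations are counted strictly above zero or above a small cutoff; `threshold` is exposed so either convention can be tried.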