INDEX
Explanations
instances of question words or phrases, particularly those that convey inquiry or uncertainty
Negative Logits
es        -0.16
alian     -0.15
zl        -0.15
umn       -0.14
atching   -0.14
vik       -0.14
mission   -0.14
v         -0.14
b         -0.14
sind      -0.14
Positive Logits
'il       0.26
'à        0.26
'en       0.21
itter     0.21
'un       0.21
'une      0.21
icon      0.19
’il       0.19
'elle     0.18
'ils      0.18
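The logit lists appear to use the GPT-2 byte-level BPE vocabulary. This is an assumption, because the model is not named here. In that vocabulary every byte is remapped to a printable character, so a token stored as 'Ãł is the raw form of the French elision "'à" (bytes 27 C3 A0). The minimal sketch below rebuilds that byte table, following the `bytes_to_unicode` helper in OpenAI's GPT-2 encoder, and decodes such strings back to text. `decode_token` is a hypothetical helper name.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Rebuild GPT-2's byte -> printable-character table (as in OpenAI's encoder.py)."""
    # Bytes that are already printable map to themselves.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Every other byte (control chars, space, 0x7F-0xA0, 0xAD) is shifted to 256 + n.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


# Inverse table: printable character -> original byte.
BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Hypothetical helper: turn a raw GPT-2 vocabulary string into readable text."""
    raw = bytes(BYTE_DECODER[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["'il", "'Ãł", "âĢĻil", "Ġqu"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Read this way, most of the top positive logits ('il, 'à, 'en, 'un, 'une, ’il, 'elle, 'ils) look like continuations of French elided forms such as "qu'il" or "qu'une".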
Activation Density: 0.006%
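This sketch assumes that activation density means the percentage of sampled tokens on which the feature's activation is above zero. That is a common convention for these dashboards, but the definition is not stated here. Under that reading, 0.006% is about 6 firings per 100,000 tokens, or roughly one in every 16,700. `activation_density` is a hypothetical helper, and the sample data is made up purely to illustrate the arithmetic.

```python
from typing import Sequence


def activation_density(acts: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not acts:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in acts if a > threshold)
    return 100.0 * fired / len(acts)


# Hypothetical sample: 6 non-zero activations among 100,000 tokens.
sample = [0.0] * 99_994 + [1.3] * 6
print(f"{activation_density(sample):.3f}%")  # 0.006%
```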