INDEX
Explanations
interrogative sentences or questions
Negative Logits
aus        -0.71
𝙫          -0.71
böz        -0.70
oire       -0.69
𝙜          -0.69
navbar     -0.68
Bradley    -0.67
ade        -0.65
a          -0.65
Alu        -0.65
Positive Logits
%?         1.88
?          1.71
؟          1.64
?!?        1.62
?}         1.55
’?         1.54
$?         1.53
?"         1.49
!?         1.48
?          1.45
Activation Density: 0.133%
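A minimal sketch of how dashboard numbers like these are commonly derived. It assumes that the positive and negative logits are the feature's decoder direction projected through the model's unembedding matrix (then sorted to take the top and bottom k tokens), and that activation density is the fraction of token positions where the feature's activation is above zero. This page does not confirm either assumption. The function names `logit_effects` and `activation_density` and the toy inputs are hypothetical.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[str, float]


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]],
                  vocab: Sequence[str],
                  k: int) -> Tuple[List[Pair], List[Pair]]:
    """Hypothetical helper: project a feature's decoder direction onto every token.

    Assumes unembed is a d_model x vocab_size matrix, so column j holds the
    unembedding vector for vocab[j]. Returns (bottom_k, top_k) token/score pairs.
    """
    d_model = len(decoder_dir)
    # Dot product of the decoder direction with each token's unembedding column.
    scores = [
        sum(decoder_dir[r] * unembed[r][j] for r in range(d_model))
        for j in range(len(vocab))
    ]
    pairs = list(zip(vocab, scores))
    top = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]  # most promoted tokens
    bottom = sorted(pairs, key=lambda p: p[1])[:k]              # most suppressed tokens
    return bottom, top


def activation_density(activations: Sequence[float]) -> float:
    """Fraction of token positions where the feature fires (activation > 0)."""
    if not activations:
        return 0.0
    return sum(1 for a in activations if a > 0) / len(activations)


if __name__ == "__main__":
    # Toy 2-dimensional model with a 3-token vocabulary (made-up numbers).
    bottom, top = logit_effects([1.0, 0.0],
                                [[2.0, -1.0, 0.5], [0.0, 0.0, 0.0]],
                                ["?", "aus", "a"], k=1)
    print("top:", top, "bottom:", bottom)
    print("density:", activation_density([0.0, 0.0, 0.3, 0.0]))
```

Under these assumptions, a density of 0.133% corresponds to `activation_density` returning 0.00133: the feature is active on roughly 1 in 750 tokens.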