INDEX
Explanations
interrogative phrases, particularly questions
Negative Logits
long      -0.67
</i>      -0.64
aure      -0.63
navbar    -0.62
oire      -0.62
Alu       -0.62
AL        -0.61
Stoner    -0.60
aus       -0.60
lug       -0.58
Positive Logits
%?        2.03
?!?       1.83
’?        1.73
?"        1.70
?}        1.70
!?        1.68
}?        1.67
?”        1.64
$?        1.62
?         1.59
Activations Density: 0.189%