Explanations
negation expressions or indicators of falsehood in logical statements
negation operator !
Negative Logits
שוליים
-0.68
zwiſchen
-0.68
queſta
-0.67
メンテナ
-0.67
Personendaten
-0.66
<unused28>
-0.65
<unused47>
-0.65
<unused23>
-0.65
[@BOS@]
-0.65
<unused3>
-0.65
Positive Logits
=!
0.80
!
0.77
(!
0.71
(!
0.68
!
0.65
((!
0.60
{!
0.58
!
0.53
{!
0.52
[!
0.51
Activations Density 0.007%