INDEX
Explanations
Instances of the word "don’t" and its variants, indicating a focus on negation or restriction.
Negative Logits
ãĤ®     -0.14
ures    -0.14
resa    -0.13
LES     -0.13
ll      -0.13
ta      -0.13
ieur    -0.13
les     -0.13
="__    -0.13
æŃ²     -0.13
Positive Logits
't      0.24
'T      0.19
`t      0.19
+t      0.17
’t      0.17
ot      0.17
nost    0.17
;t      0.16
et      0.16
´t      0.16
Activations Density 0.091%
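
As a rough illustration of how to read the density figure: the sketch below assumes that "activations density" means the fraction of token positions on which the feature's activation is above zero. That definition, the function name `activation_density`, and the sample data are all assumptions made for illustration; the page itself does not state how the number is computed.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" is (number of active positions) / (total positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 9 of which activate the feature.
acts = [0.0] * 9991 + [1.5] * 9
density = activation_density(acts)
print(f"{density:.3%}")
```

Under this reading, a density of 0.091% would correspond to the feature firing on roughly 9 of every 10,000 tokens, which fits a feature tied to one specific contraction.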