Explanations
negations and contradictions
negative assertions about various subjects
Negative Logits
pione               -0.80
ãĥİ                 -0.75
uador               -0.69
ù                   -0.68
đ                   -0.68
(no token shown)    -0.68
ā                   -0.68
Ĉ                   -0.68
Ă                   -0.68
ü                   -0.68
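Several negative-logit tokens (ãĥİ, đ, ā, Ĉ, Ă, ù, ü) look like mojibake but are most likely byte-level BPE token strings. Assuming the model uses a GPT-2-style byte-level tokenizer (this page does not say which model the feature belongs to), the sketch below inverts GPT-2's byte-to-character table to recover the raw bytes. Under that assumption, ãĥİ is the UTF-8 encoding of the katakana ノ, đ, ā, Ĉ and Ă are the control bytes 0x11, 0x01, 0x08 and 0x02, and ù and ü are lone bytes that are not valid UTF-8 on their own.

```python
def bytes_to_unicode() -> dict[int, str]:
    """GPT-2-style reversible map from each byte (0-255) to a printable character.

    Printable Latin-1 bytes map to themselves; every other byte is shifted
    to a code point starting at 256, in increasing byte order.
    """
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


# Inverse table: displayed character -> original byte.
UNICODE_TO_BYTE = {c: b for b, c in bytes_to_unicode().items()}


def token_to_bytes(token: str) -> bytes:
    """Recover the raw bytes behind a byte-level BPE token string."""
    return bytes(UNICODE_TO_BYTE[ch] for ch in token)


for tok in ["ãĥİ", "ù", "đ", "ā", "Ĉ", "Ă", "ü"]:
    raw = token_to_bytes(tok)
    # errors="replace" shows U+FFFD for bytes that are not complete UTF-8.
    print(repr(tok), raw, repr(raw.decode("utf-8", errors="replace")))
```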
Positive Logits
.                   1.65
.]                  1.47
!.                  1.46
.</                 1.45
.[                  1.44
.","                1.40
!                   1.39
.(                  1.39
.)                  1.32
.'                  1.31
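Lists like the two above are typically a feature's direct effect on the output logits. As a minimal sketch, assuming the feature is a sparse-autoencoder latent whose decoder row is projected straight through the unembedding matrix (layer norm ignored, a simplification; `top_logit_effects`, `decoder_dir` and `W_U` are hypothetical names, not this dashboard's code), the ranking could be computed as follows:

```python
import numpy as np


def top_logit_effects(decoder_dir: np.ndarray, W_U: np.ndarray,
                      vocab: list[str], k: int = 10):
    """Rank tokens by the direct logit contribution of one feature direction.

    decoder_dir: (d_model,) decoder row of the feature (assumed layout).
    W_U: (d_model, vocab_size) unembedding matrix (assumed layout).
    Returns (most negative, most positive) lists of (token, effect) pairs.
    """
    effects = decoder_dir @ W_U          # (vocab_size,) logit change per token
    order = np.argsort(effects)          # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example with made-up weights: 4-dim residual stream, 5-token vocabulary.
W_U = np.eye(4, 5)
decoder_dir = np.array([2.0, -1.0, 0.5, 0.0])
vocab = [".", "pione", "!", "a", "b"]
neg, pos = top_logit_effects(decoder_dir, W_U, vocab, k=2)
print(neg, pos)
```

A feature whose positive logits are all sentence-ending punctuation, as here, would push the model toward closing the current sentence.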
Activation density: 1.356%
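Activation density is usually reported as the share of token positions on which the feature is active; 1.356% corresponds to roughly 1 in 74 tokens. A minimal sketch, assuming "active" means an activation above a threshold of 0 (the page does not state the threshold, and `activation_density` is a hypothetical helper):

```python
import numpy as np


def activation_density(acts: np.ndarray, threshold: float = 0.0) -> float:
    """Percentage of token positions where the feature fires above `threshold`."""
    return 100.0 * float(np.mean(acts > threshold))


# Toy activations over 8 token positions (made-up values): 2 of 8 are active.
acts = np.array([0.0, 0.0, 0.3, 0.0, 1.2, 0.0, 0.0, 0.0])
print(activation_density(acts))
```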