Explanations
negative statements or self-criticism
Negative Logits
simultane    -0.67
mathemat     -0.65
bicy         -0.62
CY           -0.61
ascending    -0.60
blanket      -0.60
retirees     -0.59
networking   -0.58
Scarlet      -0.58
ANGEL        -0.57
Positive Logits
t            1.53
tion         1.21
tions        1.16
ti           1.12
tis          1.10
tre          1.07
tar          1.06
nt           1.04
td           1.04
tu           1.02
Activations Density: 0.150%