INDEX
Explanations
the word "plus" along with a numerical value, potentially indicating a positive association or addition
phrases indicating the addition or accumulation of quantities
Negative Logits
Cry        -0.75
ãĤ¶        -0.65
urg        -0.62
ami        -0.61
hap        -0.61
ammers     -0.60
robe       -0.59
anes       -0.59
terness    -0.58
DEBUG      -0.57
POSITIVE LOGITS
plus       3.76
minus      2.51
plus       2.30
PLUS       2.25
Plus       2.03
minus      1.97
Plus       1.75
+          1.36
+          1.29
combined   1.23
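The two logit lists read like the extremes of a per-token ranking. A minimal sketch of how such lists could be selected, assuming the input is a plain list of (token, value) pairs: the function name `top_logit_effects` and the choice of `k` are hypothetical, and the sample pairs are a subset of the values listed above. Pairs are used instead of a dictionary because the same token string appears more than once in the list.

```python
def top_logit_effects(effects, k=3):
    """Return the k most positive and k most negative (token, value) pairs.

    `effects` is a list of (token, value) pairs; this layout is an
    assumption for illustration, not the dashboard's own data format.
    """
    ranked = sorted(effects, key=lambda pair: pair[1])  # ascending by value
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


# Subset of the values listed on this page.
sample = [
    ("plus", 3.76),
    ("minus", 2.51),
    ("combined", 1.23),
    ("Cry", -0.75),
    ("urg", -0.62),
    ("DEBUG", -0.57),
]

positive, negative = top_logit_effects(sample, k=2)
print(positive)
print(negative)
```

Sorting once and slicing both ends keeps the two lists consistent with each other; for a full vocabulary, a partial selection such as `heapq.nlargest` would avoid sorting every entry.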
Activations Density 0.014%
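A density figure like this is usually read as the share of sampled token positions on which the feature is active. A minimal sketch under that assumption: the function name `activation_density`, the zero threshold, and the synthetic activation list are all hypothetical and chosen only so the result matches the 0.014% shown above.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Treating "active" as "strictly greater than zero" is an assumption.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic data: 100,000 token positions, 14 of them active.
activations = [0.0] * 100_000
for i in range(14):
    activations[i * 7_000] = 1.5  # arbitrary nonzero activation value

density = activation_density(activations)
print(f"{density:.3%}")
```

At this density the feature fires on roughly 14 of every 100,000 tokens, which fits a feature tied to a narrow set of tokens such as "plus" and "minus".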