INDEX
Explanations
phrases related to expressing opinions or personal experiences
phrases that indicate problem-solving or finding solutions
Negative Logits
broadly           -0.75
decidedly         -0.72
umably            -0.71
understandably    -0.67
ostensibly        -0.65
respectively      -0.64
unsurprisingly    -0.64
Notably           -0.64
surprisingly      -0.64
ensibly           -0.64
Positive Logits
lvl               0.75
refund            0.62
****              0.61
downgrade         0.61
ļéĨĴ              0.60
cpu               0.60
WRITE             0.59
correction        0.59
WHY               0.59
cure              0.59
Activations Density: 1.792%