INDEX
Explanations
phrases emphasizing importance or specificity
descriptive words indicating scarcity or minimalism
Negative Logits
hower      -0.73
atcher     -0.71
pherd      -0.68
acht       -0.68
ATURE      -0.64
nesday     -0.64
SHIP       -0.64
Reviewer   -0.62
merce      -0.61
Cullen     -0.60
Positive Logits
regard        0.88
intention     0.83
impunity      0.81
expectation   0.77
hindsight     0.74
bang          0.71
abandon       0.71
intentions    0.71
standing      0.71
intent        0.71
Activations Density 0.297%