INDEX
Explanations
superlatives or nouns that indicate a unique or exceptional position
phrases indicating exclusivity or singularity
Negative Logits
des      -0.74
ends     -0.71
rawl     -0.67
md       -0.66
iety     -0.66
ence     -0.65
ruary    -0.64
abuse    -0.63
cart     -0.63
de       -0.63
Positive Logits
thing          0.97
surviving      0.90
remaining      0.89
conceivable    0.88
drawback       0.84
reason         0.82
obstacle       0.81
extant         0.81
beneficiary    0.79
way            0.77
Activations Density: 0.030%