INDEX
Explanations
phrases indicating special emphasis or importance
emphasis on specific topics or attributes
NEGATIVE LOGITS
Token          Logit
ylon           -0.68
ences          -0.66
essentially    -0.65
CT             -0.64
ensibly        -0.63
ruary          -0.63
substitutes    -0.63
only           -0.62
Anarchy        -0.62
offic          -0.61
POSITIVE LOGITS
Token          Logit
egregious       1.09
noteworthy      1.04
suited          0.93
notable         0.90
troublesome     0.90
noticeable      0.88
susceptible     0.85
poignant        0.84
advantageous    0.82
vulnerable      0.82
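The page does not say how its logit lists were computed. For sparse-autoencoder features, one common approach projects the feature's decoder direction through the model's unembedding matrix and reads off the most positive and most negative tokens. The sketch below shows that approach under this assumption. `top_logit_tokens`, `decoder_dir`, `W_U`, and the toy vocabulary are hypothetical names used only for illustration.

```python
import numpy as np


def top_logit_tokens(decoder_dir, W_U, vocab, k=10):
    """Rank vocabulary tokens by how strongly one feature direction pushes their logits.

    Assumes the dashboard's logits are decoder_dir @ W_U (not confirmed by the page).
    decoder_dir: (d_model,) decoder vector of the feature (hypothetical input)
    W_U:         (d_model, vocab_size) unembedding matrix (hypothetical input)
    vocab:       list of token strings, indexed like the columns of W_U
    """
    # Direct logit effect of the feature direction on every token.
    logits = decoder_dir @ W_U  # shape: (vocab_size,)
    order = np.argsort(logits)  # ascending: most negative first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return positive, negative


# Toy example: 4-dimensional residual stream, 5-token vocabulary.
vocab = ["a", "b", "c", "d", "e"]
W_U = np.zeros((4, 5))
W_U[0] = [0.5, -0.2, 1.0, -0.7, 0.1]
decoder_dir = np.array([1.0, 0.0, 0.0, 0.0])

positive, negative = top_logit_tokens(decoder_dir, W_U, vocab, k=2)
print("positive:", positive)
print("negative:", negative)
```

Under this reading, the positive list above (egregious, noteworthy, notable, ...) would be the tokens whose logits the feature raises most, and the negative list would hold the tokens it suppresses most.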
Activation density: 0.050%
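Assuming activation density means the fraction of tokens in a sample on which the feature's activation is strictly positive, 0.050% corresponds to roughly one token in 2,000, so this is a sparse feature. The minimal sketch below computes density under that assumption. `activation_density` and its `threshold` parameter are hypothetical names, and the toy activations are made up.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Fraction of tokens on which a feature's activation exceeds `threshold`.

    Assumes density = share of tokens with activation > threshold (default 0).
    """
    acts = np.asarray(acts)
    return float(np.mean(acts > threshold))


# Toy example: the feature fires on 5 out of 10,000 tokens.
acts = np.zeros(10_000)
acts[:5] = 1.0
density = activation_density(acts)
print(f"{density:.3%}")  # prints "0.050%"
```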