INDEX
Explanations
words indicating a high degree of significance or importance
the word "particularly" and related expressions emphasizing significance or importance
Negative Logits
ylon    -0.71
ences   -0.68
CT      -0.65
docs    -0.65
cli     -0.63
ORTS    -0.63
glers   -0.63
only    -0.61
cius    -0.61
TRY     -0.61
Positive Logits
egregious      1.02
suited         1.01
noteworthy     0.95
susceptible    0.93
noticeable     0.88
vulnerable     0.86
acute          0.86
advantageous   0.85
pronounced     0.82
fond           0.81
Activations Density 0.039%
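
The dashboard does not say how the density figure is computed. A minimal sketch, assuming it is the fraction of sampled token positions on which the feature's activation is above zero; the function name, the threshold parameter, and the sample data are all hypothetical and are not taken from the source:

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means (number of active positions) / (total positions).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, of which 4 are active.
acts = [0.0] * 9996 + [1.3, 0.2, 4.1, 0.7]
density = activation_density(acts)
print(f"{density:.3%}")
```

Under this reading, a density of 0.039% would mean the feature is active on roughly 4 of every 10,000 sampled tokens, which fits a feature tied to a narrow class of emphasis words.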