INDEX
Explanations
adjectives that describe the intensity or magnitude of actions or qualities
words indicating significant assessments or evaluations
Negative Logits
token     logit
lio       -0.81
deen      -0.75
lass      -0.75
raf       -0.75
atech     -0.71
here      -0.70
pherd     -0.69
estate    -0.67
hedon     -0.66
gars      -0.66
Positive Logits
token        logit
exceptions   1.42
caveats      1.34
exception    1.27
impunity     1.24
hindsight    1.22
ease         1.08
caveat       1.07
emphasis     1.06
flair        1.06
vig          1.04
Activations Density: 0.237%
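
An activation density figure like this one is commonly read as the fraction of sampled token positions on which the feature fires. The page itself does not define the metric, so the sketch below rests on that assumption; the function name, the threshold parameter, and the sample values are hypothetical and are not taken from this dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens with a
    nonzero (above-threshold) feature activation.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 3 of 8 token positions are active.
sample = [0.0, 0.0, 1.7, 0.0, 0.4, 0.0, 0.0, 2.1]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.237% would mean the feature is active on roughly 2 to 3 of every 1,000 sampled tokens.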