Explanations
comparative adjectives
instances of the word "relatively."
Negative Logits
Token      Logit
inis       -0.77
will       -0.75
ses        -0.72
Landing    -0.71
abad       -0.69
core       -0.69
PT         -0.68
tein       -0.68
Polo       -0.67
arta       -0.67
Positive Logits
Token          Logit
unaffected     0.91
innocuous      0.88
unchanged      0.88
scarce         0.88
insignificant  0.87
insensitive    0.85
inexpensive    0.83
harmless       0.83
unpop          0.81
tame           0.81
Activations Density 0.009%