INDEX
Explanations
words related to comparisons around the concept of "relative" or "relatively"
instances of the word "relatively" to indicate comparisons
NEGATIVE LOGITS
tein             -0.81
inis             -0.80
Polo             -0.78
arta             -0.74
ieu              -0.73
will             -0.72
rings            -0.71
PT               -0.71
Landing          -0.71
iens             -0.70
POSITIVE LOGITS
unaffected       1.01
unchanged        0.94
innocuous        0.91
insignificant    0.91
insensitive      0.90
tame             0.89
scarce           0.87
benign           0.86
harmless         0.85
unpop            0.84
Activations Density: 0.011%