Explanations
mentions of things that are slightly different or slightly more/less than something else
the word "slightly" and its variations
Negative Logits
elsen       -0.75
STATS       -0.74
ravings     -0.69
Aviv        -0.68
vre         -0.66
itivity     -0.65
rights      -0.65
ilee        -0.64
inarily     -0.64
ulkan       -0.64
Positive Logits
altered      0.93
misleading   0.86
eccentric    0.85
tilted       0.83
differently  0.83
tweaked      0.82
modified     0.81
inconven     0.81
inflated     0.80
biased       0.80
Activations Density 0.035%
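
As a rough illustration of how a density figure like the one above could be read, the sketch below treats it as the fraction of sampled tokens on which the feature has a nonzero activation. That reading is an assumption, not something this page states. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above `threshold`.

    Assumption: "density" means the share of sampled tokens on which
    the feature fires at all; the page itself does not define it.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 7 of 20,000 tokens.
sample = [0.0] * 19_993 + [1.2, 0.4, 3.1, 0.9, 2.2, 0.1, 5.0]
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.035%"
```

Under this reading, a density of 0.035% means the feature is active on roughly 3.5 of every 10,000 tokens sampled.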