INDEX
Explanations
phrases indicating range or variation in quantities
Negative Logits
resy       -0.78
mone       -0.73
lees       -0.72
POST       -0.72
orsi       -0.70
icipated   -0.70
rition     -0.69
iquette    -0.67
nor        -0.65
itement    -0.64
Positive Logits
mildly     0.96
mild       0.75
afar       0.73
innocuous  0.73
quirky     0.71
thinly     0.70
humorous   0.70
benign     0.70
mundane    0.69
outright   0.67
Activations Density 0.028%