Explanations
positive adjectives
positive descriptors and evaluations of quality
Negative Logits
meric           -0.69
HQ              -0.66
mere            -0.65
pora            -0.65
berto           -0.61
minecraft       -0.61
atars           -0.60
uala            -0.60
Peb             -0.58
ubi             -0.58
Positive Logits
indeed           0.78
nails            0.76
sounding         0.74
nered            0.68
understatement   0.68
wound            0.67
nowadays         0.61
spoiler          0.61
(>               0.61
sailing          0.61
Activation Density: 0.115%