Explanations
strong positive adjectives indicative of high quality or performance
Negative Logits
nette    -0.16
oko      -0.16
duk      -0.16
ĥĿ       -0.15
ÏĢει     -0.15
oleon    -0.15
iaux     -0.15
oom      -0.15
akin     -0.14
apper    -0.14
Positive Logits
-quality        0.25
owl             0.19
ly              0.17
avery           0.17
erate           0.16
ordinary        0.15
itude           0.15
ior             0.15
craftsmanship   0.15
-value          0.15
Activations Density: 0.026%