Explanations
tokens or elements that signal significant distinctions or categorizations in various contexts
Negative Logits
eton      -0.17
antee     -0.17
reeze     -0.16
Skill     -0.15
.mixin    -0.15
pras      -0.15
ifter     -0.15
fea       -0.15
skill     -0.15
Hobby     -0.14
Positive Logits
CTOR      0.17
undry     0.16
zano      0.15
ONENT     0.15
má        0.15
arbeit    0.15
Laurel    0.15
Oliveira  0.14
chooser   0.14
sdk       0.14
Activations Density: 0.005%