Explanations
phrases related to cosmetic or superficial attributes and their implications
Negative Logits
Token     Logit
Kramer    -0.15
clearing  -0.15
plusplus  -0.15
des       -0.14
ollo      -0.14
marches   -0.14
ulla      -0.14
oldem     -0.14
elves     -0.14
occo      -0.14
Positive Logits
Token     Logit
ylon      0.17
ìĤ¬íķŃ    0.16
ìĤ¬íķŃ    0.15
inkle     0.15
decorate  0.14
oma       0.14
affer     0.14
rud       0.13
krom      0.13
MLE       0.13
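
The `ìĤ¬íķŃ` entries among the positive logits look like byte-level BPE token strings rather than corrupted text. The following is a minimal sketch of how such a string could be mapped back to readable text, assuming the underlying tokenizer uses the GPT-2 style byte-to-unicode table (an assumption; this page does not name the tokenizer). The helper name `decode_token` is hypothetical and chosen only for illustration.

```python
def bytes_to_unicode():
    """Rebuild the GPT-2 style table mapping each byte (0-255) to a printable character."""
    # Bytes that are already printable map to themselves.
    bs = list(range(0x21, 0x7F)) + list(range(0xA1, 0xAD)) + list(range(0xAE, 0x100))
    cs = bs[:]
    n = 0
    # Remaining bytes (control characters, space, etc.) are shifted to code points 256 and up.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


def decode_token(token):
    """Hypothetical helper: turn a byte-level token string back into text.

    Assumes every character of `token` comes from the table above.
    A single token may hold only part of a UTF-8 sequence, so invalid
    bytes are replaced instead of raising an error.
    """
    inverse = {ch: b for b, ch in bytes_to_unicode().items()}
    raw = bytes(inverse[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")
```

Under that assumption, `decode_token("ìĤ¬íķŃ")` yields the Korean string `사항`. The two identical-looking rows carry different logits, so they are presumably distinct tokens whose difference (for example, leading whitespace) was lost when the page was extracted.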
Activations Density: 0.002%