Explanations
words related to negativity or criticism
the presence of the substring "pl" in various contexts
Negative Logits
Token                             Logit
▬                                 -0.74
HAHA                              -0.71
shapeshifter                      -0.70
QUI                               -0.70
DEM                               -0.68
龍喚士                            -0.67
▬▬                                -0.66
gerald                            -0.65
////////////////////////////////  -0.64
spin                              -0.63
Positive Logits
Token                             Logit
asma                              1.30
acement                           1.18
atinum                            1.18
icably                            1.12
enty                              1.10
astic                             1.08
anted                             1.02
atter                             1.01
atoon                             0.99
ague                              0.98
Activations Density 0.009%