Explanations
duplicated letters in words
words associated with humor or silliness
Negative Logits
Token          Logit
代             -0.85
misunder       -0.76
Luthor         -0.71
DonaldTrump    -0.69
imir           -0.67
ewski          -0.67
nikov          -0.66
itates         -0.65
Integrity      -0.64
auri           -0.62
Positive Logits
Token          Logit
gey            1.24
zing           1.20
ze             1.17
gee            1.15
zer            1.12
zy             1.09
zers           1.06
zeb            1.06
zie            1.04
leans          1.02
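The negative and positive logit lists rank vocabulary tokens by how strongly this feature pushes their output logits down or up. The page does not say how those values are obtained. The sketch below assumes the common sparse-autoencoder convention that each value is the feature's decoder direction projected through the model's unembedding matrix, followed by a top-k selection at each end. The helper names `feature_logit_weights` and `top_logit_effects` are hypothetical, and the toy weights are hand-picked, not real model weights.

```python
import numpy as np


def feature_logit_weights(decoder_row: np.ndarray, unembed: np.ndarray) -> np.ndarray:
    # Assumption: decoder_row has shape (d_model,) and unembed has shape
    # (d_model, vocab_size); the product gives one logit effect per token.
    return decoder_row @ unembed


def top_logit_effects(logit_weights, vocab, k=10):
    """Return the k most negative and k most positive (token, value) pairs."""
    order = np.argsort(logit_weights)  # token indices, most negative first
    negative = [(vocab[i], float(logit_weights[i])) for i in order[:k]]
    positive = [(vocab[i], float(logit_weights[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example with hand-picked values (not real model weights).
vocab = ["gey", "zing", "misunder", "Luthor", "the"]
weights = np.array([1.24, 1.20, -0.76, -0.71, 0.01])
negative, positive = top_logit_effects(weights, vocab, k=2)
print(negative)  # [('misunder', -0.76), ('Luthor', -0.71)]
print(positive)  # [('gey', 1.24), ('zing', 1.2)]
```

Under this reading, the positive list above (gey, zing, ze, zer, zy, ...) shows the feature boosting short continuations that produce doubled or playful letter sequences. That fits the "duplicated letters" and "humor or silliness" explanations.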
Activation Density: 0.045%
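Activation density is read here as the share of sampled token positions on which the feature is active. At 0.045%, that is roughly one token in 2,200. The sketch below treats "active" as an activation strictly greater than zero. That threshold and the name `activation_density` are illustrative assumptions, not taken from the source.

```python
import numpy as np


def activation_density(activations, threshold: float = 0.0) -> float:
    """Fraction of token positions where the feature's activation exceeds `threshold`."""
    acts = np.asarray(activations)
    return float((acts > threshold).mean())


# Toy data: the feature fires on 9 of 20,000 token positions.
acts = np.zeros(20_000)
acts[:9] = 1.0
density = activation_density(acts)
print(f"{density:.3%}")  # 0.045%
```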