INDEX
Explanations
mentions of parody or advertisements
words or phrases related to patterns of behavior or trends in the context of media and culture
Negative logits
Brill         -0.66
Valiant       -0.65
Kob           -0.57
stars         -0.56
shapeshifter  -0.56
Fren          -0.55
Hatt          -0.53
fronts        -0.53
hower         -0.51
corner        -0.51
Positive logits
enture     0.87
utical     0.82
icip       0.80
ĪĴ         0.80
opa        0.77
pmwiki     0.77
ormal      0.77
nown       0.76
aded       0.73
illance    0.73
Activation density: 0.083%
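Activation density is the fraction of tokens on which the feature fires. The sketch below assumes that "fires" means an activation strictly above zero, which is the usual convention for ReLU-style SAE features. The function name and the synthetic activation array are illustrative only. A density of 0.083% works out to roughly 83 active tokens per 100,000.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of tokens where the feature activation exceeds `threshold`.

    Assumption: a feature counts as active when its activation is > 0.
    """
    acts = np.asarray(acts)
    return float(np.mean(acts > threshold))

# Synthetic activations: 83 active tokens out of 100,000.
acts = np.zeros(100_000)
acts[:83] = 1.5
print(f"{activation_density(acts):.3%}")  # 0.083%
```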