INDEX
Explanations
names of popular culture figures and some associated words
specific pop culture references and terms related to media
Negative Logits
ĪĴ               -1.05
azeera           -0.89
unfocusedRange   -0.86
cffffcc          -0.76
displayText      -0.76
affinity         -0.71
asar             -0.70
PsyNetMessage    -0.69
xual             -0.69
Constructed      -0.65
Positive Logits
ν                0.69
Hats             0.67
riks             0.67
daddy            0.63
åŃ               0.63
KNOWN            0.61
zilla            0.59
ÙĴ               0.59
merce            0.57
Picture          0.56
Activations Density: 0.220%