Explanations
names of people or organizations
proper nouns, particularly names and titles
Negative Logits
tumblr       -0.65
clubhouse    -0.59
Presidents   -0.53
=]           -0.51
chicks       -0.51
Disneyland   -0.51
undermin     -0.51
ModLoader    -0.51
womb         -0.50
Rebels       -0.50
Positive Logits
jit          0.78
jen          0.73
berto        0.72
án           0.71
Doyle        0.70
ÅĤ           0.69
sey          0.69
Hos          0.69
Ay           0.68
Nguyen       0.67
Activations Density 0.657%