Explanations
proper nouns or names
names or proper nouns typically associated with individuals
Negative Logits
Token               Logit
Carnage             -0.72
Goo                 -0.69
Masquerade          -0.69
Sigma               -0.68
Somali              -0.68
istg                -0.67
guiActiveUnfocused  -0.67
[&                  -0.67
Miko                -0.66
lesbians            -0.65
Positive Logits
Token               Logit
entin               0.80
ricks               0.77
anmar               0.77
dinand              0.76
odore               0.74
rick                0.73
rys                 0.71
anda                0.71
berman              0.71
ison                0.71
Activations Density: 0.290%