Explanations
specific names or proper nouns related to individuals in professional contexts
Negative Logits
Token       Logit
WTF         -0.18
(           -0.18
dude        -0.18
freaking    -0.17
FUCK        -0.16
('          -0.15
(“          -0.15
OK          -0.15
ok          -0.15
badass      -0.15
Positive Logits
Token       Logit
folk        0.18
folk        0.17
gossip      0.17
clustering  0.17
clustered   0.15
talk        0.15
wag         0.15
Clan        0.15
clans       0.15
--↵         0.14
Activations Density: 0.003%