INDEX
Explanations
names of individuals
words related to individual or group identities and social handles
Negative Logits
optics          -0.67
s               -0.65
mosp            -0.64
net             -0.62
deck            -0.62
shoulders       -0.62
Attribution     -0.61
ENTION          -0.60
accommodations  -0.60
Immunity        -0.59
Positive Logits
ppa   1.44
zzi   1.37
ppo   1.35
zzle  1.34
zza   1.32
pta   1.28
ppe   1.26
ÅŁ    1.24
jo    1.20
lda   1.19
Activations Density 0.230%