INDEX
Explanations
references to inclusivity and collective recognition of people
Negative Logits
iscri      -0.50
y          -0.50
Utrecht    -0.43
pregi      -0.42
立         -0.41
kyard      -0.41
anmelden   -0.41
chain      -0.39
llary      -0.38
大な       -0.38
Positive Logits
everyone    1.04
everyone    0.93
EVERYONE    0.91
everybody   0.88
Everyone    0.87
perſon      0.86
anyone      0.85
Everyone    0.85
everybody   0.84
someone     0.79
Activations Density: 0.089%