Explanations
references to collective identity or community ownership
Negative Logits
orous      -0.16
ood        -0.16
urs        -0.14
offs       -0.14
ilation    -0.14
оÑĩ        -0.14
us         -0.14
625        -0.14
.us        -0.14
ernal      -0.13
Positive Logits
own        0.19
chs        0.17
eah        0.15
SEL        0.15
bservice   0.15
bsp        0.15
behalf     0.15
self       0.15
brtc       0.14
icker      0.14
Activations Density 0.154%
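
As a rough illustration of how a density figure like the one above can be read, the sketch below treats activation density as the fraction of token positions on which the feature fires. That definition, the function name `activation_density`, the `threshold` parameter, and the toy data are all assumptions made for illustration; the dashboard does not state how its number was computed.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumed definition: a feature counts as active at a position when its
    activation is strictly greater than the threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 2000 token positions, 3 of which are active.
toy_activations = [0.0] * 1997 + [0.7, 1.2, 0.05]
density = activation_density(toy_activations)

# Format as a percentage, the way the dashboard displays it.
print(f"Activations Density {density:.3%}")
```

Under this reading, a density of 0.154% would mean the feature is active on roughly 1 in 650 token positions.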