INDEX
Explanations
Phrases indicating positions or roles within an organization or community
Negative Logits
Token    Value
167      -0.16
olik     -0.15
fi       -0.15
illon    -0.14
loos     -0.14
lin      -0.14
mini     -0.13
emed     -0.13
267      -0.13
rl       -0.13
Positive Logits
Token          Value
irst           0.16
DonaldTrump    0.15
tain           0.15
assi           0.15
iciary         0.14
eson           0.14
èĤĸ            0.14
IRST           0.14
TOR            0.14
оÑĢони         0.14
Activations Density 0.308%