INDEX
Explanations
words related to controversial or provocative topics or figures
references to individuals or entities related to governmental or political contexts
Negative Logits
Lauder      -0.64
eer         -0.60
DEM         -0.60
bilt        -0.58
enium       -0.57
Lear        -0.57
Idlib       -0.57
mble        -0.57
Magikarp    -0.56
rout        -0.56
Positive Logits
rief        1.32
odies       1.18
ruary       1.17
rities      1.16
attery      1.14
esides      1.08
ishops      1.07
amboo       1.07
ilingual    1.06
ibli        1.06
Activations Density: 0.050%